Why Most Geologists Quit Python in the First Hour
Closing the gap between geoscience and computer science is the work I care most about, and the coding piece is where most geologists get stuck. Tutorial hell is real. This series is three short videos to take a working geoscientist from never having coded to writing useful Python.
It is worth being honest about why this happens. The official Python install is fine, but the surrounding ecosystem (Anaconda, virtual environments, package managers, paths, terminals) is genuinely hostile to a beginner. None of that has anything to do with Python the language. By the time you have wrestled with a PATH variable, you have lost the thread of what you were trying to learn. Most people quit there. The ones who push through then run into the second wall: example datasets that have nothing to do with their work. Iris flowers, Titanic passengers, housing prices in Boston. Nothing that connects to the actual problems a working geologist faces.
What frustrates me is that the geological intuition required to do meaningful data work is the hard part, and most working geologists already have it. Reading a drillhole log, knowing what an alteration assemblage means, sensing whether a soil geochem anomaly is real or just noise: that is the deep skill. The coding is the wrapper. We have a generation of practitioners who have the hard skill but never get past the wrapper, and that is a waste.
That is why the entry point matters. If the foundation is shaky, everything built on top of it is too.
The series
I have been working on a short video series for geoscientists who want to pick up Python. The audience I had in mind is Frank Arnott Award participants, but it works for anyone newer to coding who wants to do more than admire other people's notebooks from a distance. Three videos, each about thirty minutes, building from "Python is open in your browser" to "I can do useful things with a real geoscience dataset."
The first video is out now.
What's in Video 1
The whole point of the first video is to get past the installation problem. We use Google Colab, which is a Python environment that runs in your browser, hosted by Google, free. No install, no terminal, no PATH variables. You open a tab and you are coding.
By the end of thirty minutes, you have written your first piece of Python, deliberately broken it to read an error message, and loaded a real drillhole assay dataset. The last fifteen minutes are spent doing things you would actually do on a project: filtering rows, computing summary statistics, looking at grade by hole. It is not a toy example. The CSV is small but it tells a story (one mineralized hole, one barren, one with a Cu-Au signature), and that story is visible in the numbers once you know how to ask.
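To give a flavour of those three operations, here is a minimal sketch in pandas. It is not the notebook from the video: the six assay rows are invented for illustration, and the column names (hole_id, from_m, to_m, cu_pct, au_gpt) and the 0.5 % Cu cutoff are my assumptions, so adjust them to whatever your own CSV uses.

```python
import io

import pandas as pd

# Invented assay intervals for illustration only; NOT the dataset from the video.
# Column names (hole_id, from_m, to_m, cu_pct, au_gpt) are assumptions.
csv_text = """hole_id,from_m,to_m,cu_pct,au_gpt
DH001,0,2,0.85,0.40
DH001,2,4,1.20,0.65
DH002,0,2,0.02,0.01
DH002,2,4,0.03,0.01
DH003,0,2,0.45,1.10
DH003,2,4,0.55,1.30
"""

# In Colab you would normally pass a file path or URL to read_csv instead.
assays = pd.read_csv(io.StringIO(csv_text))

# Filter rows: keep intervals at or above an arbitrary 0.5 % Cu cutoff.
high_cu = assays[assays["cu_pct"] >= 0.5]

# Summary statistics (count, mean, min, max, quartiles) for copper.
cu_summary = assays["cu_pct"].describe()

# Grade by hole: simple mean Cu and Au per drillhole.
grade_by_hole = assays.groupby("hole_id")[["cu_pct", "au_gpt"]].mean()

print(high_cu)
print(cu_summary)
print(grade_by_hole)
```

The per-hole mean here is a plain average, which is only reasonable because every invented interval is the same length. With real intervals of varying length you would want a length-weighted average instead.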
I also spend some time on errors. Beginners are afraid of error messages because they look like a wall of red text written for someone else. They are not. They are short, specific, and almost always tell you what to fix. Demystifying that fear in the first thirty minutes is, I think, the highest-leverage thing in the whole video. It is the difference between a viewer who keeps going and a viewer who closes the tab.
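As a small illustration of how much an error message tells you, here is a sketch that makes a deliberate mistake and then pulls out the last line of the traceback. The sample IDs and grades are invented, and this is not necessarily the error used in the video. The try/except is only there so the example can capture the message as text; in a notebook you would simply run the bad line and read the red output.

```python
import traceback

# Hypothetical grades keyed by sample ID; invented for illustration.
cu_pct = {"S001": 0.85, "S002": 0.02}

try:
    # Deliberate mistake: "S003" was never added to the dictionary.
    grade = cu_pct["S003"]
except KeyError:
    # format_exc() returns the same text Python would normally print.
    message = traceback.format_exc()
    # The last line names the error type and the thing that caused it.
    last_line = message.strip().splitlines()[-1]
    print(last_line)
```

The last line of a traceback is usually the place to start reading: here it names the error type (KeyError) and the exact key that was missing.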
The dataset and the notebook
Everything in the video is in a public GitHub repository, including the example dataset and a completed Jupyter notebook you can open directly in Colab.
github.com/DHeasmanGDS/geodatascience-python-intro
The notebook follows the video step by step, plus a few bonus cells at the end that preview Video 2 and Video 3 (filtering rows by drillhole, group statistics by lithology, a simple downhole grade plot). If you finish the video and want to keep going, those cells are there for you to poke at.
What's coming
Two more videos are planned for this series.
Video 2 goes deeper into Python itself: variables, lists, dictionaries, loops, functions. All with geoscience examples (drillhole intervals, sample IDs, lithology codes) rather than the generic foo and bar that haunt most introductory tutorials. The goal is to get someone who finished Video 1 to the point where they can read most Python code and have a reasonable guess at what it does.
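For a sense of what those building blocks look like with geology in them, here is a rough sketch using only core Python. It is my own illustration rather than material from Video 2, and the hole ID, lithology codes, and depths are all invented.

```python
# Variables: a single value with a name.
hole_id = "DH001"

# Dictionary: look up a full rock name from a logging code (codes are invented).
lithology_names = {"AND": "andesite", "QV": "quartz vein", "GRD": "granodiorite"}

# List of intervals: (from depth in m, to depth in m, lithology code).
intervals = [
    (0.0, 12.5, "AND"),
    (12.5, 14.0, "QV"),
    (14.0, 40.0, "GRD"),
]


def interval_length(from_m, to_m):
    """Return the length of a downhole interval in metres."""
    return to_m - from_m


# Loop: add up the logged metres of each lithology.
metres_by_lithology = {}
for from_m, to_m, code in intervals:
    name = lithology_names[code]
    previous = metres_by_lithology.get(name, 0.0)
    metres_by_lithology[name] = previous + interval_length(from_m, to_m)

for name, metres in metres_by_lithology.items():
    print(f"{hole_id}: {metres:.1f} m of {name}")
```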
Video 3 is the payoff. We take a real geoscience workflow end to end with pandas and matplotlib: load a dataset, clean it, group it, summarize it, and produce figures you would actually put in a report. By the end of three videos, the gap between "I have never coded" and "I can write a useful script" is closed.
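A compressed sketch of that load, clean, group, summarize, plot sequence might look like the following. It is an illustration under stated assumptions, not the workflow from Video 3: the rows are invented, the column names are hypothetical, and dropping below-detection values is just one simple cleaning choice (substituting a fraction of the detection limit is a common alternative).

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Invented rows for illustration; note the "<0.01" entry and the blank value.
csv_text = """hole_id,from_m,to_m,lithology,cu_pct
DH001,0,2,AND,0.10
DH001,2,4,QV,1.20
DH001,4,6,QV,<0.01
DH002,0,2,AND,0.05
DH002,2,4,GRD,
DH002,4,6,GRD,0.30
"""

# Load.
raw = pd.read_csv(io.StringIO(csv_text))

# Clean: turn non-numeric entries such as "<0.01" into NaN, then drop them.
clean = raw.assign(cu_pct=pd.to_numeric(raw["cu_pct"], errors="coerce"))
clean = clean.dropna(subset=["cu_pct"])

# Group and summarize: mean copper grade per lithology.
mean_cu = clean.groupby("lithology")["cu_pct"].mean()
print(mean_cu)

# Plot: a labelled bar chart saved as an image file.
fig, ax = plt.subplots()
ax.bar(list(mean_cu.index), mean_cu.to_numpy())
ax.set_xlabel("Lithology code")
ax.set_ylabel("Mean Cu (%)")
ax.set_title("Mean copper grade by lithology (invented data)")
fig.savefig("mean_cu_by_lithology.png", dpi=150)
```

In Colab the figure also appears under the cell; the saved PNG is what you would drop into a report.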
After the series is done, future content on the blog and the channel will assume this baseline. So if you have ever read one of my posts on Bayesian methods or graph neural networks and wished there was a more accessible starting point, this is it. Start here, then come back to the harder posts later.
If you are a Frank Arnott participant
This series exists in part because of the Frank Arnott competition. My team and I won in 2022, and what stuck with me afterwards was how often the bottleneck for good ideas was not geological insight but the coding piece. People had instincts they could not execute. If you are heading into Frank Arnott and you want a clean foundation before the competition kicks off, the three videos plus the GitHub repo are designed to get you there.
If you are not, the series still works. The geology examples are mineral exploration flavoured because that is what I know best, but the Python is general purpose. Hydrogeologists, structural geologists, environmental scientists, anyone working with tabular field data will find the same tools useful.
The full series will be posted as it comes out. The repo will grow alongside the videos. Feedback and corrections are welcome via GitHub issues or directly through the blog.