// geoscience + machine learning

Feature Engineering for Geoscientists: Turning Intuition into Data

June 20, 2025

Feature engineering is where geological intuition becomes something a model can actually use. In mineral exploration, the challenge is not just building a model, but deciding what geological signals deserve to become inputs in the first place.

Feature Engineering Comic — turning rocks into data.

When people first start applying machine learning to exploration datasets, there is a tendency to focus on the algorithm: random forest, gradient boosting, neural networks, clustering. Those choices matter, but they are rarely where a project first succeeds or fails.

The bigger question is more geological than computational: what should the model actually be looking at? That is the job of feature engineering. It is the process of translating expert reasoning into variables that can be measured, mapped, and passed into an analysis pipeline.

Geologists already do this mentally. We look at structure, alteration, lithology, and context, then build an interpretation of why mineralization might occur where it does. Feature engineering makes that reasoning explicit, one measurable variable at a time.

Know your data scientifically

Before engineering features, it helps to step back and ask what each dataset is actually measuring. Not just what the file is called, but what kind of signal it carries and how trustworthy that signal is.

What is the spatial resolution?
What are the sampling biases?
Is it a direct or indirect measure of the process you care about?
Does its scale support the kind of prediction you want to make?

Those are basic scientific design questions, but they are easy to skip when datasets are already sitting in a project folder. Good feature engineering begins by respecting what the data can and cannot say.
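As a rough illustration of the resolution and sampling-bias questions above, here is a minimal sketch that measures how far each sample sits from its nearest neighbour. The coordinates are made up, the function name is my own, and it assumes projected coordinates in metres with at least two points. A large gap between the typical spacing and the worst spacing is a hint that sampling is clustered.

```python
import math
from statistics import median

def nearest_neighbour_spacing(points):
    """Return each point's distance to its closest neighbour (same units as input)."""
    spacings = []
    for i, (x1, y1) in enumerate(points):
        nearest = min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
        spacings.append(nearest)
    return spacings

# Hypothetical sample locations in metres: a tight cluster plus two isolated points.
samples = [(0, 0), (50, 0), (0, 50), (50, 50), (1000, 1000), (3000, 0)]

spacings = nearest_neighbour_spacing(samples)
typical = median(spacings)        # spacing most samples actually have
ratio = max(spacings) / typical   # how much sparser the worst-covered area is

print(f"typical spacing: {typical:.0f} m, worst/typical ratio: {ratio:.1f}")
```

This brute-force version compares every pair of points, so it is only suitable for small datasets; a spatial index would be the usual choice at survey scale.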

Concept → proxy → dataset

At its core, feature engineering starts with a concept — a geological process or control, such as enhanced permeability, a magmatic heat source, or redox architecture.

Next comes the proxy: the measurable stand-in that captures part of that concept numerically. That might be distance to a fault, proximity to an intrusion, magnetic texture, conductivity contrast, or a geochemical ratio.

Finally, the proxy has to come from a dataset. That may be a fault interpretation, a mapped intrusion outline, a gridded survey, a geochemical compilation, or a processed remote sensing product.

Sometimes the process starts with geology and works toward the data. Other times a newly available dataset suggests a useful proxy you had not considered before. The strongest workflows usually move in both directions.
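To make the chain concrete, here is a minimal sketch of one proxy: distance to a fault, standing in for the concept of enhanced permeability. The dataset is a hypothetical fault trace stored as a list of vertices, and the function names and coordinates are illustrative assumptions rather than part of any particular GIS library.

```python
import math

def point_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the segment A-B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_fault(point, fault_trace):
    """Minimum distance from a point to a fault polyline (list of vertices)."""
    px, py = point
    return min(
        point_to_segment(px, py, ax, ay, bx, by)
        for (ax, ay), (bx, by) in zip(fault_trace, fault_trace[1:])
    )

# Hypothetical fault trace and sample sites, in projected metres.
fault = [(0, 0), (1000, 0), (1000, 1000)]
sites = {"A": (500, 300), "B": (1400, 500), "C": (-300, -400)}

features = {name: distance_to_fault(xy, fault) for name, xy in sites.items()}
print(features)
```

The same pattern applies to proximity to an intrusion outline: swap the fault trace for the outline's vertices.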

Concept to Proxy to Dataset Diagram — A proxy turns a concept into measurable data, grounded in a dataset.

The Mineral Systems Approach is a strong guide

For mineral prospectivity work, the Mineral Systems Approach gives you a natural framework for feature engineering. If you are asking where mineralization may occur, you can start by asking what ingredients the system requires: source, pathway, trap, depositional site, alteration footprint, preservation, and so on.

Each of those concepts can suggest candidate features. Structures may become distance rasters or density measures. Favourable host rocks may become categorical layers. Geophysics might serve as a proxy for alteration, lithology, or basement architecture. Geochemistry can capture dispersion patterns or pathfinder associations.
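As one example of a host-rock layer becoming a categorical feature, here is a minimal one-hot encoding sketch. The lithology codes are hypothetical, and the helper is written by hand only to keep the example self-contained; most ML libraries provide an equivalent.

```python
def one_hot(values):
    """Encode a list of category labels as 0/1 indicator columns."""
    categories = sorted(set(values))  # sorted so column order is reproducible
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical host-rock labels sampled at four grid cells.
lithology = ["basalt", "granite", "basalt", "shale"]

categories, encoded = one_hot(lithology)
print(categories)
for label, row in zip(lithology, encoded):
    print(label, row)
```

Whether one-hot encoding is the right choice depends on the model; some tree-based methods can work with category codes directly.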

Mineral Systems Approach diagram — Ford et al., 2019 — translating mineral systems into mappable proxies.

The Mineral Systems Approach helps keep feature engineering grounded. It gives you a logic for why a variable belongs in the model, rather than treating feature selection as a purely statistical exercise.
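One way to keep that logic visible is to record, for each candidate feature, which mineral-system component it is meant to represent. The sketch below is an assumption about how such a register could look, not a standard tool; the feature names and datasets are invented for illustration.

```python
from dataclasses import dataclass

# Components named earlier in this post.
COMPONENTS = ["source", "pathway", "trap", "depositional site",
              "alteration footprint", "preservation"]

@dataclass(frozen=True)
class Feature:
    name: str        # column name in the model table
    component: str   # mineral-system component it is meant to represent
    dataset: str     # where the values come from

# Hypothetical candidate features for an illustrative project.
features = [
    Feature("dist_to_fault_m", "pathway", "fault interpretation"),
    Feature("fault_density", "pathway", "fault interpretation"),
    Feature("host_rock_class", "trap", "geology map"),
    Feature("mag_texture", "alteration footprint", "gridded magnetics"),
]

def uncovered(features, components=COMPONENTS):
    """Components that no feature currently represents."""
    covered = {f.component for f in features}
    return [c for c in components if c not in covered]

def unjustified(features, components=COMPONENTS):
    """Features whose stated component is not part of the system model."""
    return [f.name for f in features if f.component not in components]

missing = uncovered(features)
print("No proxy yet for:", missing)
print("Features without a geological rationale:", unjustified(features))
```

A gap in the first list is a prompt to look for a dataset; an entry in the second is a prompt to ask why the variable is in the model at all.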

Resolution matters

Regression vs Classification Maps — Choose regression when the data supports continuous variation; choose classification when the information is broader or categorical.

A key decision in geoscience ML is whether to use regression or classification, and the answer often depends on data resolution.

Regression predicts continuous values such as grade, depth, thickness, or concentration.
Classification predicts categories such as prospectivity class, lithological group, or mineralization presence.

Dense geophysics and well-sampled geochemistry can support relatively detailed spatial predictions, which suits regression. Broad geology maps or generalized categorical interpretations are usually better suited to classification or ranking workflows. The right modelling choice depends more on the meaning and scale of the data than on the sophistication of the algorithm.
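When the data cannot support a continuous target, one option is to recast it as classes. The sketch below bins hypothetical assay values into ordered categories; the thresholds and labels are placeholders chosen for illustration, not recommended cut-offs.

```python
from bisect import bisect_right

def to_classes(values, thresholds, labels):
    """Bin continuous values into ordered classes using ascending thresholds."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_right counts how many thresholds are <= the value,
    # so a value equal to a threshold falls into the higher class.
    return [labels[bisect_right(thresholds, v)] for v in values]

# Hypothetical assay values; units are whatever the source data uses.
grades = [0.05, 0.4, 1.2, 3.5, 0.9]
thresholds = [0.5, 1.0]                      # placeholder cut-offs
labels = ["background", "anomalous", "mineralized"]

classes = to_classes(grades, thresholds, labels)
print(list(zip(grades, classes)))
```

The regression target is the `grades` list itself; the classification target is `classes`. Which one to model is the decision this section is about.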

A practical checklist

Start from a geological concept, not from the software.
Translate intuition into a measurable proxy.
Use the Mineral Systems Approach to keep variables defensible.
Check what each dataset is actually measuring.
Match the modelling approach to the resolution of the data.

Feature engineering is where geoscience meets machine learning. It is the point where intuition becomes structure, and structure becomes something a model can test.

The strongest exploration models usually do not begin with clever code. They begin with a geologist asking the right question and expressing it in a form the machine can use.

In future posts, I will break down some of the most useful geoscience features in more detail and show how they can be generated in practice.