Feature Engineering for Geoscientists: Turning Intuition into Data

Feature Engineering Comic – turning rocks into data
Feature engineering is one of the most important — and often overlooked — steps in any machine learning (ML) pipeline. It involves transforming raw data into meaningful features that help a model learn patterns and make predictions.
While it’s tempting to obsess over choosing the “right” model — Random Forest vs. Neural Networks, for instance — it’s usually the quality of your features that drives performance. This is also where geological knowledge has the biggest impact: translating concepts we intuitively grasp — like structural controls or fluid pathways — into numerical inputs a machine can actually learn from.
Geologists are trained to “see” patterns in maps, a skill built from experience and spatial reasoning. But ML models can’t see; they need numbers. Our intuition must be encoded as structured data.
Know Your Data — Scientifically
Don’t just rely on “data availability.” Understand what each dataset actually measures:
- What is the spatial resolution?
- What are the sampling biases?
- Is it a direct or indirect measure of your target variable?
These are basic research design questions — but they’re often overlooked in geoscience ML workflows. Apply the same scientific scrutiny to feature design as you would to field sampling.
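The first two questions can be checked numerically before any modelling. Here is a minimal sketch, assuming sample locations are available as projected x, y coordinates in metres. The function names, the example coordinates, and the 500 m cell size are all made up for illustration. It estimates typical sample spacing and counts samples per grid cell to show uneven coverage.

```python
import numpy as np

def nearest_neighbor_spacing(xy):
    """Distance from each sample to its closest neighbour (same units as xy)."""
    xy = np.asarray(xy, dtype=float)
    # Full pairwise distance matrix: fine for a few thousand points, not millions.
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's zero distance to itself
    return dist.min(axis=1)

def samples_per_cell(xy, cell_size):
    """Count samples in each square grid cell to expose uneven coverage."""
    xy = np.asarray(xy, dtype=float)
    ij = np.floor((xy - xy.min(axis=0)) / cell_size).astype(int)
    counts = np.zeros(tuple(ij.max(axis=0) + 1), dtype=int)
    np.add.at(counts, (ij[:, 0], ij[:, 1]), 1)
    return counts

# Hypothetical sample locations: a tight cluster plus one distant point.
samples = np.array([
    [0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0], [900.0, 900.0],
])
spacing = nearest_neighbor_spacing(samples)
coverage = samples_per_cell(samples, cell_size=500.0)

print("median spacing (m):", np.median(spacing))
print("samples per 500 m cell:")
print(coverage)
```

A median spacing far smaller than the maximum, or a coverage grid dominated by one or two cells, is a hint that the dataset is clustered (for example around known deposits or along roads) and that the model may learn the sampling pattern instead of the geology.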
Concept → Proxy → Dataset
At its core, feature engineering starts with a concept — a geological process or control, like “enhanced permeability” or “magmatic heat source.”
Next, you choose a proxy to represent that concept numerically — for example, distance to fault or proximity to intrusion.
Finally, that proxy comes from a dataset — such as fault line shapefiles or gridded geophysical surveys.
While inspiration often starts with a concept and works toward the data, the reverse is also true. A dataset might spark an idea: a new gravity survey or alteration product might serve as a proxy for a geologic process. This makes feature engineering a creative, two-way dialogue between geology and data.
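To make the chain concrete, here is one way the “distance to fault” proxy could be computed. This is a sketch under simplifying assumptions: faults are given as straight line segments in projected coordinates, and everything is done with plain NumPy. The function name and the example coordinates are hypothetical; in practice you would read the segments from your fault shapefile with a GIS library.

```python
import numpy as np

def distance_to_segments(points, segments):
    """Shortest distance from each point to any fault segment.

    points   : (n, 2) array of x, y coordinates
    segments : (m, 2, 2) array; each entry is [[x0, y0], [x1, y1]]
    """
    p = np.asarray(points, dtype=float)[:, None, :]      # (n, 1, 2)
    seg = np.asarray(segments, dtype=float)
    a, b = seg[None, :, 0, :], seg[None, :, 1, :]        # (1, m, 2) each
    ab = b - a
    # Fraction along each segment of the closest point, clamped to the ends.
    denom = (ab ** 2).sum(axis=-1)
    t = ((p - a) * ab).sum(axis=-1) / np.where(denom == 0, 1.0, denom)
    t = np.clip(t, 0.0, 1.0)
    closest = a + t[..., None] * ab                      # (n, m, 2)
    dist = np.sqrt(((p - closest) ** 2).sum(axis=-1))    # (n, m)
    return dist.min(axis=1)

# Hypothetical fault trace made of two segments (metres).
faults = np.array([
    [[0.0, 0.0], [1000.0, 0.0]],
    [[1000.0, 0.0], [1000.0, 1000.0]],
])
# Hypothetical sample or grid-cell locations.
locations = np.array([[500.0, 300.0], [1500.0, 500.0], [-300.0, -400.0]])

dist_to_fault = distance_to_segments(locations, faults)
print(dist_to_fault)
```

The resulting column of distances is the feature: the concept (enhanced permeability near structures) has become a number per location that a model can use.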

A proxy turns a concept into measurable data, grounded in a dataset.
Use the Mineral Systems Framework
The Mineral Systems Approach provides a structured way to generate features. For each key process (e.g., energy, fluids, structure, traps, preservation), ask:
- What concept matters?
- What proxy can represent that concept?
- What dataset can provide that proxy?
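One lightweight way to keep track of the answers is a small table in code. The sketch below is only an illustration: the `FeaturePlan` class is a hypothetical helper, and pairing “enhanced permeability” with structure and “magmatic heat source” with energy is my assumption for the example, not a rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeaturePlan:
    process: str   # mineral-system process the feature speaks to
    concept: str   # geological idea behind the feature
    proxy: str     # measurable stand-in for the concept
    dataset: str   # where the proxy comes from

# Example entries reusing the concepts, proxies, and datasets mentioned earlier.
plans = [
    FeaturePlan("structure", "enhanced permeability",
                "distance to fault", "fault line shapefiles"),
    FeaturePlan("energy", "magmatic heat source",
                "proximity to intrusion", "gridded geophysical surveys"),
]

# Check which processes still have no feature planned.
processes = ["energy", "fluids", "structure", "traps", "preservation"]
covered = {plan.process for plan in plans}
missing = [name for name in processes if name not in covered]

print("processes without a feature yet:", missing)
```

Listing the gaps this way makes it obvious where to go looking for new proxies or datasets.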

Ford et al., 2019 – Translating mineral systems into mappable proxies (Ore Geology Reviews, 111, 102943).

Choose regression when the resolution supports it; classification when it doesn’t.
Resolution Matters
A key decision in geoscience ML is whether to use regression or classification — and data resolution helps guide that choice:
- Regression predicts continuous values — like grade or depth.
- Classification predicts categories — like deposit types or prospectivity zones.
When your data has high spatial resolution (e.g., geophysics, dense geochemistry), you can model spatial variation more precisely, which makes regression a natural fit.
But low-resolution or categorical data (e.g., lithological units or coarse geology maps) may be better suited to classification. These datasets don't support fine-grained predictions, but they can still support grouping or labelling based on broader patterns.
Always ask: What is this dataset really measuring? What is its spatial granularity?
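The same target can be framed either way. Here is a minimal sketch of the difference, using made-up grade values and hypothetical class cut-offs of 0.5 and 1.0. The regression target keeps the continuous values, while the classification target bins them into broad categories with `np.digitize`.

```python
import numpy as np

# Made-up grade values, for illustration only.
grade = np.array([0.1, 0.4, 1.2, 2.5, 0.8])

# Regression framing: predict the continuous value directly.
y_regression = grade

# Classification framing: bin the values into broad categories.
# The cut-offs below are hypothetical; choose ones that mean something
# for your commodity and data quality.
class_edges = np.array([0.5, 1.0])
class_names = np.array(["low", "medium", "high"])
y_classification = class_names[np.digitize(grade, class_edges)]

print(y_regression)
print(y_classification)
```

If your input features cannot resolve the difference between, say, 1.2 and 2.5 at a given location, the binned target is the more honest thing to ask the model to predict.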
Summary: Feature Engineering Checklist ✅
- ✅ Ground every feature in a geoscientific concept
- ✅ Translate intuition into measurable proxies
- ✅ Use the Mineral Systems Approach as a guide for mineral prospectivity
- ✅ Understand your data and what it measures
- ✅ Choose regression or classification based on data granularity
Feature engineering is where geoscience meets machine learning — the point where intuition becomes insight, and insight becomes impact.
Stay tuned for future posts where I’ll dive into specific geoscience features and how to generate them.
Got thoughts on feature engineering? Questions about specific proxy ideas? Drop a comment or reach out!