The Mistake Most People Make
When people first approach prediction modeling, they tend to use raw numbers directly. "The odds are 2.50, so I'll just plug 2.50 into my model."
This is like feeding a recipe to someone who doesn't know what flour is. The model has no context. It doesn't understand that 2.50 means roughly 40% probability, or that the same probability looked like 45% two hours ago.
Our entire feature engineering philosophy is built around one principle: give the model context, not just numbers.
What We Actually Build
Every match that flows through our system goes through eight transformation stages. Let me walk you through them the way I'd explain them to someone joining our team.
Stage 1: Format Standardization
We receive data in decimal, fractional, and American formats. All of it gets converted to decimal first. Why? Because decimal is the cleanest for math—multiply by stake, get total return. Simple.
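Here's a minimal sketch of those conversions. The function names are illustrative, not our production API:

```python
def fractional_to_decimal(numerator: int, denominator: int) -> float:
    """Fractional odds a/b pay a/b profit per unit staked; decimal
    odds include the returned stake, hence the +1."""
    return 1 + numerator / denominator


def american_to_decimal(american: int) -> float:
    """Positive American odds are the profit on a 100 stake;
    negative odds are the stake required to win 100."""
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)


assert fractional_to_decimal(3, 2) == 2.5   # 3/2  -> 2.50
assert american_to_decimal(150) == 2.5      # +150 -> 2.50
assert american_to_decimal(-200) == 1.5     # -200 -> 1.50
```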
Stage 2: Probability Conversion
Decimal odds become implied probabilities. The formula is simple: implied probability = 1 / decimal odds. A price of 2.50 becomes 0.40, or 40%.
But here's the catch: if you add up probabilities across a market, you get more than 100%. That extra bit is the margin—the house edge.
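Both facts fall out of a couple of lines (hypothetical prices from a single source):

```python
# Hypothetical 1X2 prices from one source
odds = {"home": 2.50, "draw": 3.40, "away": 3.10}
implied = {k: 1 / v for k, v in odds.items()}

print(implied["home"])        # 0.4    -- a 2.50 price implies 40%
print(sum(implied.values()))  # ~1.017 -- the excess over 1.0 is the margin
```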
Stage 3: Margin Removal (De-vigging)
We strip out that margin to get "fair" probabilities. Now the numbers represent actual implied chances, not distorted ones.
This step is critical. Without it, you're training on biased data. A team whose true chance is 45% might show up as 47% or 48% in the raw implied numbers, because the margin inflates every outcome's probability.
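Here's a minimal sketch, assuming proportional de-vigging, which is the simplest scheme (other methods distribute the margin unevenly across outcomes; the article doesn't pin down which one we use):

```python
def devig_proportional(odds: dict[str, float]) -> dict[str, float]:
    """Remove the margin by scaling implied probabilities to sum to 1."""
    implied = {k: 1 / v for k, v in odds.items()}
    total = sum(implied.values())
    return {k: p / total for k, p in implied.items()}


fair = devig_proportional({"home": 2.50, "draw": 3.40, "away": 3.10})
print(fair["home"])  # ~0.393 -- slightly below the raw 0.40, as expected
```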
Stage 4: Timestamp Alignment
We store snapshots at consistent intervals: opening, mid-day, and closing. This lets us track how probabilities evolve over time.
Without proper timestamps, you can't build movement features. And movement features are some of the most predictive signals we have.
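One way the alignment could look, as a sketch with made-up anchor times and values:

```python
from datetime import datetime, timedelta


def nearest_snapshot(snapshots: list[tuple[datetime, float]],
                     target: datetime) -> float:
    """Return the stored probability closest in time to a target timestamp."""
    return min(snapshots, key=lambda s: abs(s[0] - target))[1]


kickoff = datetime(2024, 8, 17, 15, 0)
snapshots = [  # one source's irregular updates (hypothetical values)
    (kickoff - timedelta(days=3), 0.52),
    (kickoff - timedelta(hours=5), 0.53),
    (kickoff - timedelta(minutes=10), 0.54),
]
anchors = {
    "opening": kickoff - timedelta(days=3),
    "midday": kickoff - timedelta(hours=6),
    "closing": kickoff,
}
aligned = {name: nearest_snapshot(snapshots, t) for name, t in anchors.items()}
# {'opening': 0.52, 'midday': 0.53, 'closing': 0.54}
```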
Stage 5: Movement Features
Now the interesting part. We calculate:
- Delta: How much probability changed from open to now
- Velocity: Rate of change per hour
- Volatility: How choppy the path was
- Late intensity: How much of the movement happened in the final hours
Each of these becomes a column in our feature table.
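Here's a rough sketch of how all four could be computed from a series of snapshots. The exact definitions (mean absolute step for volatility, a three-hour "late" window) are illustrative assumptions, not our production formulas:

```python
def movement_features(probs: list[float], hours_to_kickoff: list[float],
                      late_window: float = 3.0) -> dict[str, float]:
    """probs[i] is the fair probability recorded hours_to_kickoff[i]
    hours before the match (descending, ending at 0.0 for the close)."""
    delta = probs[-1] - probs[0]                      # open -> close change
    elapsed = hours_to_kickoff[0] - hours_to_kickoff[-1]
    velocity = delta / elapsed if elapsed else 0.0    # change per hour
    steps = [abs(b - a) for a, b in zip(probs, probs[1:])]
    volatility = sum(steps) / len(steps) if steps else 0.0
    total_move = sum(steps)
    late_move = sum(step for step, h in zip(steps, hours_to_kickoff[1:])
                    if h <= late_window)              # movement near kickoff
    late_intensity = late_move / total_move if total_move else 0.0
    return {"delta": delta, "velocity": velocity,
            "volatility": volatility, "late_intensity": late_intensity}


movement_features([0.52, 0.53, 0.52, 0.54], [72.0, 6.0, 3.0, 0.0])
# {'delta': 0.02, 'velocity': ~0.0003, 'volatility': ~0.013,
#  'late_intensity': 0.75}
```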
Stage 6: Consensus Metrics
We aggregate across multiple data sources:
- Median probability: Central tendency across providers
- Dispersion: How spread out the opinions are
- Outlier flags: Is one source wildly different?
High dispersion often means uncertainty. Low dispersion means agreement. Both are informative.
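A sketch of the aggregation, using standard deviation for dispersion and a z-score outlier rule; both are illustrative choices:

```python
from statistics import median, pstdev


def consensus_features(provider_probs: list[float],
                       z_cutoff: float = 2.0) -> dict:
    """Aggregate one outcome's fair probability across providers."""
    center = median(provider_probs)
    spread = pstdev(provider_probs)
    outlier = any(abs(p - center) > z_cutoff * spread
                  for p in provider_probs) if spread else False
    return {"median": center, "dispersion": spread, "outlier": outlier}


consensus_features([0.51, 0.52, 0.53, 0.58])
# {'median': 0.525, 'dispersion': ~0.027, 'outlier': True} -- 0.58 stands out
```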
Stage 7: Cross-Market Validation
Different market types (1X2, Asian Handicap, Over/Under) should tell consistent stories. If 1X2 says the home team is favored, but the handicap suggests otherwise, something's off.
We flag these inconsistencies. Sometimes they're arbitrage opportunities being corrected. Sometimes they're data errors. Either way, the model should know.
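A coarse version of the check, as a sketch (the sign convention is standard for Asian handicaps, but the rule itself is illustrative):

```python
def cross_market_consistent(home_fair_prob: float, away_fair_prob: float,
                            home_handicap_line: float) -> bool:
    """A negative handicap line for the home side means home gives a
    head start, i.e. home is the favorite. If the 1X2 probabilities
    and the line's sign disagree, flag the match."""
    if home_fair_prob > away_fair_prob and home_handicap_line > 0:
        return False  # 1X2 favors home, handicap calls home the underdog
    if away_fair_prob > home_fair_prob and home_handicap_line < 0:
        return False  # 1X2 favors away, handicap calls home the favorite
    return True


cross_market_consistent(0.52, 0.22, -0.5)  # True -- both markets favor home
```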
Stage 8: Evaluation Metrics
Finally, we add signals that help evaluate our own predictions:
- Brier score components
- Calibration buckets
- Baseline comparison metrics
This closes the loop. We're not just predicting—we're measuring how well our predictions performed.
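For the curious, the core of these metrics fits in a few lines. The decile bucket scheme below is one common choice, not necessarily the one we ship:

```python
def brier_score(predicted: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1
    outcomes; lower is better, and constant 50/50 guessing scores 0.25."""
    return sum((p - o) ** 2
               for p, o in zip(predicted, outcomes)) / len(predicted)


def calibration_buckets(predicted: list[float], outcomes: list[int],
                        n_buckets: int = 10) -> dict[int, float]:
    """Observed win rate per predicted-probability decile. Well-calibrated
    predictions land close to their own bucket's range."""
    hits: dict[int, list[int]] = {}
    for p, o in zip(predicted, outcomes):
        bucket = min(int(p * n_buckets), n_buckets - 1)
        hits.setdefault(bucket, []).append(o)
    return {b: sum(v) / len(v) for b, v in sorted(hits.items())}
```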
Why Not Just Use Raw Data?
I get asked this a lot. Here's the simple answer: raw data is noisy and inconsistent.
Different sources report at different times. Margins vary by provider. Formats differ by region. If you feed all that directly into a model, you're training on chaos.
Feature engineering is about creating a common language. Every match gets described the same way, regardless of where the data came from. That consistency is what lets the model learn patterns.
A Practical Example
Let's say we're looking at a Premier League match. Here's what the raw data might look like from one source:
- Home win: 1.85 (opening), 1.80 (closing)
- Draw: 3.60
- Away win: 4.50
And here's what our pipeline produces:
| Feature | Value |
|---|---|
| home_fair_prob | 0.52 |
| draw_fair_prob | 0.26 |
| away_fair_prob | 0.22 |
| home_delta | +0.02 |
| home_velocity | 0.003/hr |
| volatility | 0.008 |
| late_intensity | 0.65 |
| dispersion | 0.015 |
| cross_market_align | 0.94 |
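As a sanity check, here's the de-vig arithmetic on this one source's closing prices. It lands close to (not exactly on) the table's values, since the pipeline blends several providers:

```python
# Reproducing the fair probabilities from one source's closing prices
closing = {"home": 1.80, "draw": 3.60, "away": 4.50}
implied = {k: 1 / v for k, v in closing.items()}  # 0.556, 0.278, 0.222
overround = sum(implied.values())                 # ~1.056 -> ~5.6% margin
fair = {k: p / overround for k, p in implied.items()}
# ~0.53 / 0.26 / 0.21 -- in line with the (rounded, multi-source) table
```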
Key Takeaways
1. Raw data is messy; features are structured
2. Probability conversion and de-vigging create a fair baseline
3. Movement and consensus add temporal and cross-source context
4. Cross-market checks catch inconsistencies
5. Good features make models smarter
📖 Related reading: Opening vs Closing • Market Consensus • Movement Analysis
*OddsFlow provides AI-powered sports analysis for educational and informational purposes.*

