MatchOracle

Deep Ensemble EPL Prediction Engine

13 base learners · Dixon-Coles statistical model · 376+ features · 8 data sources · NLP sentiment analysis

An EPL match prediction system built on real data from 8 sources spanning 20 seasons (~7,600 matches). It uses a 5-layer deep stacking ensemble evaluated with walk-forward backtesting; across the 5 backtested seasons it reached 60.2% 3-way accuracy against 55.6% for market odds.

Metric	Result	vs Market / Detail
3-Way Accuracy	60.2%	+4.6 pp (market: 55.6%)
RPS Score	0.1163	+11.2% skill
ELITE Tier (>70% confidence)	82.2% accuracy	90 matches
Best Season	66.8% accuracy	2023-24
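
The RPS figures above can be reproduced in spirit with the standard Ranked Probability Score for ordered outcomes. The sketch below assumes the common definition, normalised by the number of outcomes minus one, and a simple ratio-based skill score. The README does not state the exact normalisation or the reference forecast behind "+11.2% skill", so treat both function names and the skill formula as illustrative assumptions.

```python
from typing import Sequence


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """RPS for one match.

    Outcomes are assumed to be ordered: home win (0), draw (1), away win (2).
    Lower is better; a perfect forecast scores 0.
    """
    n = len(probs)
    cum_pred = 0.0
    cum_obs = 0.0
    total = 0.0
    # The final cumulative term is always 1 - 1 = 0, so it is skipped.
    for i in range(n - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (n - 1)


def rps_skill(model_rps: float, reference_rps: float) -> float:
    """Relative improvement over a reference forecast (assumed formula)."""
    return 1.0 - model_rps / reference_rps
```

For example, `ranked_probability_score([0.5, 0.3, 0.2], 0)` scores a 50% home-win forecast against an actual home win.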

Ensemble Architecture · Features · Data Sources · Results · Getting Started

Highlights

5-Layer Ensemble Architecture

Layer 0: Dixon-Coles statistical model
Layer 1: 13 base learners (HGB, XGBoost, LightGBM, CatBoost, Random Forest, MLP variants, among others)
Layer 2: 4 meta-learners with isotonic calibration
Layer 2.5: Binary classifier boosting
Layer 3: Best ensemble selection (stacking vs weighted average vs binary 3-way vs market-fused)
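
A minimal sketch of what a "best ensemble selection" step could look like: score each candidate's validation probabilities and keep the one with the lowest loss. The README does not say which metric Layer 3 uses, so mean log loss is an assumption here, and `mean_log_loss` and `select_best_ensemble` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def mean_log_loss(probs: List[Sequence[float]], outcomes: List[int]) -> float:
    """Average negative log-probability assigned to the observed outcome."""
    eps = 1e-12  # guards against log(0)
    losses = [-math.log(max(p[y], eps)) for p, y in zip(probs, outcomes)]
    return sum(losses) / len(losses)


def select_best_ensemble(
    candidates: Dict[str, List[Sequence[float]]],
    outcomes: List[int],
) -> str:
    """Return the name of the candidate with the lowest validation loss."""
    scores = {name: mean_log_loss(p, outcomes) for name, p in candidates.items()}
    return min(scores, key=scores.get)
```

Selecting on a held-out validation fold, rather than on the training data, keeps the choice from favouring whichever candidate overfits most.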

376+ Engineered Features

Elo + Glicko-2 + Pi-Ratings
Rolling form (6 windows), H2H, momentum
Market intelligence (Shin probabilities, odds movement)
Poisson goal decomposition, xG-based metrics
Manager tenure, GK quality, rest days, scoring patterns
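
To illustrate the rating features, here is a sketch of the standard Elo expected-score and update rule. The README lists Club Elo as the source of historical ratings, so the project may ingest ratings rather than compute them; the `k` factor and `home_advantage` values are placeholders, not values taken from this repository.

```python
from typing import Tuple


def elo_expected(home_rating: float, away_rating: float,
                 home_advantage: float = 0.0) -> float:
    """Expected score for the home side (1 = win, 0.5 = draw, 0 = loss)."""
    diff = away_rating - (home_rating + home_advantage)
    return 1.0 / (1.0 + 10 ** (diff / 400.0))


def elo_update(home_rating: float, away_rating: float, home_score: float,
               k: float = 20.0, home_advantage: float = 0.0) -> Tuple[float, float]:
    """Return updated (home, away) ratings after one match.

    home_score is 1.0 for a home win, 0.5 for a draw, 0.0 for an away win.
    """
    expected = elo_expected(home_rating, away_rating, home_advantage)
    delta = k * (home_score - expected)
    return home_rating + delta, away_rating - delta
```

A differential feature such as "home Elo minus away Elo" then follows directly from the two ratings.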

NLP Sentiment Analysis

RoBERTa transformer (70%) + keyword fallback (30%)
Dual news sources: NewsAPI + Google News RSS
Aspect-based: injury risk, morale, tactical disruption
30 live sentiment features injected at prediction time
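
One way to read the 70% / 30% split is as a weighted blend of a transformer score and a keyword score, falling back to keywords alone when the transformer is unavailable. That reading is an assumption, and the keyword lists, function names, and the [-1, 1] score range below are all hypothetical. The transformer score is passed in as a plain number so the sketch runs without any model download.

```python
from typing import Optional

# Hypothetical keyword lists, for illustration only.
NEGATIVE_KEYWORDS = ("injury", "injured", "suspended", "doubt", "ruled out")
POSITIVE_KEYWORDS = ("returns", "fit", "boost", "available")


def keyword_score(headline: str) -> float:
    """Crude sentiment in [-1, 1] from substring matches against keyword lists."""
    text = headline.lower()
    pos = sum(word in text for word in POSITIVE_KEYWORDS)
    neg = sum(word in text for word in NEGATIVE_KEYWORDS)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def blended_sentiment(headline: str, transformer_score: Optional[float],
                      transformer_weight: float = 0.7) -> float:
    """Blend a transformer score with the keyword score (assumed 70/30 split)."""
    kw = keyword_score(headline)
    if transformer_score is None:  # model unavailable: keywords only
        return kw
    return transformer_weight * transformer_score + (1.0 - transformer_weight) * kw
```

Per-team features could then be aggregates (mean, minimum, count) of these scores over recent headlines.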

Smart Model Caching

5 automated retraining checks (data hash, age, integrity)
<2 second predictions after initial training
Interactive CLI with arrow-key fixture selector
Auto-generated HTML dashboard with Plotly/Chart.js
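
The retraining checks can be pictured as a function that compares the cached model's metadata with the current data. The sketch below covers only two of the checks named above (data hash and model age); the function names, the `max_age_days` default, and the metadata layout are assumptions, not the contents of `model_cache.py`.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_retraining(data_path: Path, cached_hash: Optional[str],
                     trained_at: Optional[float], max_age_days: float = 7.0,
                     now: Optional[float] = None) -> bool:
    """Return True if the cached model should be rebuilt."""
    if cached_hash is None or trained_at is None:
        return True  # no usable cache metadata
    now = time.time() if now is None else now
    if now - trained_at > max_age_days * 86400:
        return True  # cached model is too old
    return file_sha256(data_path) != cached_hash  # training data changed
```

When every check passes, the cached model is loaded instead of retrained, which is what makes fast repeat predictions possible.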

Ensemble Architecture

Layer 0: Dixon-Coles (goals + xG variants)
    │
Layer 1: 13 Base Learners
    │   HGB, HGB-Agg, HGB-Deep, XGBoost, LightGBM, CatBoost,
    │   Random Forest, Extra Trees, DeepMLP, MLP-Wide,
    │   Logistic Regression, Bagging-HGB, Vote-HGB3
    │
Layer 2: 4 Meta-Learners (Meta-LR, Meta-MLP, Meta-HGB, Meta-XGB)
    │
Layer 2.5: Binary Classifier Boosting (4 dedicated HGB models)
    │
Layer 3: Best Ensemble Selection
        Stacking vs Weighted Avg vs Binary-3Way vs Market-Fused
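
For Layer 0, the core of a Dixon-Coles model is an independent-Poisson scoreline distribution with a correction factor for low scores. The sketch below takes the home and away goal rates (`lam`, `mu`) and the dependence parameter `rho` as given inputs; in a fitted model they come from estimated attack, defence, and home-advantage parameters, which are not shown. The function names and the `max_goals` truncation are assumptions.

```python
import math
from typing import Tuple


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles low-score correction for home goals x and away goals y."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def outcome_probs(lam: float, mu: float, rho: float,
                  max_goals: int = 10) -> Tuple[float, float, float]:
    """Return (home win, draw, away win) probabilities."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise after truncating at max_goals
    return home / total, draw / total, away / total
```

With a negative `rho`, the 0-0 and 1-1 scorelines gain probability, so draws become more likely than under independent Poisson goals.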

Data Sources

#	Source	Coverage	Key Data
1	football-data.co.uk	20 seasons	Results, shots, corners, bookmaker odds
2	Understat	11 seasons (2014+)	Match-level xG, xGA
3	Club Elo	20 seasons	Historical Elo ratings
4	Open-Meteo	20 seasons	Weather at stadium GPS
5	Football-Data.org	Live season	Standings, fixtures, H2H
6	API-Football	Live season	Injuries, player ratings
7	NewsAPI	Live	Team news for NLP sentiment
8	Google News RSS	Live	Fallback news source

Walk-Forward Results

5-season walk-forward backtesting (no future data leakage):

Season	Accuracy	RPS	vs Market
2020-21	55.0%	0.1314	—
2021-22	61.8%	0.1080	+17.6%
2022-23	60.0%	0.1189	+9.2%
2023-24	66.8%	0.1040	+20.6%
2024-25	57.1%	0.1194	+8.9%
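
Walk-forward backtesting means each test season is predicted by a model trained only on earlier seasons. The sketch below assumes an expanding training window; the README does not say whether the window expands or rolls, and `walk_forward_splits` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(seasons: Sequence[str],
                        n_test: int) -> Iterator[Tuple[List[str], str]]:
    """Yield (training seasons, test season) pairs in chronological order.

    Season labels such as "2020-21" sort chronologically as strings.
    """
    ordered = sorted(seasons)
    if not 0 < n_test < len(ordered):
        raise ValueError("n_test must leave at least one training season")
    for i in range(len(ordered) - n_test, len(ordered)):
        # Everything strictly before the test season is available for training.
        yield ordered[:i], ordered[i]
```

Any feature that uses match outcomes (form, ratings, H2H) also has to be computed from matches before the fixture date, otherwise leakage can occur inside a season even when the season split is clean.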

Confidence Tier Breakdown

Tier	Confidence	Accuracy	Matches
ELITE	>70%	82.2%	90
VERY HIGH	60-70%	66.7%	96
HIGH	50-60%	62.7%	83
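
The tiers can be assigned from a prediction's top class probability. This sketch assumes "confidence" means the highest of the three outcome probabilities and that boundary values fall into the higher tier where the table is ambiguous; predictions under 50% get no tier because the table does not list one.

```python
from typing import Optional, Sequence


def confidence_tier(probs: Sequence[float]) -> Optional[str]:
    """Map (home, draw, away) probabilities to a confidence tier label."""
    top = max(probs)
    if top > 0.70:
        return "ELITE"
    if top >= 0.60:
        return "VERY HIGH"
    if top >= 0.50:
        return "HIGH"
    return None  # below the lowest tier shown in the table
```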

Feature Groups (376+ features; 15 of 24 groups listed)

Group	Features	Description
Elo Ratings	6	Home/away Elo + differential
Pi-Ratings	8	Home/away attack/defense ratings
Rolling Form	40+	6 rolling windows (3-20 matches)
Head-to-Head	20+	Historical H2H results and trends
Market Intelligence	15+	Implied probabilities, odds movement, Shin
Momentum	25+	Streaks, acceleration, velocity
Glicko-2	10	Rating + uncertainty + volatility
Poisson	15+	Attack/defense decomposition
Contextual	10	Derby flags, distance, title contender
Injuries	12	Per-team injury impact
Manager	6	Tenure, new manager bounce
GK Quality	6	Clean sheets, consistency
Sequence Patterns	12	Encoded result sequences
Scoring Patterns	8	Early goals, comebacks
Rest Days	6	Fatigue/freshness flags

Getting Started

git clone https://github.com/abailey81/MatchOracle.git
cd MatchOracle

# Setup
chmod +x setup.sh && ./setup.sh
# Or manually:
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Configure API keys
cp .env.example .env  # Edit with your keys

# Run predictions
python predict.py --fdo-key YOUR_KEY --apif-key YOUR_KEY --news-key YOUR_KEY

Project Structure

MatchOracle/
├── predict.py                  # Interactive CLI entry point
├── dashboard.py                # HTML dashboard generator
├── data/
│   ├── generator.py            # Real data pipeline (8 sources, 20 seasons)
│   └── api_client.py           # Rate limiter, circuit breaker, caching
├── features/
│   ├── engine.py               # 376+ features across 24 groups
│   └── sentiment.py            # RoBERTa NLP sentiment analysis
├── models/
│   ├── run_pipeline.py         # 5-layer ensemble pipeline
│   ├── dixon_coles.py          # Dixon-Coles statistical model (1997)
│   └── model_cache.py          # Smart caching with retraining detection
├── requirements.txt
├── setup.sh
└── .env.example

MIT License

Built with scikit-learn, XGBoost, LightGBM, CatBoost, and RoBERTa
