§ V · Long-form
10 min read · last revised 2026-04-22 · snapshot 2026-06-15T03:47Z
Simulation
Bivariate Poisson, Dixon-Coles low-score correction, and the 10,000-run Monte Carlo engine; how a match model becomes a tournament distribution.
From strength matrix to goal rates
The simulation engine is strength-agnostic by design. It does not know which model (M0, M1, M2, or M3) produced the inputs it receives. It receives a pair of goal rates $(\lambda_{\text{home}}, \lambda_{\text{away}})$ per match and converts that pair into a sampled scoreline. Every claim the engine makes about the World Cup, and every probability the public site displays, descends from how those two numbers get used.
For each match, the engine maps the Elo rating difference between the two teams to expected goals $(\lambda_{\text{home}}, \lambda_{\text{away}})$ via a calibration
whose two constants were fit during Phase 3 against
2010 to 2022 international match results and are stored in
evaluation/pre_reg_constants.yaml. The home advantage term
is set to zero throughout the 2026 simulation:
the FIFA 2026 final is played in a single host complex and no genuine
venue-level home advantage applies for most of the tournament. The full
calibration write-up lives in Methodology.
The asymmetry between $\lambda_{\text{home}}$ and $\lambda_{\text{away}}$ is the only place team strength enters the engine. Everything that follows treats $(\lambda_{\text{home}}, \lambda_{\text{away}})$ as a sufficient statistic for the match.
The Bivariate Poisson match model
Goals scored by the two teams in a match show a small positive correlation, partly explained by shared match conditions (weather, refereeing tempo, pitch state) and partly by the way teams react to the score state of the match itself. To capture that correlation while keeping the model tractable, we use the Bivariate Poisson with a common-shock decomposition (Karlis and Ntzoufras, 2003):
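$$X = W_1 + W_3, \qquad Y = W_2 + W_3,$$
where $W_1 \sim \mathrm{Poisson}(\lambda_1)$, $W_2 \sim \mathrm{Poisson}(\lambda_2)$, and $W_3 \sim \mathrm{Poisson}(\lambda_3)$ are independent. $X$ is the home goal count, $Y$ is the away goal count, and $W_3$ is the shared component that induces the positive correlation. Setting $\lambda_3 = 0$ recovers the independent Poisson case (Maher, 1982). Setting $\lambda_3 > 0$ shifts probability mass into states where both teams score in the same match.
The joint probability mass function is
$$P(X = x, Y = y) = e^{-(\lambda_1 + \lambda_2 + \lambda_3)} \, \frac{\lambda_1^{x}}{x!} \, \frac{\lambda_2^{y}}{y!} \sum_{k=0}^{\min(x, y)} \binom{x}{k} \binom{y}{k} \, k! \left( \frac{\lambda_3}{\lambda_1 \lambda_2} \right)^{k}.$$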
The shared-shock parameter $\lambda_3$ is locked at a single pre-registered value throughout the 2026 tournament. That value sits inside the recommended range from Karlis and Ntzoufras's empirical work on European football and is consistent with the small positive cross-team correlation observed in the project's own 347-match corpus. We did not refit $\lambda_3$ during Phase 5: the data is too sparse to identify the parameter precisely against an already-narrow Poisson likelihood, and any post-pre-registration refit would require an OSF amendment.
The PMF is computed once per quantized $(\lambda_{\text{home}}, \lambda_{\text{away}})$ pair (rounded to four decimal places) and cached, so re-evaluations across thousands of simulations are essentially free. The grid is truncated at ten goals per side; the probability mass beyond that cutoff is negligible for any realistic $\lambda$ value the engine encounters. The inner $k$-sum is computed in log-sum-exp form for numerical stability.
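The grid computation described above can be sketched as follows. This is a minimal illustration, not the engine's code: the function names, the rate values, and the choice to treat the two team rates directly as the component rates of the common-shock decomposition (so each marginal mean is the team rate plus the shared rate) are all assumptions made for the example. All three rates must be strictly positive because the sum is evaluated in log space.

```python
import math
from functools import lru_cache

import numpy as np

MAX_GOALS = 10  # truncation from the text: 0..10 goals per side


@lru_cache(maxsize=None)
def _pmf_cached(lam1, lam2, lam3):
    """Common-shock Bivariate Poisson PMF on an 11 x 11 grid (rows = home goals)."""
    n = MAX_GOALS + 1
    pmf = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            # log of each term: Poisson(x-k; lam1) * Poisson(y-k; lam2) * Poisson(k; lam3),
            # with the common factor exp(-(lam1+lam2+lam3)) applied afterwards.
            terms = [
                (x - k) * math.log(lam1) - math.lgamma(x - k + 1)
                + (y - k) * math.log(lam2) - math.lgamma(y - k + 1)
                + k * math.log(lam3) - math.lgamma(k + 1)
                for k in range(min(x, y) + 1)
            ]
            peak = max(terms)  # log-sum-exp: factor out the largest term
            log_sum = peak + math.log(sum(math.exp(t - peak) for t in terms))
            pmf[x, y] = math.exp(log_sum - (lam1 + lam2 + lam3))
    pmf.setflags(write=False)  # cached arrays are shared, so keep them read-only
    return pmf


def bivariate_poisson_pmf(lam_home, lam_away, lam3):
    """Quantize the rate pair to four decimals so nearby inputs share a cache entry."""
    return _pmf_cached(round(lam_home, 4), round(lam_away, 4), lam3)


# Illustrative rates only; these are not the project's calibrated values.
pmf = bivariate_poisson_pmf(1.4, 1.1, 0.1)
```

Summing each term as a product of three Poisson probabilities is algebraically the same as the binomial-coefficient form of the PMF, and it keeps every factor in log space until the final exponentiation.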
The Dixon-Coles low-score correction
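The Bivariate Poisson built from independent Poisson components mis-predicts the empirical frequency of the 0-0, 1-0, 0-1, and 1-1 scorelines in international football. Dixon and Coles (1997) addressed this with a multiplicative correction that adjusts the joint PMF only at those four low-score cells:
$$\tau(x, y) = \begin{cases} 1 - \lambda_{\text{home}} \lambda_{\text{away}} \rho & (x, y) = (0, 0) \\ 1 + \lambda_{\text{home}} \rho & (x, y) = (0, 1) \\ 1 + \lambda_{\text{away}} \rho & (x, y) = (1, 0) \\ 1 - \rho & (x, y) = (1, 1) \\ 1 & \text{otherwise} \end{cases}$$
The corrected joint PMF is
$$P_{\mathrm{DC}}(X = x, Y = y) = \tau(x, y) \, P(X = x, Y = y).$$
The correction parameter $\rho$ is locked at a small negative pre-registered value. With $\rho < 0$, $\tau$ exceeds 1 at 0-0 and 1-1 and falls below 1 at 1-0 and 0-1, so the correction moves probability mass into the two low-scoring draws and out of the two one-goal results. The magnitude is the small adjustment standard in Dixon-Coles applications across European leagues.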
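A minimal sketch of the standard Dixon-Coles factor applied to a score grid is shown below. To stay self-contained it starts from an independent Poisson grid rather than the Bivariate Poisson, and the function names, the rates, and the value of rho are illustrative assumptions, not the project's locked constants.

```python
from math import exp, factorial

import numpy as np


def poisson_grid(lam_home, lam_away, max_goals=10):
    """Independent Poisson score grid; rows = home goals, columns = away goals."""
    ks = range(max_goals + 1)
    p_home = np.array([exp(-lam_home) * lam_home**k / factorial(k) for k in ks])
    p_away = np.array([exp(-lam_away) * lam_away**k / factorial(k) for k in ks])
    return np.outer(p_home, p_away)


def dixon_coles_correct(pmf, lam_home, lam_away, rho):
    """Multiply the four low-score cells by the Dixon-Coles factor, then renormalize."""
    tau = np.ones_like(pmf)
    tau[0, 0] = 1.0 - lam_home * lam_away * rho
    tau[0, 1] = 1.0 + lam_home * rho
    tau[1, 0] = 1.0 + lam_away * rho
    tau[1, 1] = 1.0 - rho
    if (tau[:2, :2] <= 0).any():
        raise ValueError("rho is outside the range that keeps all cells positive")
    corrected = tau * pmf
    return corrected / corrected.sum()


# Illustrative inputs only (rho = -0.1 is a placeholder, not the locked value).
base = poisson_grid(1.4, 1.1)
corrected = dixon_coles_correct(base, 1.4, 1.1, rho=-0.1)
```

With a negative rho, the 0-0 and 1-1 cells grow while 1-0 and 0-1 shrink; every other cell changes only through the renormalization step.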
We did not jointly refit $\lambda_3$ and $\rho$ on the 2010 to 2022 corpus during Phase 5 because the joint Poisson NLL surface is poorly identified at this sample size; a deferred Phase 6 calibration would optimize them jointly on a richer corpus, under an OSF amendment.
The engine renormalizes the PMF after applying the correction so the cells sum to 1 within a fixed numerical tolerance. Sampling uses Walker's alias method on the flattened 121-cell PMF vector for $O(1)$ draws per match. Match outcome probabilities are read off the corrected PMF directly, with home goals indexing rows: $P(\text{home win})$ is the sum below the diagonal, $P(\text{draw})$ is the diagonal trace, and $P(\text{away win})$ is the sum above the diagonal.
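The sampling and read-off steps can be illustrated with a small sketch. The alias tables here are built with the common small/large worklist construction, and a 3 x 3 toy grid stands in for the 121-cell PMF; the function names and the toy numbers are assumptions made for the example.

```python
import numpy as np


def build_alias(probs):
    """Build alias tables so each later draw costs O(1)."""
    probs = np.asarray(probs, dtype=float)
    n = len(probs)
    scaled = probs * n / probs.sum()  # new array; the input is not modified
    prob = np.zeros(n)
    alias = np.zeros(n, dtype=np.int64)
    small = [i for i in range(n) if scaled[i] < 1.0]
    large = [i for i in range(n) if scaled[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] = scaled[l] + scaled[s] - 1.0  # l donates mass to fill column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers are full columns up to rounding error
        prob[i], alias[i] = 1.0, i
    return prob, alias


def alias_draw(prob, alias, rng):
    """One draw: pick a column uniformly, then keep it or jump to its alias."""
    i = int(rng.integers(len(prob)))
    return i if rng.random() < prob[i] else int(alias[i])


def outcome_probs(pmf):
    """Rows index home goals, so the lower triangle holds home wins."""
    return (np.tril(pmf, k=-1).sum(), np.trace(pmf), np.triu(pmf, k=1).sum())


# Toy 3 x 3 grid standing in for the corrected PMF.
toy = np.array([[0.10, 0.05, 0.05],
                [0.20, 0.15, 0.05],
                [0.25, 0.10, 0.05]])
prob, alias = build_alias(toy.ravel())
rng = np.random.default_rng(0)
home_goals, away_goals = divmod(alias_draw(prob, alias, rng), toy.shape[1])
p_home, p_draw, p_away = outcome_probs(toy)
```

Because the flattened vector is row-major, `divmod` by the grid width recovers the scoreline from the sampled index.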
Extra time and penalty shootouts
These mechanisms are invoked only in knockout matches when regulation ends level. Group-stage matches that end in a draw simply terminate as draws; nothing further is sampled.
The 30-minute extra period
Extra time is sampled by re-running the same Bivariate Poisson + Dixon-Coles match model with damped goal rates. Each team's 90-minute $\lambda$ is multiplied by a fixed damping factor, and the result is used as the Poisson mean for the 30-minute extra-time period.
The shared-shock $\lambda_3$ is also scaled down
for coherence, and the Dixon-Coles correction $\rho$
remains active. The factor was chosen
empirically from a blend of historical World Cup extra-time data and
the Dixon-Robinson (1998) calibration, and it is locked in
pre_reg_constants.yaml. If extra time ends level, the match advances
to a penalty shootout.
The shootout model
The shootout is a sequence of independent Bernoulli kicks with a small Elo-derived skew. The conversion probability for team A kicking against team B is a base rate, shifted by a term in the Elo difference between A and B and then clipped to a fixed interval.
The base rate is the historical World Cup penalty conversion rate (1982 to 2022), the Elo skew is deliberately small, and the clipping bounds prevent extreme Elo gaps from producing non-physical conversion rates. The corresponding probability for team B uses the negated Elo difference.
The protocol matches FIFA-compliant shootout rules:
- Five kicks per side, strictly alternating, with team A kicking first in each round.
- After every kick in rounds 1 through 5, a short-circuit check runs: if the running lead is mathematically insurmountable given the kicks remaining, the shootout ends immediately.
- If still tied after five kicks each, the shootout enters sudden death: one kick per side per round, continuing until one team is ahead after equal kicks.
The engine does not model ABBA shooter ordering, shooter-specific skill, or goalkeeper-specific save rates. Shootouts in international football are well-documented as near-random; the small skew is a deliberate choice not to claim more predictive power than the data supports.
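The protocol above can be sketched as a short simulation. Everything numeric here is hypothetical: the base rate, the linear form and size of the Elo skew, and the clipping bounds are placeholders chosen for illustration, and the function names are not taken from the engine.

```python
import numpy as np


def conversion_prob(elo_diff, base=0.75, skew_per_point=1e-4, lo=0.60, hi=0.90):
    """Hypothetical linear skew: base rate plus a small Elo term, then clipped."""
    return float(np.clip(base + skew_per_point * elo_diff, lo, hi))


def simulate_shootout(p_a, p_b, rng, kicks_per_side=5):
    """Return (goals_a, goals_b); team A kicks first in every round."""
    probs = {"A": p_a, "B": p_b}
    goals = {"A": 0, "B": 0}
    taken = {"A": 0, "B": 0}
    for _ in range(kicks_per_side):
        for team, other in (("A", "B"), ("B", "A")):
            taken[team] += 1
            goals[team] += int(rng.random() < probs[team])
            left_team = kicks_per_side - taken[team]
            left_other = kicks_per_side - taken[other]
            # Short-circuit: stop as soon as the trailing side cannot catch up.
            if (goals[team] > goals[other] + left_other
                    or goals[other] > goals[team] + left_team):
                return goals["A"], goals["B"]
    # Sudden death: one kick per side per round until the scores differ.
    # Terminates with probability 1 as long as the two sides can differ.
    while goals["A"] == goals["B"]:
        for team in ("A", "B"):
            goals[team] += int(rng.random() < probs[team])
    return goals["A"], goals["B"]


rng = np.random.default_rng(2026)
p_a = conversion_prob(+150)  # A rated 150 Elo points above B (illustrative)
p_b = conversion_prob(-150)  # B's rate uses the negated difference
result = simulate_shootout(p_a, p_b, rng)
```

The short-circuit check runs after every kick, so a side that converts its first three while the opponent misses three ends the shootout at 3-0.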
The Monte Carlo bracket walker
A single tournament run progresses through three stages. Each run uses
a fresh seeded RNG (numpy.random.default_rng(seed)), and the same
(model_id, data_hash, seed) triple reproduces an identical run
byte-for-byte.
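As a small illustration of the reproducibility claim, the sketch below draws a toy set of scorelines from a seeded generator and compares the raw bytes of two runs. The function name, the seed values, and the Poisson rate are placeholders; the real run samples from the full match model.

```python
import numpy as np


def toy_run(seed, n_matches=104):
    """Stand-in for one tournament run: a fresh generator per run, nothing global."""
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=1.3, size=(n_matches, 2))


first = toy_run(20260611)
second = toy_run(20260611)   # same seed, so the same stream of draws
other = toy_run(20260612)    # different seed, different stream

same_bytes = first.tobytes() == second.tobytes()
```

Creating the generator inside the run, rather than sharing a module-level one, is what keeps runs independent of execution order under parallelism.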
The group stage
Forty-eight teams are split into twelve groups of four (Groups A through L), each playing six round-robin matches for 72 group-stage matches in total. Each match is sampled from the Bivariate Poisson + Dixon-Coles model using the strength provider's $(\lambda_{\text{home}}, \lambda_{\text{away}})$ for that fixture in context="group".
Team rankings within a group are determined by the FIFA tiebreaker order, applied recursively when three or more teams remain tied:
1. Points (Win = 3, Draw = 1, Loss = 0).
2. Goal difference across all group matches.
3. Goals scored across all group matches.
4. Head-to-head points among the still-tied subset.
5. Head-to-head goal difference among the still-tied subset.
6. Fair Play points (yellow- and red-card tally). The Phase 5 engine treats this as identical for all teams; live tracking is wired in Phase 7.
7. Drawing of lots, deterministic from the run seed and logged in the output.
When three or more teams are tied on points, criteria 4 and 5 re-scope to the still-tied subset. If the head-to-head pass breaks one team free, the remaining tied teams re-run criteria 4 and 5 among themselves. The implementation is recursive, not single-pass.
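The recursive re-scoping can be sketched as follows. This is an illustrative reading of the rule, not the engine's implementation: the function names and the match-tuple layout are assumptions, Fair Play is skipped because it is treated as equal for all teams, and the lot is drawn with a permutation from the seeded generator.

```python
import numpy as np


def table(teams, matches):
    """[points, goal difference, goals scored], counting only matches among `teams`."""
    rows = {t: [0, 0, 0] for t in teams}
    for home, away, hg, ag in matches:
        if home in rows and away in rows:
            rows[home][0] += 3 if hg > ag else 1 if hg == ag else 0
            rows[away][0] += 3 if ag > hg else 1 if hg == ag else 0
            rows[home][1] += hg - ag
            rows[away][1] += ag - hg
            rows[home][2] += hg
            rows[away][2] += ag
    return rows


def split_by(teams, key):
    """Group teams into tiers, best key first, preserving input order inside a tier."""
    tiers = {}
    for t in teams:
        tiers.setdefault(key(t), []).append(t)
    return [tiers[k] for k in sorted(tiers, reverse=True)]


def break_head_to_head(tied, matches, rng):
    """Head-to-head points, then head-to-head goal difference, re-scoped on every split."""
    if len(tied) == 1:
        return list(tied)
    h2h = table(tied, matches)  # mini-table among the still-tied subset only
    for idx in (0, 1):
        tiers = split_by(tied, lambda t: h2h[t][idx])
        if len(tiers) > 1:
            # Any split re-runs both head-to-head criteria inside each smaller tier.
            return [t for tier in tiers for t in break_head_to_head(tier, matches, rng)]
    # Fair Play is constant here, so fall through to lots from the seeded generator.
    return [tied[i] for i in rng.permutation(len(tied))]


def rank_group(teams, matches, rng):
    """Overall points, goal difference, goals scored; then the recursive pass."""
    overall = table(teams, matches)
    ranked = []
    for tier in split_by(teams, lambda t: tuple(overall[t])):
        ranked.extend(break_head_to_head(tier, matches, rng))
    return ranked


# Toy group: A and B finish level on points, goal difference, and goals scored.
matches = [("A", "B", 1, 0), ("A", "C", 0, 2), ("A", "D", 3, 1),
           ("B", "C", 2, 1), ("B", "D", 2, 1), ("C", "D", 0, 1)]
ranking = rank_group(["D", "B", "C", "A"], matches, np.random.default_rng(1))
```

In the toy group, A and B both finish on six points with identical goal difference and goals scored, and A's 1-0 win over B settles first place.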
The knockout bracket
The 2026 format expands the knockout phase from the historical 16-team Round of 16 to a 32-team Round of 32. Twenty-four of those slots come from the top two finishers in each of the twelve groups; the remaining eight come from the best third-place finishers across the twelve groups, ranked by the same tiebreaker order applied across all twelve third-place teams.
The Round of 32 cross-group pairings are not procedurally generated.
They are fixed by FIFA before the tournament and encoded in the engine
as a hardcoded tuple of sixteen R32Slot records, organized into six
zone pairs (AB, CD, EF, GH, IJ, KL). Each zone pair contributes two
1st-vs-2nd matches and may contribute up to one intra-zone third-place
match, depending on which four of the six zones supply both their
third-place finishers to the best-8. The fifteen possible combinatorial
configurations of which four zones qualify are encoded as a separate
fifteen-entry lookup table.
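The count of fifteen follows from choosing four zones out of six. A minimal sketch of how such a lookup could be keyed is shown below; the variable names are hypothetical, and the pairings stored against each key are FIFA-defined and are not reproduced here.

```python
from itertools import combinations

ZONES = ("AB", "CD", "EF", "GH", "IJ", "KL")

# Every way to pick the four zones that supply both third-place finishers.
configs = list(combinations(ZONES, 4))

# Key each configuration by its zone set so lookup ignores ordering.
config_index = {frozenset(c): i for i, c in enumerate(configs)}
```

Using a `frozenset` key means a run can look up its configuration from whichever zones qualified, without sorting them first.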
From the Round of 32, the bracket proceeds R16 → QF → SF → Final, with a separate third-place playoff between the two semifinal losers. Every knockout match begins with a regulation-time sample from the Bivariate Poisson + Dixon-Coles model. If the regulation score is level, the match advances to extra time using the damped-$\lambda$ procedure described above. If extra time is also level, the match advances to a penalty shootout. Knockout matches cannot end in a draw.
Aggregation across runs
A single run produces 48 team-result rows and 104 match-result rows. Across $N$ runs (10,000 for the website cadence, 100,000 for the academic-paper cadence), the per-team marginals are computed as simple frequencies:
$$\hat{P}(\text{team } i \text{ is champion}) = \frac{1}{N} \sum_{r=1}^{N} \mathbb{1}[\text{team } i \text{ is champion in run } r],$$
with analogous formulas for the other round-reached probabilities. Credible intervals follow from Beta-Binomial conjugacy under a flat prior, which gives a $\mathrm{Beta}(1 + k, 1 + N - k)$ posterior for $k$ hits in $N$ runs; the live site reports the 90% credible interval alongside each marginal.
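A minimal sketch of the frequency and interval calculation follows. It approximates the Beta posterior quantiles by sampling with NumPy so the example needs no other dependency; an exact inverse CDF such as `scipy.stats.beta.ppf` would be the more usual choice. The function name and the counts are illustrative assumptions.

```python
import numpy as np


def marginal_with_interval(flags, level=0.90, n_draws=200_000, seed=0):
    """Frequency estimate plus an equal-tailed credible interval under a flat prior."""
    flags = np.asarray(flags, dtype=bool)
    n_runs = flags.size
    hits = int(flags.sum())
    p_hat = hits / n_runs
    # Flat Beta(1, 1) prior + binomial likelihood -> Beta(1 + hits, 1 + misses).
    rng = np.random.default_rng(seed)
    draws = rng.beta(1 + hits, 1 + n_runs - hits, size=n_draws)
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return p_hat, float(lo), float(hi)


# Illustrative input: a team that wins 1,500 of 10,000 simulated tournaments.
champion_flags = np.zeros(10_000, dtype=bool)
champion_flags[:1_500] = True
p_hat, lo, hi = marginal_with_interval(champion_flags)
```

At 10,000 runs the interval for a probability near 0.15 is roughly one percentage point wide, which gives a sense of the Monte Carlo noise behind each displayed marginal.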
Output, performance, reproducibility
A single 48-team tournament simulation completes in approximately 3.5
milliseconds on a single core. The Phase 5 acceptance criterion was 50
ms per simulation; the implementation comes in well under that bound.
A full batch of 10,000 runs across all five model variants (M0, M1,
M2, M3, M★) takes under 15 minutes wall-clock with joblib
parallelism (n_jobs=-2).
Every batch produces two Parquet tables per variant, sharing a
primary-key prefix of
(run_idx, model_id, data_hash, seed, code_sha, timestamp_utc):
team_runs: 48 rows per run.
| Column | Type | Description |
|---|---|---|
| team_id | string | FIFA 3-letter code |
| group_letter | string | A to L |
| group_finish | int8 | 1 to 4 |
| group_points | int8 | |
| group_gd | int16 | Goal differential |
| group_gs | int16 | Goals scored |
| qualified_r32 | bool | |
| exit_round | string | Group, R32, R16, QF, SF, 3rd, Runner-up, Champion |
| reached_final | bool | |
| champion | bool | |
match_runs: 104 rows per run (72 group + 31 knockout + 1 third-place).
| Column | Type | Description |
|---|---|---|
| match_id | string | Group stage: canonical M01..M72 from the fixtures parquet. Knockout: KO-R32-1, KO-3rd-1, KO-Final-1 (synthesized; a follow-up checkpoint adopts the canonical M73..M104 for knockout). |
| phase | string | group, R32, R16, QF, SF, 3rd-place, Final |
| team_home, team_away | string | |
| lambda_home, lambda_away | float32 or null | Input used (audit). Null for settled = true group rows (the realized scoreline was not produced by these lambdas). |
| reg_home_goals, reg_away_goals | int8 | |
| went_to_ET | bool | |
| et_home_goals, et_away_goals | int8 or null | |
| went_to_pens | bool | |
| pen_home_score, pen_away_score | int8 or null | |
| winner | string or null | Null only for group draws |
| settled | bool | true if the row is a realized scoreline from match_outcomes (cp-10 conditioning); false if sampled from the BP+DC model. Always false for knockout rows until the knockout follow-up ships. |
The batch manifest (manifest.json) records code_sha, per-variant
data_hash, seed_base, run_count, batch start and end times, and
any failed_runs. The seed-derivation rule is deterministic, which
ensures that any reviewer with the same code SHA, data SHA, and batch
timestamp can reproduce the entire batch byte-for-byte. Per-variant
seeds are derived from seed_base by a simple indexed rule.
Within each Parquet file,
partitioning by run_idx // 1000 produces ten partitions per file
in a 10,000-run batch, for
cheap incremental reads downstream. Failed runs are logged in the
manifest rather than aborting the batch; the acceptance criterion
caps the fraction of runs that may fail.
What the engine does not do
The engine produces final scorelines from goal-rate inputs. It does not produce, by deliberate scope choice:
- In-match dynamics. The engine samples a final score; it does not simulate the 90 minutes minute by minute (no Dixon-Robinson 1998 style time-varying $\lambda$).
- Injury, red-card, or in-game suspension simulation during a match.
- Referee effects on goal rates.
- Day-of-match weather adjustments to $\lambda$.
- Travel, rest, or fatigue accumulation between matches.
- Home advantage beyond the explicit zero-home-advantage choice for the FIFA 2026 single-host-complex tournament.
- Correlation between matches in a single run beyond what the bracket structure already enforces. Two matches in the same group share strength inputs but draw their scores independently.
- ABBA shooter ordering, shooter-specific skill, or goalkeeper-specific save rates in shootouts.
- Real-time bracket updates mid-tournament. The engine produces forecasts; the live site re-runs the engine when new data arrives.
Each of these is a deliberate omission, listed honestly so a reader can see the simplifications without reverse-engineering the code.
Where to go next
- Models: the four strength-matrix providers that feed the engine.
- Methodology: the Phase 3 calibration that locked the two Elo-to-goals constants, and the protocol that locked $\lambda_3$ and $\rho$ before any 2026 forecast was published.
- Evaluation: Brier, log-loss, RPS, CLV, and the Diebold-Mariano machinery that grades the engine's output.
- Pre-registration: the OSF DOI, the signed Git tag, and the sealed
pre_reg_constants.yaml.