§ VI · Reference
last revised 2026-04-22 · snapshot 2026-06-15T03:47Z
Glossary
Every technical term used on this site: formal definition, plain-English equivalent, and a link to the section where it matters most.
Contents
Terms are grouped alphabetically by first letter. Each entry carries a formal definition and a plain-English equivalent. Cross-links point to the Vault article where the concept is used in context.
Jump to: B · C · D · E · G · K · L · M · P · R · S · V
B
Brier score
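Formal. The mean squared error between a probability forecast and the binary realization:
$$\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2$$
where $p_i$ is the model-assigned probability of the realized outcome of match $i$, $N$ is the number of settled matches, and $o_i = 1$ for all settled matches (the realized outcome always occurred).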
Plain English. A number between 0 and 1 measuring how surprised a model was, on average, by the outcomes it observed. Lower is better. A Brier score of 0.25 corresponds to always forecasting 50/50, regardless of the match.
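As a minimal sketch of the definition above, the score can be computed from the list of probabilities the model placed on each realized outcome. The function name `brier_score` is illustrative and not taken from this project's code.

```python
def brier_score(p_realized):
    """Mean squared error against a realization of 1.

    p_realized: probabilities the model assigned to the outcomes
    that actually occurred, one per settled match.
    """
    if not p_realized:
        raise ValueError("need at least one settled match")
    return sum((p - 1.0) ** 2 for p in p_realized) / len(p_realized)


# Always forecasting 50/50 gives (0.5 - 1)^2 = 0.25 on every match.
print(brier_score([0.5, 0.5, 0.5]))  # 0.25
```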
Where it matters. Transparency Ledger · Model anatomy
C
Calibration
Formal. A forecaster is well-calibrated if, across all predictions assigned probability $p$, approximately a fraction $p$ of those events occur. Formally, for a partition of predictions into bins $B_1, \dots, B_m$:
$$\bar{p}_k \approx \bar{o}_k \quad \text{for every bin } B_k$$
where $\bar{p}_k$ is the mean forecast in bin $B_k$ and $\bar{o}_k$ is the empirical frequency.
Plain English. When a well-calibrated model says "40% probability," the event happens roughly 40% of the time. Calibration is the fundamental honesty property. A model can be calibrated and useless (if it always says 50/50); calibration is necessary but not sufficient for predictive value.
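The binned comparison can be sketched as a small reliability table: group forecasts into equal-width bins, then compare each bin's mean forecast with the fraction of events that occurred. The equal-width binning and the name `reliability_table` are assumptions for illustration; the site's own binning scheme is not stated here.

```python
def reliability_table(forecasts, outcomes, n_bins=10):
    """Return (mean forecast, empirical frequency, count) per non-empty bin.

    forecasts: probabilities in [0, 1]; outcomes: 1 if the event occurred, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        bins[k].append((p, o))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_forecast = sum(p for p, _ in members) / n
            frequency = sum(o for _, o in members) / n
            rows.append((mean_forecast, frequency, n))
    return rows


# Five forecasts of 40% with two hits: mean forecast and frequency agree.
for row in reliability_table([0.4] * 5, [1, 1, 0, 0, 0]):
    print(row)
```

A well-calibrated model produces rows whose first two entries are close in every bin that holds enough forecasts to be meaningful.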
Where it matters. The 45% problem · Ledger reliability diagram
Closing line value (CLV)
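Formal. The log-ratio of the model's closing-price-equivalent probability to the actual closing market price:
$$\mathrm{CLV} = \ln \frac{p_{\text{model}}}{p_{\text{close}}}$$
where $p_{\text{model}}$ is the model's closing-price-equivalent probability and $p_{\text{close}}$ is the closing market price expressed as a probability.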
Plain English. A measure of whether the model's prediction was closer to the true outcome than the market's final price. Positive CLV (in basis points) indicates the model was systematically sharper than the bookmaker's closing line.
Where it matters. Transparency Ledger
Confidence interval
Formal. A 95% Monte Carlo confidence interval on a model probability $\hat{p}$: the interval between the 2.5th and 97.5th percentiles across 10,000 independent Monte Carlo draws.
Plain English. The range within which the estimated probability would fall in about 95 out of 100 re-runs of the simulation with the same inputs but different random seeds.
Where it matters. All probability columns in the terminal.
D
De-vigging
Formal. The process of removing the bookmaker's margin (overround or "vig") from raw decimal odds to recover an implied probability. This project uses the power method:
$$p_i = \left( \frac{1}{d_i} \right)^{k}, \qquad \sum_i p_i = 1$$
where $d_i$ is the decimal odds for outcome $i$ and $k$ is the power parameter, chosen so that the overround is reduced to zero.
Plain English. A bookmaker's raw odds deliberately sum to more than 100% (that excess is the house margin). De-vigging is the arithmetic of peeling the margin back out so the implied probabilities sum to exactly 1.
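A minimal sketch of the power method, assuming the exponent is found by bisection (the project's actual solver is not specified here). Each raw implied probability is the reciprocal of the decimal odds, and the exponent is raised until the powered values sum to 1. The odds in the example are hypothetical.

```python
def devig_power(decimal_odds, iterations=200):
    """Return (probabilities, exponent) with the probabilities summing to 1."""
    if len(decimal_odds) < 2 or any(d <= 1.0 for d in decimal_odds):
        raise ValueError("need at least two outcomes with decimal odds above 1")
    raw = [1.0 / d for d in decimal_odds]
    if sum(raw) <= 1.0:
        raise ValueError("odds carry no overround to remove")

    def total(k):
        return sum(r ** k for r in raw)

    # total(k) decreases as k grows, so bracket the root and bisect.
    lo, hi = 1.0, 2.0
    while total(hi) > 1.0:
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2.0
    return [r ** k for r in raw], k


# Hypothetical 1X2 odds with roughly a 4.8% overround.
probs, k = devig_power([2.10, 3.40, 3.60])
print([round(p, 4) for p in probs], round(k, 4))
```

Because every raw value is below 1, raising it to a power above 1 shrinks longshots proportionally more than favourites, which is the characteristic behaviour of the power method.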
Where it matters. Divergence terminal · Model anatomy
Dixon-Coles correction
Formal. A multiplicative adjustment to the independent Poisson joint probability that corrects for empirically observed over-frequency of low-scoring draws. See notation for the full definition.
Plain English. The independent Poisson model underestimates 0-0 and 1-1 draws in football by roughly 20 to 30%. The Dixon-Coles correction shifts probability mass among the four low-score cells (0-0, 1-0, 0-1, 1-1) to restore the missing draws. Introduced by Dixon and Coles (1997).
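For orientation, here is a sketch of the adjustment in the form published by Dixon and Coles (1997). It is an assumption that this site's notation page uses the same form, and the dependence parameter value below (`rho = -0.1`) is purely illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals at expected rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam_home, lam_away, rho):
    """Multiplicative factor; differs from 1 only on the four low-score cells."""
    if x == 0 and y == 0:
        return 1.0 - lam_home * lam_away * rho
    if x == 0 and y == 1:
        return 1.0 + lam_home * rho
    if x == 1 and y == 0:
        return 1.0 + lam_away * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def dc_prob(x, y, lam_home, lam_away, rho):
    """Adjusted probability of the scoreline x-y (home-away)."""
    independent = poisson_pmf(x, lam_home) * poisson_pmf(y, lam_away)
    return dc_tau(x, y, lam_home, lam_away, rho) * independent


# With a negative rho, 0-0 and 1-1 gain mass while 1-0 and 0-1 lose it.
for score in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(score, round(dc_prob(*score, 1.4, 1.1, -0.1), 4))
```

The four factors are constructed so that the mass added to the draws is exactly the mass removed from 1-0 and 0-1, so the full score distribution still sums to 1.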
Where it matters. Model anatomy
Divergence
Formal. See edge below. In table headers, "divergence" is used interchangeably with edge to avoid over-use of a single word.
Plain English. The gap between what the model thinks a probability is and what the market says it is.
E
Edge (E)
Formal. The signed difference between the model-implied probability $p_{\text{model}}$ and the de-vigged market probability $p_{\text{mkt}}$:
$$E = p_{\text{model}} - p_{\text{mkt}}$$
Plain English. Positive edge means the model thinks the outcome is more likely than the market does. Negative edge means the model thinks it is less likely. Neither positive nor negative edge is described as "good" on this site. The terminal shows the gap, not a recommendation.
Where it matters. Every row in the divergence terminal.
Elo rating
Formal. A numerical skill rating for a team, updated after each match by:
$$R_{\text{new}} = R_{\text{old}} + K \, (S - \hat{S})$$
where $S$ is the actual match score (loss/draw/win), $\hat{S}$ is the Elo expected score, and $K$ is the update magnitude constant (set separately for group matches and for knockout rounds in this project).
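Plain English. A number that rises when a team wins and falls when it loses. The size of the change depends on how surprising the result was: beating a team it was expected to beat moves the rating only slightly, while losing to a weaker team moves it a lot.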
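The update can be sketched as follows. Two things here are assumptions rather than this project's settings: the expected score uses the conventional logistic curve with a 400-point scale, and `k=20` is a placeholder, not either of the project's constants.

```python
def expected_score(rating, opponent_rating):
    """Conventional Elo expectation on a 400-point logistic scale (assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_rating(rating, opponent_rating, score, k):
    """score: 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent_rating))


# Evenly matched teams: a win is worth half of k.
print(update_rating(1500, 1500, 1.0, k=20))  # 1510.0

# A 200-point favourite gains little for winning and loses more for losing.
print(update_rating(1700, 1500, 1.0, k=20))
print(update_rating(1700, 1500, 0.0, k=20))
```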
Where it matters. Model anatomy
G
Gate
Formal. See volatility gate below.
Plain English. A logical filter applied to each forecast row: if any of the five gate rules are triggered, the row's gate status changes from OPEN to FIRED. Gate-fired rows remain visible in the terminal but are annotated with the triggered rules.
K
Kill criterion
Formal. The pre-registered stopping condition; see Kill criteria for the full statement.
Plain English. If M★ is calibrated worse than a coin flip by the Round of 16, the project publishes a null result and the terminal is frozen.
Where it matters. Kill criteria · Transparency Ledger
L
Log-loss
Formal. The mean negative log-probability assigned to realized outcomes:
$$\mathrm{LL} = -\frac{1}{N} \sum_{i=1}^{N} \ln p_i$$
where $p_i$ is the model probability on the realized outcome of match $i$ and $N$ is the number of settled matches. Lower log-loss is better.
Plain English. A scoring rule that penalizes confident wrong predictions severely. Saying "90% probability" when the opposite outcome occurs incurs a loss of $-\ln 0.1 \approx 2.30$ nats; saying "50/50" incurs $-\ln 0.5 \approx 0.69$ nats. Log-loss is the primary evaluation metric for this project because it is sensitive to the full probability distribution, not just the modal prediction.
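The two penalties quoted above can be checked with a short sketch; the function name `log_loss` is illustrative.

```python
import math


def log_loss(p_realized):
    """Mean negative natural log of the probability on each realized outcome."""
    if not p_realized:
        raise ValueError("need at least one settled match")
    return -sum(math.log(p) for p in p_realized) / len(p_realized)


# A 90% forecast that misses leaves 10% on the realized outcome.
print(round(log_loss([0.1]), 3))  # 2.303
print(round(log_loss([0.5]), 3))  # 0.693
```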
Where it matters. Kill criteria · Ledger
M
M★ (M-star)
Formal. The pre-registered champion model, selected from candidates M1, M2, M3 based on out-of-sample log-loss on 2018 to 2022 international match data, before the 2026 World Cup.
Plain English. The model this project committed to before the tournament started. M★ is identified in the OSF pre-registration and is sealed; it cannot be swapped mid-tournament.
Where it matters. Every probability on this site.
Monte Carlo simulation
Formal. Numerical estimation of the joint tournament outcome distribution by running 10,000 independent stochastic simulations of the full bracket, using match-level probabilities from M★.
Plain English. Running the tournament ten thousand times in software, tracking who wins each match in each simulated run, and using the frequency of each team winning overall as the estimated championship probability.
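A toy version of the idea, assuming a four-team single-elimination bracket and a made-up strength-ratio rule for match probabilities. The real pipeline draws its match-level probabilities from M★ and simulates the full tournament format; none of the names or numbers below come from it.

```python
import random


def simulate_knockout(teams, win_prob, rng):
    """Play one single-elimination bracket and return the champion.

    teams: bracket order, length a power of two.
    win_prob(a, b): probability that a beats b.
    """
    alive = list(teams)
    while len(alive) > 1:
        winners = []
        for a, b in zip(alive[0::2], alive[1::2]):
            winners.append(a if rng.random() < win_prob(a, b) else b)
        alive = winners
    return alive[0]


def championship_probs(teams, win_prob, n_sims=10_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    counts = {team: 0 for team in teams}
    for _ in range(n_sims):
        counts[simulate_knockout(teams, win_prob, rng)] += 1
    return {team: c / n_sims for team, c in counts.items()}


# Hypothetical strengths; a beats b with probability s_a / (s_a + s_b).
strength = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}
probs = championship_probs(
    list(strength), lambda a, b: strength[a] / (strength[a] + strength[b])
)
print(probs)
```

Fixing the seed makes a run reproducible; changing it gives a slightly different estimate, which is the variation a Monte Carlo interval is meant to describe.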
Where it matters. All simulated tournament probabilities.
P
Poisson model
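Formal. A probability model for football scores that treats goals as independent Poisson random variables. For expected goal rates $\lambda_H$ and $\lambda_A$:
$$P(X = x, Y = y) = \frac{e^{-\lambda_H} \lambda_H^{x}}{x!} \cdot \frac{e^{-\lambda_A} \lambda_A^{y}}{y!}$$
where $X$ and $Y$ are the home and away goal counts.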
Plain English. A mathematical model that says goals arrive randomly at a steady rate, as if each second of a match has a tiny probability of producing a goal, and that probability does not change during the match.
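A short sketch of how two goal rates turn into scoreline and 1X2 probabilities under independence. The rates 1.5 and 1.1 and the cut-off at 10 goals per side are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals at expected rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_matrix(lam_home, lam_away, max_goals=10):
    """matrix[x][y] = probability of the scoreline x-y (home-away)."""
    return [
        [poisson_pmf(x, lam_home) * poisson_pmf(y, lam_away)
         for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


def outcome_probs(matrix):
    """Collapse a score matrix into (home win, draw, away win)."""
    home = draw = away = 0.0
    for x, row in enumerate(matrix):
        for y, p in enumerate(row):
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


home, draw, away = outcome_probs(score_matrix(1.5, 1.1))
print(round(home, 3), round(draw, 3), round(away, 3))
```

Truncating at 10 goals discards a negligible tail at these rates, so the three outcome probabilities sum to just under 1.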
Where it matters. Model anatomy
Pre-registration
Formal. A time-stamped, publicly accessible commitment to a study's hypothesis, model specification, data sources, evaluation metrics, and stopping rules, made before data collection begins.
Plain English. Writing down what you will measure and how before you know the result, so readers can distinguish a prediction from a post-hoc rationalization.
Where it matters. Pre-registration
R
Ranked probability score (RPS)
Formal. A proper scoring rule for ordered categorical outcomes that measures the difference between the cumulative forecast distribution and the cumulative empirical distribution:
$$\mathrm{RPS} = \frac{1}{r - 1} \sum_{k=1}^{r-1} \left( \sum_{j=1}^{k} (p_j - o_j) \right)^2$$
where $r$ is the number of ordered categories (three for 1X2: H, D, A), $p_j$ is the forecast probability of category $j$, and $o_j$ is 1 if category $j$ occurred and 0 otherwise.
Plain English. Like Brier score, but gives partial credit for "almost right" distributions. Assigning 0.40/0.30/0.30 when the away team wins is penalized less than assigning 0.85/0.10/0.05 for the same miss.
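The comparison in the example above can be worked through with a short sketch, using the standard normalization by the number of categories minus one. The function name `rps` is illustrative.

```python
def rps(forecast, outcome_index):
    """Ranked probability score for one match.

    forecast: probabilities over ordered categories, e.g. (H, D, A).
    outcome_index: position of the category that occurred.
    """
    r = len(forecast)
    cum_forecast = 0.0
    cum_outcome = 0.0
    total = 0.0
    for j in range(r - 1):  # the last cumulative term is always zero
        cum_forecast += forecast[j]
        cum_outcome += 1.0 if j == outcome_index else 0.0
        total += (cum_forecast - cum_outcome) ** 2
    return total / (r - 1)


# Away win (index 2) under the two forecasts from the text.
print(round(rps([0.40, 0.30, 0.30], 2), 4))  # 0.325
print(round(rps([0.85, 0.10, 0.05], 2), 4))  # 0.8125
```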
Where it matters. Transparency Ledger
S
Snapshot
Formal. A versioned bundle of all data artifacts (tournament.json, divergence.json, ledger.jsonl, etc.) generated at a single point in time and identified by a snapshot_id of the form YYYY-MM-DDTHH:MMZ.
Plain English. A frozen copy of everything the site knows at one moment. Every number displayed on this site traces to a specific snapshot, which can be inspected or reproduced.
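As a small illustration of the identifier format, the sketch below parses and re-emits a snapshot_id with the standard library. It assumes the trailing Z denotes UTC, as in ISO 8601; the helper names are hypothetical.

```python
from datetime import datetime, timezone

SNAPSHOT_FORMAT = "%Y-%m-%dT%H:%MZ"  # YYYY-MM-DDTHH:MMZ


def parse_snapshot_id(snapshot_id):
    """Return the snapshot time as a timezone-aware UTC datetime."""
    naive = datetime.strptime(snapshot_id, SNAPSHOT_FORMAT)
    return naive.replace(tzinfo=timezone.utc)


def format_snapshot_id(moment):
    """Render an aware datetime as a snapshot_id, converting to UTC first."""
    return moment.astimezone(timezone.utc).strftime(SNAPSHOT_FORMAT)


stamp = parse_snapshot_id("2026-06-15T03:47Z")
print(stamp.isoformat())          # 2026-06-15T03:47:00+00:00
print(format_snapshot_id(stamp))  # 2026-06-15T03:47Z
```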
Where it matters. Every page footer.
V
Volatility gate
Formal. A logical filter with five pre-registered rules. A forecast row's gate status is FIRED if any rule is triggered; OPEN otherwise. Rules: (1) named-event exclusion, (2) price-discovery window (48 h before kick-off), (3) liquidity floor on Polymarket ($50k 24 h volume), (4) maximum line-move threshold (12%), (5) settlement proximity (2 h before kick-off).
Plain English. Five conditions under which the comparison between the model's pre-match probability and the market is considered unreliable: not because the math is wrong, but because the market price it is being compared to is noisy. Gate-fired rows remain in the terminal, annotated with the triggered rules.
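The any-rule-fires logic can be sketched as below. Only three of the five rules are shown, because the direction of the two time-window rules is not spelled out in this entry. The row field names (`named_event`, `volume_24h_usd`, `line_move`) and the strictness of the comparisons are assumptions, not the project's schema.

```python
LIQUIDITY_FLOOR_USD = 50_000  # rule 3: 24 h volume floor
MAX_LINE_MOVE = 0.12          # rule 4: maximum line move, as a fraction


def gate_status(row):
    """Return (status, triggered_rules) for one forecast row (a dict)."""
    triggered = []
    if row.get("named_event", False):
        triggered.append("named-event exclusion")
    if row["volume_24h_usd"] < LIQUIDITY_FLOOR_USD:
        triggered.append("liquidity floor")
    if abs(row["line_move"]) > MAX_LINE_MOVE:
        triggered.append("maximum line-move threshold")
    return ("FIRED" if triggered else "OPEN"), triggered


# Hypothetical rows: one thinly traded with a large move, one quiet and liquid.
print(gate_status({"volume_24h_usd": 12_000, "line_move": 0.15}))
print(gate_status({"volume_24h_usd": 250_000, "line_move": 0.03}))
```

Returning the list of triggered rules alongside the status mirrors the way gate-fired rows stay visible with an annotation instead of being dropped.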
Where it matters. Divergence terminal