§ VII · Long-form
11 min read · last revised 2026-04-22 · snapshot 2026-06-15T03:47Z
Volatility Gate
Five pre-registered suppression rules: news events, price discovery, exchange spread, liquidity floor, and sizing guardrails. Each rule has a counterfactual.
From flagged edge to recommended stake
The market layer hands the volatility gate a flagged edge. The gate decides whether the surrounding microstructure is stable enough for that edge to mean anything. If the five suppression rules all pass, the Kelly sizer turns the edge into a recommended stake fraction. If any rule fires, the row is written to the log with a reason code and a zero stake.
Three architectural layers live on this page. The gate (five rules in
market/volatility_gate.py) decides whether to act. The sizer (Kelly
plus per-market caps in market/kelly_sizer.py) decides how much. The
orchestrator (market/market_pipeline.py) enforces cross-market caps
that no single gate or sizer call could see by itself. All three layers
must clear before a non-zero stake fraction reaches forecast_log.jsonl.
The page treats them as one guardrail because that is what the user
actually experiences: a flagged edge from the Market Layer
either becomes a recommended stake or it does not, and the reason code
explains why. Every threshold below is sealed in
evaluation/pre_reg_constants.yaml and was registered on OSF before any
2026 forecast was published.
The five suppression rules
The five gate rules are applied in fixed order. The first rule to fire writes its reason code to the gate output and short-circuits the cascade; the remaining rules are not evaluated. The fixed order matters for auditability: a flag suppressed by Rule 1 (information event) is qualitatively different from one suppressed by Rule 5 (microstructure failure), and the cumulative tally of suppressions by reason is itself a research output for Evaluation.
Rule 1: named-event suppression
The gate suppresses the flag if a gate-relevant news event involving either participating team occurred within six hours before the snapshot timestamp:
$$0 \le t_{\text{snapshot}} - t_{\text{event}} \le 6\ \text{h}$$
Reason code: NAMED_EVENT_6H. The window is sealed in
pre_reg_constants.yaml::gate.news_window_hours. Events older than six
hours are still logged for evaluation reconstruction but do not fire the
rule.
What counts as a gate-relevant event is a closed enum, sealed in the
pre-registration. Six categories: INJURY, SUSPENSION,
MANAGER_CHANGE, SQUAD_CHANGE, VENUE_CHANGE, MATCH_RESCHEDULE. A
seventh category, OTHER_MATERIAL, is logged for analysis but does not
fire Rule 1.
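A minimal sketch of the Rule 1 check, assuming a hypothetical `NamedEvent` record with `team`, `category`, and `published_at` fields. The enum members and the six-hour window come from the text above; the record shape, the function name, and the inclusive window boundary are illustrative assumptions, not the contents of market/volatility_gate.py.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class EventCategory(Enum):
    INJURY = "INJURY"
    SUSPENSION = "SUSPENSION"
    MANAGER_CHANGE = "MANAGER_CHANGE"
    SQUAD_CHANGE = "SQUAD_CHANGE"
    VENUE_CHANGE = "VENUE_CHANGE"
    MATCH_RESCHEDULE = "MATCH_RESCHEDULE"
    OTHER_MATERIAL = "OTHER_MATERIAL"  # logged for analysis, never fires Rule 1


# The six categories that are allowed to fire Rule 1.
GATE_RELEVANT = frozenset(EventCategory) - {EventCategory.OTHER_MATERIAL}

NEWS_WINDOW = timedelta(hours=6)  # mirrors gate.news_window_hours


@dataclass(frozen=True)
class NamedEvent:  # hypothetical record shape
    team: str
    category: EventCategory
    published_at: datetime


def rule1_fires(events, teams, snapshot_ts):
    """True if a gate-relevant event for either team lies inside the window."""
    for ev in events:
        age = snapshot_ts - ev.published_at
        in_window = timedelta(0) <= age <= NEWS_WINDOW  # boundary is an assumption
        if ev.team in teams and ev.category in GATE_RELEVANT and in_window:
            return True
    return False
```

Because the enum is closed, an event with a category outside `GATE_RELEVANT` can be logged without any risk of suppressing a flag.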
The news monitor (market/news_monitor.py) consumes only structured
federation feeds: FIFA's official feed, the 48 participating member
federations' RSS feeds, and the six confederation feeds (UEFA, CONMEBOL,
AFC, CAF, CONCACAF, OFC). Tier 1 sources are polled every 5 minutes;
Tier 2 confederation feeds every 15 minutes for cross-confirmation. No
social media, no headline scrapers, no aggregators. The trade-off is
deliberate: federation feeds are slower than X or BBC Sport but
structured, attestable, and free of adversarial noise. A single false
NamedEvent that suppresses a flagged edge is more costly to the
project's integrity than any number of missed real events.
A circuit breaker isolates the news layer from upstream feed failures. Three consecutive fetch failures from a source open the breaker for 30 minutes; during that window the source is skipped and logged. Other sources continue polling unaffected.
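The breaker behaviour can be sketched as a small per-source object. This is an illustration under stated assumptions: the class and method names are hypothetical, and resetting the failure counter when the breaker opens is a design choice the page does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

FAILURE_THRESHOLD = 3                   # consecutive fetch failures
OPEN_DURATION = timedelta(minutes=30)   # how long the source is skipped


@dataclass
class SourceBreaker:  # hypothetical name; one instance per feed
    consecutive_failures: int = 0
    open_until: Optional[datetime] = None

    def allows(self, now):
        """False while the breaker is open and the source must be skipped."""
        return self.open_until is None or now >= self.open_until

    def record_success(self):
        self.consecutive_failures = 0
        self.open_until = None

    def record_failure(self, now):
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self.open_until = now + OPEN_DURATION
            self.consecutive_failures = 0  # assumption: start counting afresh
```

Keeping one breaker per source is what lets the other feeds continue polling while a failing feed is skipped.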
Rule 2: intra-book price discovery
The gate suppresses the flag if the Pinnacle de-vigged probability on the flagged leg moved by more than three percentage points in the prior 30 minutes:
$$\left| p^{\text{Pin}}_{t} - p^{\text{Pin}}_{t - 30\,\text{min}} \right| > 0.03$$
Reason code: PRICE_DISCOVERY_INTRA_BOOK. The threshold lives in
pre_reg_constants.yaml::gate.price_discovery_pct; the 30-minute window
is gate.price_discovery_window_min.
The intuition: a 3 pp move on Pinnacle in 30 minutes is the signature of sharp money entering the line. The "edge" the model thinks it found might just be the model running ahead of the market by a few minutes, which is not an exploitable advantage. By the time a bet would settle, the market will have absorbed whatever information moved it.
Rule 3: cross-book price discovery
The gate suppresses the flag if the de-vigged probability on the flagged leg differs between Pinnacle and Betfair Exchange by more than 2.5 percentage points:
$$\left| p^{\text{Pin}} - p^{\text{BF}} \right| > 0.025$$
Reason code: PRICE_DISCOVERY_CROSS_BOOK. Threshold:
pre_reg_constants.yaml::gate.cross_book_spread_pp.
Pinnacle and Betfair Exchange are both sharp markets but they price slightly differently. A normal spread between them is well under 2.5 pp. A spread above 2.5 pp means at least one of them is mid-correction, and the model has no honest way to decide which side of the consensus is correct.
This rule shares its reason-code naming convention with the upstream
Z_OUT_OF_BAND bypass path: when the Power-method exponent on Pinnacle
falls outside its calibrated band, the orchestrator emits SUPPRESSED
with reason_code=Z_OUT_OF_BAND directly, without invoking the
five-rule cascade. That bypass is described in §2.6.
Rule 4: Polymarket liquidity floor
The gate suppresses the flag if Polymarket's 24-hour traded volume on the equivalent contract is below USD 50,000:
$$V^{\text{Poly}}_{24\text{h}} < \text{USD } 50{,}000$$
Reason code: LIQUIDITY_POLYMARKET_LOW. Threshold:
pre_reg_constants.yaml::gate.liquidity_floor_usd.
Polymarket's price reflects the consensus of liquid prediction-market participants. Below USD 50,000 in 24-hour volume, the quoted price is a thin signal that any single mid-sized order could move several percentage points. A "model edge" against a thin Polymarket quote is a measurement artifact of the thinness, not a tradeable advantage.
Rule 5: Pinnacle staleness
The gate suppresses the flag if Pinnacle's most recent quote update predates the snapshot timestamp by more than four hours:
$$t_{\text{snapshot}} - t_{\text{last Pinnacle update}} > 4\ \text{h}$$
Reason code: LIQUIDITY_PINNACLE_STALE. Threshold:
pre_reg_constants.yaml::gate.pinnacle_staleness_hours.
A stale quote means Pinnacle has not updated its line in response to information that has likely arrived since. Comparing the model probability against a stale book is comparing today's information against yesterday's price; whatever divergence appears is not a mispricing because the price has not yet had a chance to converge.
First-rule-wins composition
The five rules are applied in the order Rule 1 to Rule 5. The first rule
to fire writes its reason code to the GateDecision and short-circuits
the cascade; the remaining rules are not checked. This ordering is
load-bearing for forensic reproducibility. Every gate decision in
gate_log.jsonl carries the reason_code of the first rule that
suppressed it, plus the rule_inputs object recording the actual
numerical values that drove the decision (e.g. for Rule 2: the current
de-vigged probability, the 30-minute-prior probability, the absolute
delta in pp, and the threshold the delta was compared against).
A flag that survives all five rules emits gate_status=PASS with
reason_code=NONE and is forwarded to the Kelly sizer. A flag that is
suppressed emits gate_status=SUPPRESSED with the firing rule's reason
code, is logged, and contributes a recommended stake fraction of zero to
forecast_log.jsonl.
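The first-rule-wins composition can be sketched as an ordered list of rule functions, each returning whether it fired together with its rule_inputs. The thresholds and reason codes are the ones quoted on this page; the snapshot keys, function names, and the `GateDecision` field layout are hypothetical, and the real cascade in market/volatility_gate.py may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GateDecision:
    gate_status: str   # "PASS" or "SUPPRESSED"
    reason_code: str   # "NONE" on PASS
    rule_inputs: dict = field(default_factory=dict)


# Each rule returns (fired, rule_inputs). Snapshot keys are illustrative.
def rule1(s):
    age = s["hours_since_gate_event"]  # None when no gate-relevant event exists
    fired = age is not None and 0 <= age <= 6
    return fired, {"hours_since_event": age, "window_hours": 6}


def rule2(s):
    delta_pp = abs(s["pinnacle_p_now"] - s["pinnacle_p_30m_ago"]) * 100
    return delta_pp > 3.0, {
        "p_now": s["pinnacle_p_now"],
        "p_prior": s["pinnacle_p_30m_ago"],
        "delta_pp": delta_pp,
        "threshold_pp": 3.0,
    }


def rule3(s):
    spread_pp = abs(s["pinnacle_p_now"] - s["betfair_p_now"]) * 100
    return spread_pp > 2.5, {"spread_pp": spread_pp, "threshold_pp": 2.5}


def rule4(s):
    vol = s["polymarket_volume_24h_usd"]
    return vol < 50_000, {"volume_24h_usd": vol, "floor_usd": 50_000}


def rule5(s):
    age = s["pinnacle_quote_age_hours"]
    return age > 4, {"quote_age_hours": age, "threshold_hours": 4}


# Fixed order: the position in this list is what makes the reason code auditable.
CASCADE = [
    ("NAMED_EVENT_6H", rule1),
    ("PRICE_DISCOVERY_INTRA_BOOK", rule2),
    ("PRICE_DISCOVERY_CROSS_BOOK", rule3),
    ("LIQUIDITY_POLYMARKET_LOW", rule4),
    ("LIQUIDITY_PINNACLE_STALE", rule5),
]


def run_gate(snapshot):
    for reason_code, rule in CASCADE:
        fired, inputs = rule(snapshot)
        if fired:
            # Short-circuit: later rules are never evaluated.
            return GateDecision("SUPPRESSED", reason_code, inputs)
    return GateDecision("PASS", "NONE")
```

A snapshot that would trip both Rule 2 and Rule 5 is recorded under Rule 2 only, which is the behaviour the fixed ordering is meant to guarantee.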
The exceptional bypass path: when the Market Layer
detects a Z_OUT_OF_BAND Power-method exponent on Pinnacle, the
orchestrator emits SUPPRESSED with that reason code directly and
bypasses the five-rule cascade. The decision is logged in the same
format, but the rule_inputs object carries the out-of-band value
rather than any of the five normal rule inputs. This is the only path
in the pipeline that produces a SUPPRESSED decision without one of the
five rules firing.
Fractional Kelly sizing
If the gate clears, the Kelly sizer computes a recommended stake fraction. The math is fractional Kelly with hardcoded caps.
The full Kelly fraction
For a bet at decimal odds $o$ with model probability $p$, the full-Kelly stake fraction is:
$$f^* = \frac{p\,o - 1}{o - 1}$$
$f^* > 0$ if and only if $p\,o > 1$, which is the positive-expected-value condition. Full Kelly maximizes the expected log-growth rate of bankroll under the strong assumption that $p$ is known exactly. We are not in that regime: $p$ carries non-trivial estimation error from a 347-match calibration corpus, and full Kelly is empirically catastrophic at our sample size. It is mean-variance dominated by fractional Kelly for any realistic error in $p$.
Mainline vs longshot fractions
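The recommended stake fraction is full Kelly multiplied by a market-class fraction and clipped to a per-market hard cap:
$$f_{\text{rec}} = \min\left(\lambda_{\text{class}}\, f^*,\ c_{\text{class}}\right)$$
where $f^*$ is the full-Kelly fraction, $\lambda_{\text{class}}$ is the market-class fraction, and $c_{\text{class}}$ is the per-market cap. The class is determined by the de-vigged book probability on the flagged leg, not by the model probability:
| Class | Trigger | Kelly fraction | Per-market cap |
|---|---|---|---|
| Mainline | de-vigged book probability above the longshot threshold | 0.25 (quarter-Kelly) | 5% of bankroll |
| Longshot | de-vigged book probability below the longshot threshold | 0.125 (half the mainline fraction) | 2.5% of bankroll |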
The quarter-Kelly fraction on mainline markets is the Thorp (1997) and MacLean, Thorp, and Ziemba (2010) recommendation for the regime where the win probability is known with non-trivial error. The fraction on longshot markets is half that, reflecting two facts. First, for a fixed edge the variance of a Kelly bet grows as the win probability shrinks, so longshot variance is structurally larger. Second, the Monte Carlo standard error on tail outcomes is itself larger because tail events are sampled fewer times in 10,000 runs.
The classification is on the de-vigged book probability for a specific reason. If the trigger were the model probability, a model with high conviction on a tail outcome could re-classify its own bet out of the longshot bucket and then take quarter-Kelly on a high-variance 8% probability event. Anchoring the bucket to the de-vigged book probability prevents that self-promotion.
Per-market caps
The per-market cap is a strict hard ceiling applied after the fractional multiplier. Even when the fractional-Kelly arithmetic returns a larger fraction (a very large edge can produce a full-Kelly fraction $f^* > 0.20$, and quarter-Kelly takes that to above $0.05$), the clip enforces the cap:
$$\min\left(0.25 \times f^*,\ 0.05\right) = 0.05 \quad \text{whenever } f^* > 0.20$$
The asymmetric cap (5% of bankroll on mainline, 2.5% on longshots) is deliberate. Even with the smaller longshot fraction, longshots can produce extreme full-Kelly values when the model finds a large mispricing, and the second clip prevents a single 1%-probability bet from sitting at 4% of bankroll on a few large draws.
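The sizing arithmetic can be sketched in a few lines. The fractions (quarter-Kelly and half of that) and the caps (5% and 2.5%) are taken from the text; the function names are hypothetical, and the longshot threshold is passed in as a parameter because its sealed value is not stated on this page. The 0.10 used in the usage lines is a placeholder assumption, not the pre-registered constant.

```python
def full_kelly(p_model, decimal_odds):
    """Full-Kelly fraction (p*o - 1) / (o - 1) for decimal odds o."""
    return (p_model * decimal_odds - 1.0) / (decimal_odds - 1.0)


# (Kelly fraction, per-market cap) per class.
CLASS_PARAMS = {
    "mainline": (0.25, 0.05),
    "longshot": (0.125, 0.025),
}


def recommended_stake(p_model, p_book, decimal_odds, longshot_threshold):
    """Fractional Kelly clipped to the per-market cap.

    The class is chosen from the de-vigged book probability, never from
    the model probability, so the model cannot promote its own bet.
    """
    f_star = full_kelly(p_model, decimal_odds)
    if f_star <= 0:
        return 0.0  # assumption: no stake without positive expected value
    cls = "longshot" if p_book < longshot_threshold else "mainline"
    fraction, cap = CLASS_PARAMS[cls]
    return min(fraction * f_star, cap)


# Usage with a placeholder threshold of 0.10 (illustrative only).
modest = recommended_stake(0.55, 0.50, 2.0, longshot_threshold=0.10)
clipped = recommended_stake(0.60, 0.33, 3.0, longshot_threshold=0.10)
```

In the second call full Kelly is 0.40, quarter-Kelly would be 0.10, and the mainline clip brings the recommendation down to the 5% cap.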
Bankroll machinery
The Kelly sizer's per-market caps are not the last word. Two more layers
constrain the recommended stake before it reaches forecast_log.jsonl:
a drawdown state machine on the bankroll itself, and per-event and
per-day caps applied by the orchestrator.
The drawdown state machine
The bankroll has three modes: NORMAL, HALVED, STOPPED. Transitions
are mechanical and pre-registered:
| Transition | Trigger |
|---|---|
| NORMAL to HALVED | bankroll falls below 0.80 × peak_bankroll |
| HALVED to NORMAL | bankroll recovers to 0.90 × peak_bankroll |
| HALVED to STOPPED | a second distinct drawdown episode begins below 0.80 while still in HALVED mode |
| STOPPED to NORMAL | |
Triggers are sealed in pre_reg_constants.yaml::kelly.drawdown_halve_trigger
and kelly.drawdown_recover_trigger. In HALVED mode every recommended
stake fraction is multiplied by $0.5$:
$$f_{\text{HALVED}} = 0.5 \times f_{\text{rec}}$$
In STOPPED mode every recommended stake is zero.
Crucially, peak_bankroll is not re-anchored on partial recovery. If
the bankroll falls from peak $P$ to below $0.80P$ and then recovers to
$0.85P$, the peak remains $P$ and the drawdown is still 15%. Only
when the bankroll reaches $0.90P$ does the mode revert to NORMAL,
and even then the peak stays at $P$. This prevents a bankroll that
grinds sideways from quietly drifting peak downward and giving false
confidence that drawdowns are shallower than they are.
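A minimal sketch of the NORMAL and HALVED half of the state machine, showing that the peak is never lowered on the way back. The 0.80 and 0.90 triggers are the ones quoted on this page; the class name, the exact inequality at each boundary, and letting the peak rise on new highs are assumptions. The STOPPED transitions are deliberately left out because the definition of a "distinct drawdown episode" is not spelled out here.

```python
from dataclasses import dataclass

HALVE_TRIGGER = 0.80    # mirrors kelly.drawdown_halve_trigger
RECOVER_TRIGGER = 0.90  # mirrors kelly.drawdown_recover_trigger

# Multiplier applied to every recommended stake fraction in each mode.
STAKE_MULTIPLIER = {"NORMAL": 1.0, "HALVED": 0.5, "STOPPED": 0.0}


@dataclass
class BankrollState:  # hypothetical name
    peak_bankroll: float
    mode: str = "NORMAL"

    def update(self, bankroll):
        # The peak only ratchets upward; it is never re-anchored downward.
        self.peak_bankroll = max(self.peak_bankroll, bankroll)
        ratio = bankroll / self.peak_bankroll
        if self.mode == "NORMAL" and ratio < HALVE_TRIGGER:
            self.mode = "HALVED"
        elif self.mode == "HALVED" and ratio >= RECOVER_TRIGGER:
            self.mode = "NORMAL"
        return self.mode


state = BankrollState(peak_bankroll=100.0)
history = [state.update(b) for b in (78.0, 85.0, 90.0)]
```

After the three updates the mode sequence is HALVED, HALVED, NORMAL, and `peak_bankroll` is still 100.0 throughout.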
STOPPED rows are still logged to forecast_log.jsonl, with
recommended_stake_fraction=0 and bankroll_mode=STOPPED. The reason:
the Evaluation page's pseudo-CLV computation needs hypothetical CLV
on what would have been bet, which requires the row to exist even
when the recommended stake was zero.
Per-event and per-day caps
Two further caps are enforced at the orchestrator level (in
market/market_pipeline.py), because they require state across markets
that no single gate or sizer call can see:
| Scope | Cap | YAML key |
|---|---|---|
| Per event (all legs of one match) | 8% of bankroll | kelly.cap_per_event |
| Per day (all events on one calendar day, UTC) | 15% of bankroll | kelly.cap_per_day |
When either cap fires, the orchestrator does not modify the original
forecast_log.jsonl rows. The log is append-only: original rows are
written first as provisional, and adjustment rows are then appended
referencing the parent via parent_decision_id, with
cap_adjustment=True and applied_caps += [per_event_cap | per_day_cap].
Each adjustment row carries the same fields as the original but with
the recommended stake scaled to bring the total below the cap.
This two-pass discipline preserves the audit trail. A reviewer scanning the log can see the original sized recommendation and the cap adjustment as separate rows, both timestamped, both linked, neither deleted.
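The two-pass pattern can be sketched for the per-event cap as a function that leaves the provisional rows untouched and returns new adjustment rows to append. The 8% cap and the parent_decision_id, cap_adjustment, and applied_caps fields come from the text; the decision_id field, the function name, and proportional scaling across legs are assumptions made for illustration.

```python
PER_EVENT_CAP = 0.08  # mirrors kelly.cap_per_event


def per_event_adjustments(rows, cap=PER_EVENT_CAP):
    """Return adjustment rows for one event; originals are never mutated."""
    total = sum(r["recommended_stake_fraction"] for r in rows)
    if total <= cap:
        return []  # cap did not fire, nothing to append
    scale = cap / total  # assumption: every leg is scaled by the same factor
    return [
        {
            **r,
            "decision_id": f'{r["decision_id"]}-adj',  # hypothetical id scheme
            "parent_decision_id": r["decision_id"],
            "cap_adjustment": True,
            "applied_caps": r.get("applied_caps", []) + ["per_event_cap"],
            "recommended_stake_fraction": r["recommended_stake_fraction"] * scale,
        }
        for r in rows
    ]


provisional = [
    {"decision_id": "d1", "recommended_stake_fraction": 0.05},
    {"decision_id": "d2", "recommended_stake_fraction": 0.05},
]
adjustments = per_event_adjustments(provisional)
```

Here the two legs total 10% of bankroll, so each adjustment row carries 4%, and the provisional rows remain in the log exactly as first written.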
Append-only logging and forensic reproducibility
The gate_log.jsonl file is the canonical source of truth for gate
behaviour. Three properties make it forensically defensible.
First, it is opened in append mode ("a") and never in write mode
("w"). Every gate decision is appended; nothing is ever overwritten or
deleted. A row that turns out to be wrong stays in the log next to the
adjustment row that corrects it, both visible.
Second, every appended row carries the structured rule_inputs object
with the actual numerical values that drove the decision. For a Rule 2
suppression: the current Pinnacle de-vigged probability, the
30-minute-prior probability, the absolute delta in pp, and the threshold
the delta was compared against. A reviewer can reconstruct the rule
firing without re-running the upstream pipeline.
Third, after every flush, the SHA-256 of the entire log file is
recomputed and written to gate_log.jsonl.sha256 as a sidecar. A
tampered or truncated log produces a SHA mismatch the next time the
file is checked against its sidecar, surfacing the corruption immediately.
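The append-and-hash discipline can be sketched with the standard library alone. The append mode and the `.sha256` sidecar name follow the text; the function names and the JSON serialisation options are assumptions, and the real writer may batch flushes differently.

```python
import hashlib
import json
from pathlib import Path


def _sidecar_path(log_path):
    return log_path.with_name(log_path.name + ".sha256")


def append_decision(log_path, row):
    """Append one JSON line, then refresh the SHA-256 sidecar."""
    log_path = Path(log_path)
    # Append mode only: existing rows are never rewritten or truncated.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, sort_keys=True) + "\n")
    digest = hashlib.sha256(log_path.read_bytes()).hexdigest()
    _sidecar_path(log_path).write_text(digest + "\n", encoding="utf-8")
    return digest


def verify_log(log_path):
    """True if the log still hashes to the value stored in its sidecar."""
    log_path = Path(log_path)
    expected = _sidecar_path(log_path).read_text(encoding="utf-8").strip()
    actual = hashlib.sha256(log_path.read_bytes()).hexdigest()
    return actual == expected
```

Any edit made to the log outside `append_decision` leaves the sidecar stale, so `verify_log` returns False until the discrepancy is investigated.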
The same gate decisions are also written into the master
forecast_log.jsonl that the evaluation layer consumes and that the
live ledger reads. gate_log.jsonl is the authoritative gate-behaviour
record; a divergence between the two files would indicate a pipeline
bug, not a difference in semantics.
What this layer does not do
The boundary statement.
- No de-vigging or edge calculation. Both live in the Market Layer.
- No Pinnacle bias correction. Lives in the Market Layer.
- No CLV or evaluation metrics. Lives in Evaluation.
- No real bet execution, no broker integration. The recommended stake fraction is theoretical and feeds the pseudo-CLV metric, not a live account.
- No social-media, headline-aggregator, or unstructured news consumption. Federation feeds only.
- No live re-fitting of any threshold (six-hour news window, 3 pp / 30 min, 2.5 pp cross-book, USD 50,000 Polymarket floor, four-hour staleness, 0.25 and 0.125 Kelly fractions, 5% and 2.5% per-market caps, 8% per-event, 15% per-day, 80% and 90% drawdown trigger and recovery). All values are sealed; modification requires an OSF amendment before next kickoff.
- No mid-tournament addition of new gate rules. The cascade is fixed at five rules.
- No ABBA shooter ordering, shooter skill, or other shootout-mechanism details. Shootouts are handled in Simulation.
- No nightly behavioural-bias testing. The seven pre-registered bias hypotheses live in market/bias_tests.py and run against forecast_log.jsonl as an evaluation activity. The Evaluation page is their home.
Where to go next
- Market Layer: the de-vigging, edge metric, and flagging machinery that produces the EdgeFlag this page consumes.
- Evaluation: Brier, log-loss, RPS, the Diebold-Mariano machinery, and the Closing Line Value tests that grade the outputs of this layer.
- Kill criteria: the formal mathematical statement of the kill criterion and the live status block.
- Pre-registration: the OSF DOI, the signed Git tag, and the sealed pre_reg_constants.yaml that locks every threshold on this page.
- Notation: the symbol table for the quantities used on this page.