What each pick badge means

Calibrated on backtest run_id=13 (1,838 regular-season games, 2025–26). Updated 2026-05-28.

Every pick falls into exactly one tier based on the model's confidence. Each tier is a strict rule on per-model probabilities, calibrated on the 1,838-game backtest above.

Badge	Rule	Hit %	n / season	Notes
ELITE	all 4 models agree > 62%	82.9%	35	Rare (~2% of slate) · safest tier
STRONG	any 2 of 3 primary models > 60%	62.0%	234	Workhorse · most volume
VALUE	all 3 primary models > 55%	64.7%	153	Best ROI tier · +odds friendly · parlay-grade
TOTALS-LOCK	LR ≥ 0.62 and sim ≥ 0.62 — or LR ≥ 0.66 where sim is silent	63.4%	41	Highest band · sim-covered + extended 9 / 9.5
TOTALS-STRONG	LR ≥ 0.60 and sim ≥ 0.55 — or LR ≥ 0.60 where sim is silent	55.1%	107	Workhorse band · ≈ VALUE within sampling noise
TOTALS-VALUE	LR ≥ 0.58 and sim ≥ 0.52 — or LR ≥ 0.58 where sim is silent	57.3%	75	Wider band · parlay-grade
TOTALS-LR-SOLO	LR ≥ 0.66, sim actively disagrees	—	0	Sim-dissent path · 0 picks to date
PASS	no tier rule matches	~50%	—	Collapsed by default · add manually if your read differs
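The side-pick rules in the table can be sketched as a small function. This is an illustrative sketch, not the production code: the function name `side_tier`, the argument names, and the precedence (rules checked in table order, first match wins) are assumptions. The three primary models are taken to be the analytic, simulation, and LR models, with XGBoost optional.

```python
from typing import Optional


def side_tier(analytic: float, sim: float, lr: float,
              xgb: Optional[float] = None) -> str:
    """Return the badge for one side, given each model's probability for that side.

    Assumption: rules are evaluated in table order and the first match wins.
    """
    primary = [analytic, sim, lr]

    # ELITE: all four models above 62% (requires an XGBoost read).
    if xgb is not None and all(p > 0.62 for p in primary + [xgb]):
        return "ELITE"

    # STRONG: any two of the three primary models above 60%.
    if sum(p > 0.60 for p in primary) >= 2:
        return "STRONG"

    # VALUE: all three primary models above 55%.
    if all(p > 0.55 for p in primary):
        return "VALUE"

    # No tier rule matches.
    return "PASS"
```

Under this assumed ordering, a pick where all three primary models exceed 55% and two of them also exceed 60% is labeled STRONG rather than VALUE.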

How to read tier colors: green = LOCK band (~9–10/10 confidence), amber = STRONG, blue = LEAN/VALUE, grey = PASS.

Lineup gate: predictions appear only once both starting lineups are posted, usually 2–3 hours before first pitch. Until then, the game shows ⏳ lineups in the left column. For future-dated slates, predictions show with a ⏳ tentative roster badge; these use probable pitchers and team strength only (no lineup data yet).

Models behind the picks: an analytic generative model (Negative-Binomial run distribution), a Monte-Carlo simulation (2,000 sims/game), an LR classifier (elastic-net), and XGBoost (gradient-boosted trees). ELITE requires all four; STRONG / VALUE require the three primary voters (XGB optional). On totals, the user-facing tier is a consensus wherever the sim has a calibrated read (8 / 8.5): both voters must agree on the side. At the common book lines the sim doesn't cover (9 / 9.5 / 7.5), the tier is extended using the well-calibrated LR alone. LR is the peaked-accuracy anchor; the sim is the agreement check where it can read.
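The totals rules from the badge table, including the sim-silent extension, might look like the sketch below. The names `totals_tier`, `lr_p`, and `sim_p` are hypothetical. Both probabilities are assumed to be for the side LR favors, `sim_p=None` stands for a sim-silent line, and "sim actively disagrees" is assumed to mean the sim puts that side below 50%.

```python
from typing import Optional


def totals_tier(lr_p: float, sim_p: Optional[float]) -> str:
    """Return the totals badge for the side LR favors.

    lr_p  : LR probability for the picked side (over or under).
    sim_p : sim probability for the same side, or None where the sim is silent.
    """
    if sim_p is None:
        # Sim-silent lines: LR alone decides, with a higher bar for LOCK.
        if lr_p >= 0.66:
            return "TOTALS-LOCK"
        if lr_p >= 0.60:
            return "TOTALS-STRONG"
        if lr_p >= 0.58:
            return "TOTALS-VALUE"
        return "PASS"

    # Sim-covered lines: both voters must clear their thresholds.
    if lr_p >= 0.62 and sim_p >= 0.62:
        return "TOTALS-LOCK"
    if lr_p >= 0.60 and sim_p >= 0.55:
        return "TOTALS-STRONG"
    if lr_p >= 0.58 and sim_p >= 0.52:
        return "TOTALS-VALUE"

    # Sim-dissent path (assumed: sim favors the other side).
    if lr_p >= 0.66 and sim_p < 0.50:
        return "TOTALS-LR-SOLO"

    return "PASS"
```

For example, `totals_tier(0.67, None)` gives TOTALS-LOCK, while `totals_tier(0.67, 0.51)` gives PASS under these assumptions: the sim neither confirms the pick nor actively disagrees.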

Totals at the live line: the tier is computed at the book's current posted total. LNK moves the line over time (e.g. 9.0 → 9.5), and the old line is no longer bettable, so a line move re-runs the tier at the new line; a price-only tick (same line, new juice) does not. Picks at sim-silent lines (9 / 9.5 / 7.5) hit +7.5% ROI in backtest: profitable, but below the +12.8% of sim-covered consensus picks, which is why the consensus is kept wherever the sim reads.
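The distinction between a line move and a price-only tick can be sketched as a simple comparison of two quotes. The `Quote` type and `needs_retier` function are hypothetical names for illustration; the actual feed format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    line: float       # posted total, e.g. 9.0
    over_price: int   # American odds on the over, e.g. -110
    under_price: int  # American odds on the under


def needs_retier(prev: Quote, curr: Quote) -> bool:
    """Re-run the tier only when the posted total itself has moved.

    A change in juice at the same line is ignored.
    """
    return curr.line != prev.line
```

With this sketch, moving from `Quote(9.0, -110, -110)` to `Quote(9.5, -110, -110)` triggers a re-tier, while moving to `Quote(9.0, -115, -105)` does not.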

Sim is now multi-line (D1): the sim emits a probability at every standard market line from 7.5 to 11.5 in 0.5 increments, matching LR's classifier coverage. The 8.0 / 8.5 / 11.0 / 11.5 calibrators are persisted (isotonic, refuse-if-worse gate); 7.5 / 9.0 / 9.5 / 10.0 / 10.5 currently use the sim's raw probability because the calibrator didn't improve held-out log-loss at those lines.
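A refuse-if-worse gate can be sketched as follows: keep a fitted calibrator only if it lowers log-loss on held-out games, otherwise fall back to the raw probability. The function names are hypothetical, and `calibrator` stands for any already-fitted map from raw to calibrated probability (for instance an isotonic fit); the fitting step itself is not shown.

```python
import math
from typing import Callable, Sequence


def log_loss(probs: Sequence[float], outcomes: Sequence[int],
             eps: float = 1e-12) -> float:
    """Mean binary log-loss; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)


def choose_calibrator(raw_probs: Sequence[float], outcomes: Sequence[int],
                      calibrator: Callable[[float], float]
                      ) -> Callable[[float], float]:
    """Return `calibrator` if it beats raw on held-out data, else the identity."""
    raw_loss = log_loss(raw_probs, outcomes)
    cal_loss = log_loss([calibrator(p) for p in raw_probs], outcomes)
    if cal_loss < raw_loss:
        return calibrator
    return lambda p: p  # refuse: keep the raw probability
```

The comparison must use games the calibrator was not fitted on; evaluating on the fitting set would almost always favor the calibrator.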

Why fewer non-8.5 ensemble totals (D3): at non-8.5 lines, only sim emits, so there is nothing to average against and the "ensemble" is just sim's raw probability. To prevent single-voter overconfidence at the tails, the ensemble blender requires ≥ 2 voters per line; below that it emits nothing. Net effect with the gate on: ensemble Total Brier 0.2454 → 0.2451, and the +0.13 / +0.22 overconfidence gap (Δ) at P ≥ 0.60 shrank to within ±0.04.
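The minimum-voter gate can be illustrated with a short sketch. The name `blend_line` and the use of an unweighted mean are assumptions; the real blender may weight voters differently.

```python
from typing import Dict, Optional


def blend_line(votes: Dict[str, Optional[float]],
               min_voters: int = 2) -> Optional[float]:
    """Blend per-model probabilities for one total line.

    votes maps a model name to its probability, or None if that model
    is silent at this line. Returns None when fewer than `min_voters`
    models emitted, so no single-voter "ensemble" is ever published.
    """
    present = [p for p in votes.values() if p is not None]
    if len(present) < min_voters:
        return None
    # Assumed: simple unweighted mean of the available voters.
    return sum(present) / len(present)
```

Here `blend_line({"sim": 0.64, "lr": None})` returns `None`, whereas a line with both voters present returns their average.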

Totals coverage + ROI (run 28, @ posted line): coverage now spans the full common book range: consensus at 8 / 8.5 plus the LR-extended 9 / 9.5 / 7.5 lines. Sim-covered consensus picks hit +12.8% ROI at −110, the extended sim-silent picks +7.5%, and the blend +9.6% across ≈ 2.5× the volume of consensus-only (which PASSed every 9 / 9.5 game). The prior ensemble-calibration lift (Wave 1 + D1–D3) held, with Brier at 0.2340 → 0.2316; the ML side is essentially unchanged.
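The ROI figures above can be sanity-checked with a little arithmetic. The sketch below assumes flat one-unit stakes and that the blended ROI is a volume-weighted average of the two pick groups; the function names are hypothetical.

```python
def payout_per_unit(odds: int) -> float:
    """Profit per unit staked on a win, from American odds."""
    return 100 / -odds if odds < 0 else odds / 100


def roi(hit_rate: float, odds: int = -110) -> float:
    """Expected ROI per unit staked for flat bets at the given odds."""
    return hit_rate * payout_per_unit(odds) - (1.0 - hit_rate)


def breakeven(odds: int = -110) -> float:
    """Hit rate at which ROI is exactly zero."""
    return 1.0 / (1.0 + payout_per_unit(odds))


def consensus_share(blended: float, consensus: float, extended: float) -> float:
    """Solve blended = w * consensus + (1 - w) * extended for w."""
    return (blended - extended) / (consensus - extended)
```

At −110 the break-even hit rate is 110/210, about 52.4%. Under the flat-stake assumption, `consensus_share(9.6, 12.8, 7.5)` is about 0.40, so total volume is roughly 1 / 0.40 ≈ 2.5× the consensus-only volume, in line with the figure quoted above.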

Full model card with Brier progression, totals verification, and EV-backtest tables: /model-card