Trueline Tier Legend

What each pick badge means

Calibrated on backtest run_id=13 (1,838 regular-season games, 2025–26). Updated 2026-05-28.

Every pick falls into one tier based on the model's confidence. Each tier is a strict rule on per-model probabilities, calibrated on the same backtest (about 1,800 games).

| Badge | Rule | Hit % | n / season | Notes |
| --- | --- | --- | --- | --- |
| ELITE | all 4 models agree > 62% | 82.9% | 35 | Rare (~2% of slate) · safest tier |
| STRONG | any 2 of 3 primary models > 60% | 62.0% | 234 | Workhorse · most volume |
| VALUE | all 3 primary models > 55% | 64.7% | 153 | Best ROI tier · +odds friendly · parlay-grade |
| TOTALS-LOCK | LR ≥ 0.62 and sim ≥ 0.62, same side | 68.2% | 107 | D2 consensus · highest confidence band |
| TOTALS-STRONG | LR ≥ 0.60 and sim ≥ 0.55, same side | 65.7% | 277 | D2 consensus · workhorse totals band |
| TOTALS-VALUE | LR ≥ 0.58 and sim ≥ 0.52, same side | 58.4% | 197 | D2 consensus · wider band, parlay-grade |
| TOTALS-LR-SOLO | LR ≥ 0.66, sim disagrees | | 0 | Forward-compatible · 0 picks across n=3,638 games to date |
| PASS | no tier rule matches | ~50% | | Collapsed by default · add manually if your read differs |
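
The totals rows above can be read as a first-match rule on two probabilities. The following is a minimal sketch, not the production logic: the function name `totals_tier`, the input convention (each voter's probability of the over), the table-order precedence, and the reading of "sim disagrees" as sim favoring the opposite side are all assumptions made for illustration.

```python
from typing import Tuple

# (badge, LR threshold, sim threshold), checked in table order (assumed precedence).
TOTALS_RULES = [
    ("TOTALS-LOCK", 0.62, 0.62),
    ("TOTALS-STRONG", 0.60, 0.55),
    ("TOTALS-VALUE", 0.58, 0.52),
]


def totals_tier(lr_p_over: float, sim_p_over: float) -> Tuple[str, str]:
    """Return (badge, side) for one game total.

    The side is taken from LR (the anchor); both probabilities are then
    re-expressed as confidence in that side.
    """
    side = "over" if lr_p_over >= 0.5 else "under"
    lr = lr_p_over if side == "over" else 1.0 - lr_p_over
    sim = sim_p_over if side == "over" else 1.0 - sim_p_over

    for badge, lr_min, sim_min in TOTALS_RULES:
        if lr >= lr_min and sim >= sim_min:
            return badge, side

    # Assumption: "sim disagrees" means sim leans to the other side.
    if lr >= 0.66 and sim < 0.5:
        return "TOTALS-LR-SOLO", side

    return "PASS", side
```

For example, `totals_tier(0.35, 0.30)` expresses both voters as under-side confidence (0.65 and 0.70) and lands in TOTALS-LOCK on the under.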

How to read tier colors: green = LOCK band (~9–10/10 confidence), amber = STRONG, blue = LEAN/VALUE, grey = PASS.

Lineup gate: predictions appear only once both starting lineups are posted. Games waiting on lineups show ⏳ lineups in the left column, usually until 2–3 hours before first pitch. Future-dated slates are the exception: their predictions show with a ⏳ tentative roster badge and use probable pitchers + team strength only (no lineup data yet).

Models behind the picks: analytic generative (Negative-Binomial run distribution), simulation Monte-Carlo (2,000 sims/game), LR classifier (elastic-net), and XGBoost (gradient-boosted trees). ELITE requires all four; STRONG / VALUE are decided by the three primary voters (XGB optional). On totals, the user-facing tier is the new D2 LR + sim consensus rule: both voters must agree on side at the threshold. LR is the peaked-accuracy anchor and sim is the agreement check.
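
The side-pick badges (ELITE / STRONG / VALUE) can be sketched the same way. This is an illustrative reading of the published rules, assuming that the three primary voters are analytic, sim, and LR (inferred from "XGB optional"), that every probability is already expressed for the same side, and that rules are checked in table order. The function name `side_tier` is hypothetical.

```python
from typing import Optional


def side_tier(analytic: float, sim: float, lr: float,
              xgb: Optional[float] = None) -> str:
    """Return the badge for one side pick.

    All inputs are win probabilities for the SAME side. xgb may be None,
    in which case ELITE (which needs all four models) cannot fire.
    """
    primary = [analytic, sim, lr]  # assumed primary voters

    # ELITE: all four models above 62%.
    if xgb is not None and all(p > 0.62 for p in primary + [xgb]):
        return "ELITE"

    # STRONG: any two of the three primary models above 60%.
    if sum(p > 0.60 for p in primary) >= 2:
        return "STRONG"

    # VALUE: all three primary models above 55%.
    if all(p > 0.55 for p in primary):
        return "VALUE"

    return "PASS"
```

Under this assumed ordering, VALUE only fires when all three primary models clear 55% but at most one clears 60%.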

Sim is now multi-line (D1): sim emits a probability at every standard market line from 7.5 to 11.5 in 0.5 increments, matching LR's classifier coverage. The 8.0 / 8.5 / 11.0 / 11.5 calibrators are persisted (isotonic, refuse-if-worse gate); 7.5 / 9.0 / 9.5 / 10.0 / 10.5 currently use sim's raw probability because the calibrator didn't improve held-out log-loss.
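
As a rough illustration of the two ideas in this paragraph, the sketch below turns a list of simulated game totals into an over-probability per line, and applies a refuse-if-worse check that keeps a calibrator only when it lowers held-out log-loss. The function names, the handling of pushes on whole-number lines (dropped from the denominator), and the input shapes are assumptions; the isotonic fit itself is not shown.

```python
import math
from typing import Dict, Sequence

# Standard market lines 7.5, 8.0, ..., 11.5.
LINES = [7.5 + 0.5 * i for i in range(9)]


def over_probabilities(sim_totals: Sequence[int]) -> Dict[float, float]:
    """Share of simulated totals that go over each line.

    Assumption: sims landing exactly on a whole-number line (a push)
    are excluded from the denominator.
    """
    probs = {}
    for line in LINES:
        decided = [t for t in sim_totals if t != line]
        overs = sum(t > line for t in decided)
        probs[line] = overs / len(decided) if decided else 0.5
    return probs


def log_loss(y: Sequence[int], p: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary log-loss, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -total / len(y)


def keep_calibrator(y_heldout: Sequence[int], raw_p: Sequence[float],
                    calibrated_p: Sequence[float]) -> bool:
    """Refuse-if-worse gate: persist only if held-out log-loss improves."""
    return log_loss(y_heldout, calibrated_p) < log_loss(y_heldout, raw_p)
```

When `keep_calibrator` returns False for a line, the raw sim probability would be used for that line, which is the situation described for 7.5 / 9.0 / 9.5 / 10.0 / 10.5.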

Why fewer non-8.5 ensemble totals (D3): at non-8.5 lines, only sim emits, so there's nothing to average against and the "ensemble" would just be sim's raw probability. To prevent single-voter overconfidence at the tails, the ensemble blender requires ≥ 2 voters per line; below that it emits nothing. Net effect: ensemble Total Brier 0.2454 → 0.2451 with the gate on, and the Δ +0.13 / +0.22 overconfidence at P ≥ 0.60 shrank to within ±0.04.
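
The minimum-voter gate can be sketched in a few lines. This is a simplified illustration: it assumes an unweighted mean of the available voters, whereas the real blender's weighting is not described here, and the name `blend_line` is hypothetical.

```python
from typing import Dict, Optional


def blend_line(voter_probs: Dict[str, Optional[float]],
               min_voters: int = 2) -> Optional[float]:
    """Blend voter probabilities for one line, or emit nothing.

    A voter that did not emit at this line is passed as None.
    Assumption: the blend is a plain average of the voters present.
    """
    present = [p for p in voter_probs.values() if p is not None]
    if len(present) < min_voters:
        return None  # single-voter line: no ensemble total
    return sum(present) / len(present)
```

With only sim present, `blend_line({"sim": 0.64, "lr": None})` returns `None`, which is why fewer ensemble totals appear away from 8.5.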

Headline lift from Wave 1 + D1 + D2 + D3 (n=1,800): totals consensus tier hit-rate 62.5% → 63.7% · ROI at −110 +19.4% → +21.6% · ensemble Brier 0.2340 → 0.2316. Moneyline (ML) side: essentially unchanged (only C8 batter recency feeds into ML, and the n=1,800 delta is +0.0001, which is noise).
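
The link between hit rate and ROI at −110 is simple arithmetic, sketched below under the assumption of flat one-unit stakes with pushes ignored. The helper name `flat_roi` is illustrative.

```python
def flat_roi(hit_rate: float, american_odds: int = -110) -> float:
    """Expected return per unit staked at a given hit rate.

    Assumptions: flat one-unit stakes, every bet at the same price,
    pushes ignored.
    """
    if american_odds < 0:
        win_payout = 100 / abs(american_odds)  # -110 pays 0.909 per unit
    else:
        win_payout = american_odds / 100
    return hit_rate * win_payout - (1.0 - hit_rate)


# 63.7% at -110: 0.637 * 0.909 - 0.363, about +21.6%
roi_after = round(flat_roi(0.637) * 100, 1)

# Break-even hit rate at -110 is 110 / 210, about 52.4%
break_even = 110 / 210
```

A 63.7% hit rate reproduces the +21.6% figure. A hit rate of exactly 62.5% gives about +19.3%, so the quoted +19.4% presumably reflects the unrounded hit rate.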

Full model card with Brier progression, totals verification, and EV-backtest tables: /model-card