a discord bot for the tootsies server. ask, recap, discuss, ship features by typing.
A thorough, edge-case-oriented test plan for the two highest-risk surfaces: the Bookie (play-money betting — a money path) and the live sports Commentator. Grounded in the actual code + existing tests (2026-07). Legend: ✅ covered (cites the test), ⚠️ gap (no direct test), 💰 money-critical (mis-pay / double-pay / strand risk).
| Tier | What | When | Applies to |
|---|---|---|---|
| T1 · deterministic unit | pure logic, mocked DB/feeds, no network | every CI run (pytest) | the bulk below |
| T2 · dry-run vs live data | exercise the real code path against a probed live API / the real _settle_once | before merging a money/feed change | settlement, slate, reprice, name folds |
| T3 · live prod verification | Railway EVENT logs + /debug/query + /debug/integrations | after deploy of a money/feed change | settle heartbeat, real payouts, feed health |
| T4 · behavioral eval | model-judged output (scripts/eval_commentate.py) | on demand / scheduled | commentary voice + grounding |
Rule: every 💰 item earns T1 + at least one of T2/T3. Green unit tests alone never ship a settlement change — the money path is confirmed against live data before and after.
(place_bet, db.place_bookie_bet)

| # | Scenario | Expected | Status |
|---|---|---|---|
|1|Valid pregame bet|deduct stake, insert open, public embed, bet_placed ok=True|✅ test_bet_success_places_and_posts_publicly|
|2|Defer before slow fetch|defer() before _bettable_games() (no 10062)|✅ test_bet_defers_before_slow_fetch|
|3|💰 Insufficient funds|place_bookie_bet→None, ephemeral, balance never negative|✅ test_bet_insufficient_funds_is_ephemeral|
|4|Non-positive stake (0 vs negative)|rejected at cog + DB double-guard|✅ test_bet_rejects_non_positive_stake (⚠️ 0-vs-neg not split)|
|5|Game gone from slate|game_gone|✅ test_bet_game_no_longer_available|
|6|Side no longer priced|game_gone|✅ test_bet_unpriced_side_rejected|
|7|💰 Market locked/settled|is_bookie_game_locked live read rejects|✅ test_bet_rejected_when_market_locked|
|8|Live/started game|still bettable at live line, tagged LIVE|✅ test_bet_accepted_when_game_started|
|9|💰 Live-placed settles like pregame|no settlement special-casing|✅ test_live_placed_bet_settles_like_a_prekickoff_one|
|10|Draw side (3-way soccer)|side='draw' stored|✅ test_bet_on_draw_stores_and_posts|
|11|Disabled guild (kill switch / experiment off)|_DISABLED|✅ two tests|
|12|Sport not bettable for guild|filtered → reads as game_gone|⚠️ not asserted through place_bet|
|13|💰 Daily top-up fires inside placement|balance floor applied before funds check|⚠️ gap|
|14|💰 Concurrent double-placement / double-top-up|txn + IS DISTINCT FROM CURRENT_DATE; balance never negative|⚠️ gap (no live-DB test)|
|15|💰 decimal_odds snapshot at placement|payout uses stored odds, not re-fetched|⚠️ partial (no “line moved after placement” test)|
|16|💰 Degenerate side (<1%)|whole market unbettable (_sides→[])|✅ test_sides_drops_degenerate_line|
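
Row 15 flags that no test covers a line that moves after placement. A minimal sketch of what such a test could assert, assuming a hypothetical `Bet` record and `payout` helper (neither is the project's real schema or API; only the `round(stake*odds)` rule comes from the settlement table):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    """Hypothetical bet row; field names are illustrative, not the real schema."""
    stake: int
    decimal_odds: float  # snapshot taken at placement time


def payout(bet: Bet) -> int:
    # Pays from the stored snapshot only; the current line is never consulted.
    return round(bet.stake * bet.decimal_odds)


def test_line_moved_after_placement() -> None:
    bet = Bet(stake=100, decimal_odds=2.5)
    current_line = 1.4  # the market moved after the bet was placed
    assert payout(bet) == 250
    assert payout(bet) != round(bet.stake * current_line)


test_line_moved_after_placement()
```

The point of the shape is that `payout` has no parameter through which a re-fetched line could leak in.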
(_fetch_bettable, autocompletes)

Cold/stale/empty cache, freshness row, sport filter (_resolve_pick_sport), per-guild bettable filter, dedup by match_key (richer line wins, live breaks tie), locked-games + API-Sports-finals veto (both fail-open), SGO 2-day upcoming merge, sort order, _game_when tags, round threading. Mostly ✅.

Gaps: ⚠️ 2s cold-timeout “loading” branch; ⚠️ 💰 _bettable_for_guild DB-error fails SAFE to {football,basketball} (never over-offers); ⚠️ 💰 doubleheaders sharing one dateless match_key → assert “missed offering, not double-pay”; ⚠️ name-truncation at _NAME_CAP.
(_settle_game, settle_bookie_game, bookie_bet_outcome, _winning_team) 💰

| # | Scenario | Expected | Status |
|---|---|---|---|
|1|Home/away decisive|winner→won round(stake*odds), loser→lost|✅ test_bet_outcome_pure|
|2|Draw on 3-way|team LOSES, draw-side WINS|✅ test_settle_tick_passes_draw_on_tie|
|3|Draw on 2-way (MLB/NHL/NFL/MMA)|PUSH/refund|✅ test_bet_outcome_two_way_tie_pushes|
|4|Single-leg knockout level → advancer|knockout_advancer pays the team that advanced|✅ test_knockout_shootout_settles_to_advancer_end_to_end|
|5|Knockout level, no authoritative advancer → ‘draw’ → 2-way push|safe void|✅ test_settle_game_soccer_knockout_level_result_pushes|
|6|Missing score → void+refund|_winning_team→None|✅ test_bet_outcome_pure|
|7|💰 Idempotency / double-settle|second run returns [], pays nothing|⚠️ GAP — highest money priority|
|8|Settle-loop vs commentator race|settle_and_roast_blob reads settled rows|✅ test_settle_and_roast_blob_reads_settled_bets|
|9|Name folds: FIFA code / US nickname / MMA fighter|reconcile across providers|✅ 3 tests|
|10|💰 Label matches NEITHER final team → VOID not LOST|safety floor|⚠️ partial — wants an explicit case|
|11|match_key order-independence|home/away flip → one key|✅ test_match_key_is_provider_and_orientation_independent|
|12|Resolver: API-Sports primary vs Odds-API backstop (degraded)|both idempotent, no double-pay|✅ test_settle_backstop_pays_out_via_odds_api_when_api_sports_down|
|13|MMA excluded from Odds-API backstop|never wastes a credit|✅ test_odds_api_settles_team_sports_only_not_mma_or_tennis|
|14|Per-sport routing; tennis→() never offered|settlement_sources|✅ 2 tests|
|15|Backstop: only completed + open-bet-scoped|_odds_api_finals|✅ test_settle_backstop_only_completed_games_and_scoped_sports|
|16|Skip fetch when no open bets|has_open gate|✅ test_settle_tick_skips_fetch_when_no_open_bets|
|17|💰 Sport disabled AFTER placement still settles|keyed on open bets, not menu|⚠️ partial — wants explicit test|
|18|open_bet_sports read fails → fail-open|open_sports=set(), baseline still fetched|⚠️ gap|
|19|💰 Neutral-venue orientation|settle by team NAME not position|⚠️ partial|
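
Rows 1, 2, 3 and 6 of the table above describe one pure decision. A hedged sketch of that rule, assuming a hypothetical `bet_outcome` signature (the real `bookie_bet_outcome` may take different arguments and return a different shape):

```python
from typing import Optional


def bet_outcome(side: str, winner: Optional[str], stake: int,
                odds: float, three_way: bool) -> tuple[str, int]:
    """Hypothetical pure settlement rule; returns (status, credit).

    side:   'home', 'away' or 'draw' (the side the bet backed)
    winner: 'home', 'away', 'draw', or None when the score is missing
    """
    if winner is None:
        return ("void", stake)            # missing score: refund the stake
    if winner == "draw" and not three_way:
        return ("push", stake)            # tie on a 2-way market: refund
    if side == winner:
        return ("won", round(stake * odds))
    return ("lost", 0)                    # includes a team side on a 3-way draw
```

For example, `bet_outcome("home", "draw", 100, 1.8, three_way=True)` loses, while the same tie on a 2-way market pushes and returns the stake.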
SGO-down → Odds-API slate backstop (_supplement_with_odds_api, scoped to bettable union, MMA fold-only); SGO-healthy → zero metered calls; API-Sports-down → Odds-API settlement backstop; live reprice off prediction markets (_supplement_with_live_predictions, Polymarket→Kalshi, SGO self-priced untouched, force_refresh idle-live refold); durable slate + stale-fallback (_MAX_STALE_SECS=15m); WC-round daily-cached fallback. Extensively ✅ (14 reprice tests, 5 durable-slate tests, 4 Odds-API-fallback tests). Gaps: ⚠️ both-down combined path; ⚠️ _deserialize_slate field-drift skip; ⚠️ 💰 96h void does NOT fire early on a ~47h-pre-kickoff bet still in progress; ⚠️ 💰 UTC-boundary two-date finals scan (the Ghana-strand: ET-evening game finishing after UTC-midnight still settles — providers.recent_finals).
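The UTC-boundary gap can be illustrated with plain date arithmetic. This is a sketch only: it assumes finals are looked up per UTC calendar date and that a game is filed under its kickoff date, and `finals_scan_dates` is a hypothetical helper, not `providers.recent_finals`.

```python
from datetime import date, datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4))  # fixed summer offset for US Eastern


def finals_scan_dates(now_utc: datetime) -> list[date]:
    """Hypothetical helper: scan yesterday and today (UTC), not today alone."""
    today = now_utc.astimezone(timezone.utc).date()
    return [today - timedelta(days=1), today]


# A 19:00 ET kickoff is 23:00 UTC; the final whistle lands after UTC midnight.
kickoff = datetime(2026, 7, 1, 19, 0, tzinfo=EDT)
settle_tick = kickoff + timedelta(hours=2, minutes=5)

kickoff_utc_date = kickoff.astimezone(timezone.utc).date()
assert kickoff_utc_date == date(2026, 7, 1)
# A today-only scan would miss the game; the two-date scan still sees it.
assert kickoff_utc_date in finals_scan_dates(settle_tick)
```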
_enabled (experiment + kill switch) on every command + blob; bettable-toggle default {football,basketball} vs set; watched-sports independence; _bettable_sports_union fail-SAFE to default; settlement runs regardless of gate (bookkeeping), only the callout is gated. Mostly ✅. Gaps: ⚠️ set_bettable_sports empty→null; ⚠️ 💰 _bettable_sports_union per-guild-error → default branch.
cog_load migration, _bettable_games, locked/finals vetoes, per-sport supplement, _flag_stranded, _announce (money already credited, ping never retried), _settle_once top-level (ok=False+emit_error(recoverable=False)), blobs→"", one-time corrections. Mostly ✅. Gaps: ⚠️ _announce send-exception branch; ⚠️ WC-round cache read/write except; ⚠️ _action_game_autocomplete error branch.
bet_settle_tick heartbeat every tick (ok), ops-monitor settlement_broken/settlement_stalled/bets_stranded(MAX not SUM)/payout_undelivered, bet_settle_backstop, bet_picker disposition, bet_slate_refresh, bet_placed reasons, bookie_live_reprice (implied %s). Extensively ✅. Gap: ⚠️ bets_settled stale_void count emit.
/balance public + daily floor top-up; 💰 top-up counts LOCKED open-bet stake (no free-refill-while-big-bets-ride) — ⚠️ gap, money-critical; /grant one/all (never mass-seeds), mod-gated; /bets open/settled/other-user (read-only peek, no seed) + pager; bankroll math ties to balance; game-collapsed record; /action; /leaderboard bets by net. Mostly ✅.
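One reading of the locked-stake top-up rule, as a sketch: the floor is compared against balance plus stake locked in open bets, so a user with large bets riding gets no refill. `DAILY_FLOOR` and `daily_top_up` are hypothetical names, and the floor value is invented for illustration.

```python
DAILY_FLOOR = 1000  # hypothetical value; the real floor is not stated here


def daily_top_up(balance: int, locked_stake: int, floor: int = DAILY_FLOOR) -> int:
    """Credit needed to lift balance + locked open-bet stake up to the floor."""
    return max(0, floor - (balance + locked_stake))


assert daily_top_up(0, 0) == 1000      # broke, nothing riding: full refill
assert daily_top_up(0, 5000) == 0      # broke, but big bets riding: no refill
assert daily_top_up(200, 300) == 500   # partial refill up to the floor
```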
_sides / draw-suppression choke point 💰

3-way includes draw (soccer) / suppressed for non-3-way (WNBA OT strand); single-leg knockout draw suppression + renormalize; two-legged CL keeps draw, single-leg final suppresses; _drop_if_degenerate (one degenerate side drops the whole market); bool-odds rejection; WC-round threading determinism. Extensively ✅ (~15 tests).
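The suppress, renormalize, and drop-if-degenerate steps can be sketched as below. This is an assumption-laden illustration: `sides` here works on implied probabilities, and whether the real `_sides` checks degeneracy before or after renormalizing is not stated in this plan.

```python
MIN_SIDE_PROB = 0.01  # the "<1%" degenerate threshold cited for placement


def sides(probs: dict[str, float], three_way: bool) -> dict[str, float]:
    """Hypothetical choke point: implied probability per bettable side.

    An empty dict means the whole market is unbettable.
    """
    kept = dict(probs)
    if not three_way:
        kept.pop("draw", None)            # suppress draw on non-3-way markets
    total = sum(kept.values())
    if total <= 0:
        return {}
    kept = {side: p / total for side, p in kept.items()}  # renormalize to 1
    if any(p < MIN_SIDE_PROB for p in kept.values()):
        return {}                         # one degenerate side drops the market
    return kept
```

Returning an empty mapping for a degenerate market keeps the "whole market unbettable" rule in one place instead of in every caller.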
Accent-fold (Türkiye, not deleted); canonical_team FIFA/US/passthrough (A.C. Milan not split); US spacing/relocation/nickname (ambiguous “Sox” dropped, per-sport collision); canonical_fighter; match_key_parts order-independent; sources-matrix invariants (settleable==BETTABLE_SPORT_KEYS, drop-in feed). Extensively ✅. Gaps: ⚠️ Korea/Congo ambiguous fall-through explicit case; ⚠️ country_codes.json corrupt→norm-only degrade; ⚠️ 💰 score_to_snapshot NHL “hockey” tag (a wrong tag silently strands every NHL backstop settlement).
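Accent folding and order-independent keys can be shown with the standard library alone. A minimal sketch, assuming hypothetical `fold` and `match_key` helpers (the project's own `canonical_team` and `match_key_parts` do more, such as FIFA-code and nickname lookups):

```python
import unicodedata


def fold(name: str) -> str:
    """Strip diacritics but keep the base letters (ü -> u), then casefold."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def match_key(home: str, away: str) -> str:
    """Hypothetical key: sorting the folded names removes home/away orientation."""
    return "|".join(sorted((fold(home), fold(away))))


assert fold("Türkiye") == "turkiye"      # the accent is folded, not deleted
assert match_key("Ghana", "Türkiye") == match_key("Turkiye", "Ghana")
```

Decomposing first and dropping only combining marks is what keeps the base letter; filtering to ASCII directly would delete the `ü` and break the match.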
(decide_post, cadence.py)

Pregame once; final once on transition; halftime on entry; soccer goal on score-change (event-count-aware: flap-suppression, VAR retraction, score-field fallback); kickoff 0-0 not a goal; NBA period boundary (not mid-quarter, no phantom first-obs); interval heartbeat on phase-staggered grid; clutch tightening; first-live-sight immediate; per-game phase offset; concurrent-game anti-collision; milestone-inside-period suppresses that period’s read then grid resumes; PostState restart-persistence. Extensively ✅ (~30 cadence tests).

Gaps: ⚠️ halftime re-entry suppression; ⚠️ Q4→OT boundary (also CLUTCH); ⚠️ nba_every_period=False; ⚠️ _score_changed both-None; ⚠️ two same-team different-event_id games keep independent PostState.
(classify_phase)

Completed→FINAL wins; football pregame/HT/in-game/clutch(≥75’/ET/BT/P); basketball NS/Q/OT/HT; generic SGO (no code)→live/completed/pregame. ✅.

Gaps: ⚠️ HT-before-clutch precedence (HT + elapsed 80 → HALFTIME); ⚠️ malformed/absent elapsed→IN_GAME.
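The two precedence gaps are easiest to see as an ordered chain of checks. A sketch under stated assumptions: `classify_football` is hypothetical, and the `NS` and `2H` status codes are illustrative (the plan only names HT, ET, BT and P for football).

```python
from typing import Union

CLUTCH_MINUTE = 75
CLUTCH_CODES = {"ET", "BT", "P"}


def classify_football(completed: bool, code: str,
                      elapsed: Union[int, str, None]) -> str:
    """Hypothetical sketch of the precedence order, not the real classify_phase."""
    if completed:
        return "FINAL"                    # completed wins over everything
    if code == "NS":                      # assumed not-started code
        return "PREGAME"
    if code == "HT":
        return "HALFTIME"                 # checked before the clutch minute
    if code in CLUTCH_CODES:
        return "CLUTCH"
    try:
        minute = int(elapsed)
    except (TypeError, ValueError):
        return "IN_GAME"                  # malformed or absent elapsed
    return "CLUTCH" if minute >= CLUTCH_MINUTE else "IN_GAME"
```

With this ordering, `classify_football(False, "HT", 80)` stays `HALFTIME`, and a missing or non-numeric `elapsed` degrades to `IN_GAME` instead of raising.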
Leaders (NBA-critical), team stats (soccer shape), pregame (formations/form/H2H, pregame trigger only), standings (pregame+final, soccer-only), player props (matched to leaders/scorers, SGO-mapped), market-edge (slow triggers only, SGO board or Odds-API-fallback when SGO down w/ budget guard). Formatters extensively ✅; _props_blob/_market_edge_blob fail-open ✅. Gaps: ⚠️ cog-level fail-open for the 5 game-depth fetchers (_leaders/_team_stats/_pregame/_standings/_events) individually; ⚠️ standings on the pregame bookend (only final tested); ⚠️ market-edge suppressed on goal/clutch/final (not asserted at tick level); ⚠️ Odds-API-fallback per-game TTL cache (incl. caching a “” miss).
Matched sportsbook line; NO odds block → forbid mentioning a line (no_odds_rule, #507); pre-match Odds-API relabeled OPENING on in-play/final (_ODDS_OPENING_TRIGGERS, not pregame); SGO line never relabeled; prediction markets stay live; cross-provider accent-fold match; resolved-market guard (_MIN_LIVE_SIDE_PROB); near-lock reframe; odds-divergence health. Leaf helpers extensively ✅. Gaps: ⚠️ cog-level _emit_odds_health wiring (assembling per-source %s, <2-source early-return, try/except); ⚠️ _odds_blob has_sgo_line enriched-game branch; ⚠️ unit assert that no_odds_rule is present iff no odds.
(commentate_score, _passes_quality)

Score ≥ _SCORE_FLOOR(0.6) ships, below drops (no must_post); fail-open on scorer error → ship; fed the live feed (facts) to verify numbers; empty compose = miss (short-circuits scoring). ✅.

Gaps: ⚠️ a depth blob (team stats / match rating) actually reaches facts so it isn’t false-failed (only scoreline asserted); ⚠️ truncation / two-sentence-ceiling guard; ⚠️ URL-strip on a commentary line.
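The gate's three behaviors (floor, fail-open, empty short-circuit) fit in a few lines. A minimal sketch, assuming a hypothetical synchronous `passes_quality` that receives the scorer as a callable; the real `_passes_quality` is likely async and also passes the facts feed.

```python
from typing import Callable

SCORE_FLOOR = 0.6  # mirrors the _SCORE_FLOOR value cited above


def passes_quality(text: str, scorer: Callable[[str], float]) -> bool:
    """Hypothetical gate: True means ship the line."""
    if not text.strip():
        return False                      # empty compose is a miss; scorer never runs
    try:
        score = scorer(text)
    except Exception:
        return True                       # fail-open: a scorer error ships the line
    return score >= SCORE_FLOOR


def broken_scorer(_: str) -> float:
    raise RuntimeError("scorer unavailable")


assert passes_quality("Late winner for Ghana.", lambda t: 0.6) is True
assert passes_quality("Late winner for Ghana.", lambda t: 0.59) is False
assert passes_quality("Late winner for Ghana.", broken_scorer) is True
```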
1 routine post/tick, rest defer (state untouched, re-fire) → commentary_deferred; milestones (goal/final) never deferred but count budget; per-post header dedup (only on game/score change); header delivery-only (post_preview stays the take). ✅. Gaps: ⚠️ milestone-counts-toward-budget so a routine read defers around a goal (same tick); ⚠️ assert post_preview excludes the header.
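The budget rule, including the gap where a goal consumes the budget and pushes a routine read to the next tick, can be sketched as a pure planner. `plan_tick` and the trigger names other than goal/final are hypothetical.

```python
MILESTONES = {"goal", "final"}


def plan_tick(triggers: list[str], routine_budget: int = 1) -> tuple[list[str], list[str]]:
    """Hypothetical per-tick planner; returns (posted, deferred) trigger names."""
    posted = [t for t in triggers if t in MILESTONES]  # milestones are never deferred
    used = len(posted)                                 # but they count toward the budget
    deferred: list[str] = []
    for t in triggers:
        if t in MILESTONES:
            continue
        if used < routine_budget:
            posted.append(t)
            used += 1
        else:
            deferred.append(t)            # state untouched, so it re-fires next tick
    return posted, deferred


assert plan_tick(["interval", "interval"]) == (["interval"], ["interval"])
assert plan_tick(["goal", "interval"]) == (["goal"], ["interval"])
```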
(_resolve_targets)

Kill switch → mood OFF → experiment (prod→room / staging→#bot-logs / off→drop) → watched-sports filter (None=all, untagged fail-open); no-targets skips hub call; hub-unavailable short-circuits. ✅.

Gaps: ⚠️ empty channel-list continue; ⚠️ untagged-game fail-open at cog level; ⚠️ per-guild tunable cadence actually alters the decision (test bot always defaults); ⚠️ SGO watched_union narrowing passes the right frozenset (or None when a guild watches all).
(_process_pending_highlights, highlightly.py)

Finished game marked pending (persisted matchup); pending pass before the live-games early-return; posts first VERIFIED clip once, marks done (posted_ts only on real send); no-clip records check-only; 0-48h window + 60-min recheck (SQL); gated on client + live-target guild. Cog-level ✅.

Gaps: ⚠️ the SQL window/pace/dedup (due_highlight_games) is mocked, so untested; ⚠️ staging highlight audition path; ⚠️ _highlights_for fail-open + send-failure-doesn’t-mark-posted; confirm _parse_highlights/_capture_usage live in test_highlightly.py.
Tick catch-all; per-depth-fetch fail-open; compose 429 / raised exception → “” + emit_error(recoverable=True) (treated as empty miss); state-persist failure swallowed; send-failure per channel (state recorded, emits nothing); milestone retry loop (_MAX_MILESTONE_ATTEMPTS=3, _RETRY_TRIGGERS). Gaps: ⚠️ tick-level exception swallowed; ⚠️ compose-429 path (only “EMPTY”-return tested, not a raised RateLimitError); ⚠️ send-failure / state-persist-failure; ⚠️ retry for pregame/goal/halftime (only final tested) + attempt-counter clear on ship.
(_bets_blob)

pregame→open_bets_blob, final→settle_and_roast_blob, else “”; no-Bookie-cog → “”; guild-scoped roast; AllowedMentions.none(). ✅ routing.

Gaps: ⚠️ fail-open on bookie exception → “”; ⚠️ has_bets flag; ⚠️ final-settle idempotency with the Bookie settle loop (no double-pay; cross-references §1.3.7).
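The routing and its fail-open gap can be sketched as follows. This is a synchronous simplification with assumed method signatures: the `BookieLike` protocol is hypothetical, and the real cog methods are probably coroutines.

```python
from typing import Optional, Protocol


class BookieLike(Protocol):
    """Hypothetical slice of the Bookie cog that the commentator relies on."""

    def open_bets_blob(self, guild_id: int) -> str: ...

    def settle_and_roast_blob(self, guild_id: int) -> str: ...


def bets_blob(trigger: str, bookie: Optional[BookieLike], guild_id: int) -> str:
    if bookie is None:
        return ""                         # no Bookie cog loaded
    try:
        if trigger == "pregame":
            return bookie.open_bets_blob(guild_id)
        if trigger == "final":
            return bookie.settle_and_roast_blob(guild_id)
    except Exception:
        return ""                         # fail-open: commentary ships without bets
    return ""                             # every other trigger carries no bets blob
```

A test for the fail-open gap would pass a fake whose methods raise and assert that the result is the empty string.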
Tier A · 💰 money-critical. ✅ = closed in this PR; the rest are the next targets (all mock-based; see §3a for what’s deliberately left to Postgres).

- test_settle_game_is_idempotent_no_double_pay (cog half; the SQL FOR UPDATE half is Postgres’s, §3a).
- test_bet_outcome_pure with explicit cases (the void branch is covered, the neutral-venue flip is implicit).
- (test_extra_day_slate_coalesces_same_refresh_reads); NHL “hockey” tag: ✅ already covered (test_sportsdata_hub.py).
- _bettable_for_guild DB-error fail-SAFE to default (§1.2, §1.5): test_bettable_for_guild_fails_safe_to_default_on_db_error. _bettable_sports_union per-guild-error branch still open.
- _settle_game idempotency test.

Left to Postgres, not unit-tested (owner steer): the locked-stake daily top-up, place_bookie_bet balance-never-negative under concurrency, and the SQL FOR UPDATE idempotency. These are DB-engine guarantees, trusted and verified by review + prod (/debug/query), not a harness. Mocks cover the surrounding logic.
Tier B · resilience / silent-degradation:

- open_bet_sports read-error fail-open (§1.3.18); _deserialize_slate drift skip + country_codes.json corrupt degrade (§1.4, §1.10).
- _bets_blob bookie-exception (§2.3, §2.10).
- _emit_odds_health wiring + no_odds_rule presence assertion (§2.4).

Tier C · coverage completeness (behavioral / structural):
- watched_union narrowing actually change behavior (§2.7).
- facts; truncation/URL-strip on commentary (§2.5).

Regression tests added for the highest-value bug-mapped gaps (all pure / mockable):
- test_settle_game_is_idempotent_no_double_pay: the commentator-recap-vs-settle-loop race pays only once (the SQL-level FOR UPDATE half is Postgres’s own guarantee, trusted; see the note below).
- _bettable_for_guild DB-error → fail-SAFE to default (§1.2 / §1.5), test_bettable_for_guild_fails_safe_to_default_on_db_error: a read blip narrows to {soccer, NBA}, never over-offers a disabled sport.
- test_doubleheader_shares_one_key_and_is_offered_once: offered once, not double-offered/double-paid.
- test_phase_halftime_code_beats_the_clutch_minute + test_phase_malformed_elapsed_falls_to_in_game.
- test_market_edge_suppressed_on_fast_triggers: the deeper-market beat rides only pregame/interval, never a goal.

Already covered (verified during this pass, not gaps): the NHL score_to_snapshot “hockey” tag (test_sportsdata_hub.py), the Ghana UTC-boundary two-date finals scan (test_extra_day_slate_coalesces_same_refresh_reads), and the unreconcilable-name→VOID safety floor + Switzerland/BiH alias (test_bet_outcome_pure).
Deliberately NOT unit-tested (owner steer): the raw-SQL transactional guarantees — locked-stake daily top-up, place_bookie_bet balance-never-negative under concurrency, the SQL-level FOR UPDATE settle idempotency. These are Postgres’s guarantees, not ours; a live-DB harness would test the database engine, not our logic. Mocks cover the call-path/logic (which side settles, what gets announced, how a None/[] return is handled); the SQL correctness is trusted, verified by review + prod behavior (/debug/query), not a harness.
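The mock-covered half of idempotency, how a `None`/`[]` return from the DB call is handled, might look like the sketch below. `settle_and_announce` is a hypothetical wrapper, and the argument list given to `settle_bookie_game` is assumed; the row-locking that makes the second call return nothing is Postgres's job and is simulated here with `side_effect`.

```python
from unittest.mock import Mock


def settle_and_announce(db: Mock, announce: Mock, match_key: str, winner: str) -> int:
    """Hypothetical cog-side wrapper: announces only rows this call settled."""
    settled = db.settle_bookie_game(match_key, winner) or []  # None/[] -> nothing to do
    for row in settled:
        announce(row)
    return len(settled)


db = Mock()
# The first caller claims the rows; the racing second caller gets an empty list.
db.settle_bookie_game.side_effect = [[{"user_id": 1, "payout": 180}], []]
announce = Mock()

assert settle_and_announce(db, announce, "ghana|turkiye", "home") == 1
assert settle_and_announce(db, announce, "ghana|turkiye", "home") == 0
assert announce.call_count == 1  # one payout announcement, never two
```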
```bash
# T1 · deterministic (CI)
ruff check . && mypy . && pytest
pytest tests/test_bookie.py tests/test_commentator.py tests/test_sports_cadence.py \
  tests/test_sportsdata_sources.py tests/test_us_teams.py -q  # the surfaces here
# T4 · behavioral eval (commentary voice + grounding, paid)
ANTHROPIC_API_KEY=... python scripts/eval_commentate.py
```
T2 · dry-run recipes (before merging a settlement/feed change):

- _settle_once with a mock bot: primary settles soccer (backstop untouched when healthy), Odds-API backstop settles a team-sport final when api_sports.degraded, and a name fold reconciles (see the merged registry dry-run as the template).
- (/games, SGO /events, Odds-API /scores) via the Railway-key recipes in CLAUDE.md, and confirm the parser maps it.

T3 · live prod verification (after deploy):
- bet_settle_tick heartbeats with ok:true and games_checked>0 when bets are open (settle loop alive through the real code path).
- bets_settled event with the right winner + the payout embed; and the bookie_bets table shows the bet flipped won/lost/void (via /debug/query).
- commentary_posted per-trigger with the expected has_* depth flags; commentary_scored ship-rate healthy; no surface_dark/scorer_truncated findings.
- /debug/integrations all feeds green; /debug/usage no quota near cap.

The litmus test (per CLAUDE.md): if a surface started silently failing at 3am, would a graph or a flagged ops-monitor finding show it by the next run? Every 💰 path above must answer yes.