a discord bot for the tootsies server. ask, recap, discuss, ship features by typing.
Question (owner): “how do we feel about using opus for discourse i’m not even sure”
Answer: Opus on a LEAN discourse prompt is the strongest of the three conditions measured (Sonnet-legacy, Opus-legacy, Opus-lean). Shipped a
per-model lean discourse rebuild (the ask path's ask_core pattern, applied to discourse).
This doc corrects an earlier conclusion in its own history. The short version: the first measurement compared Opus on the Sonnet-tuned prompt against Sonnet, found Opus worse, and (wrongly) concluded “keep Sonnet, don’t build.” That was the exact confound the epic exists to fix — Opus on un-rebuilt text drifts. On a lean prompt the result flips hard.
Legacy head-to-head looked like Sonnet won. scripts/dryrun_opus_discourse.py
(8 grounded topics, same prompt to both): Sonnet edged the same-topic head-to-head
~6-2 at equal quality. Conclusion at the time: keep Sonnet. But both models ran
the legacy 43k Sonnet-tuned prompt — so this measured “Opus on the wrong prompt,”
not “Opus is worse at discourse.”
The head-to-head judge is noisy. Re-run on the same conditions it swung 6-2 -> 4-4 -> 2-6. It was over-weighted in both directions; the stable per-post metrics (below) are the trustworthy signal.
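A rough way to see why an 8-topic head-to-head can swing this far on its own: the sketch below assumes each topic is an independent fair coin flip when the two models are at equal quality. That is an idealization for illustration, not something the dry run measured, and `prob_split_at_least` is a hypothetical helper name.

```python
from math import comb


def prob_split_at_least(n: int, k: int) -> float:
    """Chance that a fair n-topic head-to-head ends k-(n-k) or more lopsided,
    in either direction, when every topic is an independent 50/50 call."""
    if 2 * k <= n:
        # Any outcome is at least this lopsided once k is at or below the midpoint.
        return 1.0
    tail = sum(comb(n, i) for i in range(k, n + 1))
    return 2 * tail / 2 ** n


p = prob_split_at_least(8, 6)
print(f"P(6-2 or more lopsided, either direction) = {p:.3f}")
```

Under that assumption, roughly 29% of equal-quality runs would show a 6-2 or wider split in one direction or the other, which is consistent with treating a single head-to-head result as weak evidence.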
Give Opus a lean prompt and it flips. A lean discourse core (the open-loop
reframe + reused/rebuilt blocks, run via skip_persona/lean_persona), measured
BEFORE (prod Opus-legacy) vs AFTER (lean), 16 topics:
| metric | BEFORE (Opus-legacy) | AFTER (Opus-lean) |
|---|---|---|
| quality (harsh must_post judge) | 1/16 | 6/16 |
| open-question (opens a question vs closes a verdict) | 5/16 | 14/16 |
| flag-planted (her OWN take vs a naked poll) | 10/16 | 15/16 |
| mean sentences | 2.3 | 2.4 |
Sonnet-legacy sat at quality ~2/8, open ~3-4/8 across runs — so lean Opus clears both legacy conditions on every stable axis.
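The per-post metrics above are counts over independently judged posts rather than pairwise wins. A minimal sketch of that kind of tally follows, assuming a hypothetical `PostVerdict` record; the field names and the sample data are illustrative and are not taken from the dry-run scripts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PostVerdict:
    """Hypothetical per-post judge record; field names are illustrative."""
    quality_pass: bool    # passed the harsh must_post judge
    opens_question: bool  # opens a question vs closes a verdict
    plants_flag: bool     # states its own take vs a naked poll
    sentences: int


def tally(verdicts: List[PostVerdict]) -> Dict[str, object]:
    """Count each metric independently per post; no head-to-head comparison."""
    n = len(verdicts)
    if n == 0:
        raise ValueError("need at least one judged post")
    return {
        "quality": (sum(v.quality_pass for v in verdicts), n),
        "open_question": (sum(v.opens_question for v in verdicts), n),
        "flag_planted": (sum(v.plants_flag for v in verdicts), n),
        "mean_sentences": round(sum(v.sentences for v in verdicts) / n, 1),
    }


# Made-up sample purely to exercise the function.
sample = [
    PostVerdict(True, True, True, 2),
    PostVerdict(False, True, True, 3),
    PostVerdict(False, False, True, 2),
    PostVerdict(False, True, False, 3),
]
report = tally(sample)
print(report)
```

Because each post is scored against fixed criteria, one noisy judgment moves a count by one rather than flipping a pairwise verdict.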
The keystone is the TASK reframe: discourse is an OPENING, not an answer — plant your flag first, then hand the room a question it argues about. That one change drove open-question 1/8 -> 8/8 and (with “never a naked poll”) flag-planted 10/16 -> 15/16.
The walk-through, block by block:
- The shared legacy blocks (_POST_GROUNDING + 4 lessons, _ROOM_DIRECTED, _VOICE_REMINDER, _LENGTH_RULES, _TOOL_DISCIPLINE) -> cut for Opus. Voice/length live in the lean persona; the room-specific ones (_ROOM_DIRECTED + the lessons) were rebuilt as lean Opus versions AND ablated. They didn’t help (substance flat) and made drift worse (naming a pattern to forbid primes Opus to produce it), so cut on evidence.
- _OPUS_REGULARS (surface-neutral) -> reused. The link rule is discourse-framed inline, NOT the reused _OPUS_CITE/_OPUS_TOOLDISC: those are ask-1:1-framed (“the user”, “your answer”), and the wired dry run + an 8-topic A/B showed that framing leaks meta-commentary into a room post (META-leak 2/8 with them vs 0/8 with the discourse-framed line, flag 7/8 -> 8/8).

Result: 43,403 -> ~5,400 chars (~87% smaller).
One more real-path fix the dry run caught: the discourse forced-search retry
(a second _call with thinking OFF when the first surfaces no URLs) degrades the
lean Opus output. In an n=10 diagnostic, forced posts ran ~4.0 sentences with
selection-narration meta-leak, vs ~2.3 sentences (tight + open) for the
thinking-ON primary. So the forced retry is skipped on the lean Opus path: the
lean core already says “never invent a URL / linkless beats skipping”, and
enforce_source_links strips any hallucinated URL. Legacy/Sonnet keeps the retry.
Post-fix wired dry run: open 10/10, flag 10/10, mean 2.4, forced 0/10.
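The retry gating described above can be sketched roughly as follows. This is an illustration under assumptions, not the bot's actual code: `generate_post`, the `call` signature, and the URL regex are hypothetical, and `ModelRules` here carries only the one field the sketch needs.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

URL_RE = re.compile(r"https?://\S+")  # assumed URL detector, for illustration only


@dataclass(frozen=True)
class ModelRules:
    # None means the legacy path; a string means the lean core is in use.
    discourse_core: Optional[str] = None


def generate_post(call: Callable[..., str], rules: ModelRules) -> str:
    """`call` is a hypothetical stand-in for _call(thinking=..., force_search=...)."""
    post = call(thinking=True, force_search=False)
    if URL_RE.search(post):
        return post  # the primary attempt already surfaced a link
    if rules.discourse_core is not None:
        # Lean path: keep the linkless primary rather than run the
        # thinking-OFF forced retry, which degraded output in the diagnostic.
        return post
    # Legacy path: keep the forced-search retry.
    return call(thinking=False, force_search=True)


def stub_call(thinking: bool, force_search: bool) -> str:
    # Toy model call that never returns a URL, so the gating decides the outcome.
    return f"post(thinking={thinking}, forced={force_search})"


print(generate_post(stub_call, ModelRules(discourse_core="lean core text")))
print(generate_post(stub_call, ModelRules()))
```

With the stub, the lean path returns the thinking-ON primary and the legacy path returns the forced retry.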
ModelRules.discourse_core (None for Sonnet/Haiku -> legacy path byte-identical; set
for Opus -> lean core via skip_persona/lean_persona). Same rules_for seam as
ask_core. test_prompt_lessons still passes (the legacy path keeps composing the
shared blocks).

Tests: test_discourse_system_extra_opus_uses_lean_rebuild +
test_discourse_system_extra_legacy_unchanged.

Eval: scripts/eval_opus_discourse.py (registered, trend-only). It is behavioral,
mirroring eval_recap (#881): it exercises the REAL discourse() surface on BOTH
Sonnet and Opus, with link-rich scenarios (feeds + enriched links + Perplexity +
recent chatter + live web_search) AND bare ones, plus a deterministic no-repaste
check and two judge-teeth goldens (a closed-loop verdict + a naked poll, both must
fail). On the real surface the open-loop gap is stark: Opus 3/4 vs Sonnet 0/4
(Sonnet runs the legacy prompt, which doesn’t enforce open-loop).
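The seam described above might look roughly like the sketch below. The names ModelRules, rules_for, ask_core, and discourse_core come from this note; the model keys, the placeholder strings, and the body of `discourse_system_extra` are assumptions made for illustration and do not reproduce the real prompt text or signatures.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder stand-ins for the real prompt blocks (contents are not reproduced).
LEGACY_BLOCKS = ("<_POST_GROUNDING>", "<_ROOM_DIRECTED>", "<_VOICE_REMINDER>")
LEAN_DISCOURSE_CORE = "<lean discourse core>"


@dataclass(frozen=True)
class ModelRules:
    ask_core: Optional[str] = None
    discourse_core: Optional[str] = None  # None -> legacy composition


# Hypothetical registry: only Opus opts into lean cores.
_RULES = {
    "opus": ModelRules(ask_core="<lean ask core>",
                       discourse_core=LEAN_DISCOURSE_CORE),
}


def rules_for(model: str) -> ModelRules:
    # Unknown or unlisted models (Sonnet, Haiku) get the all-None default.
    return _RULES.get(model, ModelRules())


def discourse_system_extra(model: str) -> str:
    rules = rules_for(model)
    if rules.discourse_core is not None:
        return rules.discourse_core   # lean rebuild replaces the shared blocks
    return "\n\n".join(LEGACY_BLOCKS)  # legacy path composes them unchanged


print(discourse_system_extra("opus"))
print(discourse_system_extra("sonnet") == "\n\n".join(LEGACY_BLOCKS))
```

Keeping the default at None means a model with no entry cannot accidentally pick up the lean core, which matches the byte-identical guarantee for the legacy path.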
discourse still DEFAULTS to Sonnet; this lights up only for a guild that opts into
Opus discourse on the /menu Models page. The lean rebuild makes that opt-in
genuinely strong. Flipping the default to Opus is a separate decision left to the
owner — the per-post metrics support it, but it’s a default change, not a code one.