tootsies

a discord bot for the tootsies server. ask, recap, discuss, ship features by typing.

Project maintained by mejasonmejason

Handoff: Opus /ask length drift (roast pile-on) — investigation findings

Context for the per-model prompt rebuild epic (#849). Pairs with issue #858 and PR #856.

TL;DR

Opus /ask over-runs its length cap (1–2 sentences for a take) — but ONLY in a back-and-forth / thread continuation, NOT on cold takes.
Driver is a “defend-then-soften” reflex when challenged, plus a length-sanctioning roast rule that lives in the Sonnet-era constitution (Layer 1), which #851 never Opus-optimized.
The right fix is a Layer 1 (constitution.py) reword of the roast rule — owner-merge, separate from PR #856. The ask-core (Layer 3) has NO roasting rule, so patching length there is fixing the wrong layer.
PR #856 ships ONLY the eval harness to measure that fix (no prompt change).

How Opus’s ask prompt is assembled (verified from code)

Per-model registry (#849/#850/#851): claude_client.py MODEL_RULES[model] -> ModelRules, resolved by rules_for(model) at call time. Opus = _OPUS, a replace(_LEGACY, ...).
Opus runs skip_persona=True; its ENTIRE system prompt = lean_persona + "\n\n" + assembled_ask_system_extra (claude_client.py ~4902-4913, _call ~2996).
#851 commit explicitly: “The universal safety floor (CONSTITUTION, incl. DATA INTEGRITY) is unchanged and stays prepended.” => constitution + operational tail are NOT Opus-tuned.

The 4 layers Opus sees for an ask (~13.4k chars total, dumped from the real path)

LAYER 1 — CONSTITUTION (constitution.py): HARD_RULES (incl. 9-bullet DATA INTEGRITY) + HOUSE_RULES (10) + CALIBRATION. NOT Opus-optimized. Shared by all models.
LAYER 2 — lean persona: _OPUS_ID (claude_client.py:2774) + _OPUS_VOICE (:2780). #851.
LAYER 3 — _OPUS_ASK_CORE (:2865): TASK, _OPUS_TAGS, _OPUS_GROUNDING, _OPUS_FACT_TAKE, _OPUS_MEMORY, _OPUS_IMAGE, _OPUS_LENGTH, _OPUS_TOOLDISC, _OPUS_CITE, _OPUS_REGULARS. #851.
LAYER 4 — operational tail: COMMAND_GUIDE (always) + tool preamble/per-tool notes, market literacy, feed status, voice/image/gif notes (conditional). NOT Opus-optimized.
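To re-check the per-layer budget after a prompt change, a size report like the one below is enough. This is a sketch with placeholder layer texts: `layer_report` is a hypothetical helper, and the real strings would have to come from the assembled ask path, not from these stand-ins.

```python
def layer_report(layers: dict[str, str]) -> list[tuple[str, int]]:
    """Return (layer name, char count) rows plus a TOTAL row."""
    rows = [(name, len(text)) for name, text in layers.items()]
    rows.append(("TOTAL", sum(count for _, count in rows)))
    return rows


# Placeholder texts only; substitute the real layer strings.
layers = {
    "L1 constitution": "<HARD_RULES + HOUSE_RULES + CALIBRATION>",
    "L2 lean persona": "<_OPUS_ID + _OPUS_VOICE>",
    "L3 ask core": "<_OPUS_ASK_CORE>",
    "L4 operational tail": "<COMMAND_GUIDE + conditional notes>",
}

for name, chars in layer_report(layers):
    print(f"{name:22} {chars:6d} chars")
```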

Production evidence (Railway, deploy 596a290a, post-#851)

answer_length events: sentences [2,2,3,3,3,4,5,5,5,6] -> 8/10 >=3 sentences, median 3.5, max 6; chars median 269, 8/10 over 200.
ask_answered pairs show the long ones are thread-continuation roasts (“you’re such a hater”, “wtf?”), not explainers/lists.
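The sentence-count summary can be re-derived from the ten logged values with the standard library; the list is copied from the answer_length line above.

```python
from statistics import median

# Sentence counts from the ten answer_length events above.
sentences = [2, 2, 3, 3, 3, 4, 5, 5, 5, 6]

at_or_over_three = sum(1 for n in sentences if n >= 3)

print(f"{at_or_over_three}/{len(sentences)} at >=3 sentences")  # 8/10
print("median:", median(sentences))                             # 3.5
print("max:", max(sentences))                                   # 6
```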

Dry-run baseline (scripts/dryrun_opus_ask.py, model=OPUS)

Cold single-shot cases: 27/27 green. is drake done -> 2 tight sentences; explainer unlocks; p_take/p_roast land at 2-3. => the harness as it was CANNOT see the drift, because every case is single-shot.

Reproduction + ablation (the core findings)

Added back-and-forth cases (her own prior jab + a user volley as context), strict <=2 bar:

bf_hater 0/3, bf_volley 1/3 on today’s prompt => drift reproduced and isolated.
Over-cut guards (softexplain/softlist/ranking) all pass => a fix can be checked for over-correction (clipping genuine longer answers).

Anatomy of an over-long reply: core take is 1 sentence (“-600 isn’t a comeback, it’s a cover charge”), wrapped in (1) a self-defense vs the accusation (“hate’s a strong word, i clapped for the jersey retirement”) + (2) a warm softener/offer (“the rafters miss you”, “find me a number and i’ll cheer”). That defend-then-soften wrapper is the amplifier.
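A sketch of the strict <=2 check applied to that anatomy. The sentence counter is a naive punctuation split, assumed for illustration (the real harness may count differently), and the "wrapped" reply is stitched together from the fragments quoted above rather than copied from a single prod reply.

```python
import re


def count_sentences(reply: str) -> int:
    """Naive count: split after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", reply.strip())
    return sum(1 for part in parts if part)


def passes_strict_bar(reply: str, max_sentences: int = 2) -> bool:
    """Strict back-and-forth bar: at most max_sentences sentences."""
    return count_sentences(reply) <= max_sentences


# Illustrative only: defense + core take + softener, assembled from the quotes above.
wrapped = (
    "hate's a strong word, i clapped for the jersey retirement. "
    "-600 isn't a comeback, it's a cover charge. "
    "find me a number and i'll cheer."
)
core_only = "-600 isn't a comeback, it's a cover charge."

print(count_sentences(wrapped), passes_strict_bar(wrapped))      # 3 False
print(count_sentences(core_only), passes_strict_bar(core_only))  # 1 True
```

The core take alone clears the bar; the wrapper is what pushes the reply past it.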

Ablation (bf_* avg sentences, want <=2):

| variant | bf_hater | bf_volley |
| --- | --- | --- |
| baseline | 3.0 | 3.0 |
| cut `_OPUS_REGULARS` (don’t-punch-down) | 3.3 | 2.3 |
| add a thread-tightness clause | 1.3 | 1.7 |

Cutting regulars did NOT help (bf_hater rose 3.0 -> 3.3; bf_volley eased to 2.3 but stayed over the <=2 bar) AND isn’t free (it’s the lane/villain guard).
An explicit “land one comeback, don’t defend, don’t tack on a warm wind-down” works.
NOTE: the “full takedown” L1 line was PRESENT in all variants, so it was never isolated.
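The ablation loop can be sketched as below: each variant is a transform over the prompt, each case is sampled n times, and the cell is the average sentence count. Everything here is an assumption about shape, not the code in scripts/dryrun_opus_ask.py: `run_ablation`, the placeholder prompt, and especially `fake_generate`, a deterministic stub standing in for the model call. Its outputs prove the plumbing only and are not measurements.

```python
import re
from statistics import mean
from typing import Callable, Dict


def count_sentences(reply: str) -> int:
    """Naive count: split after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", reply.strip())
    return sum(1 for part in parts if part)


def run_ablation(base_prompt: str,
                 variants: Dict[str, Callable[[str], str]],
                 cases: Dict[str, str],
                 generate: Callable[[str, str], str],
                 n: int = 3) -> Dict[str, Dict[str, float]]:
    """Average sentence count per (variant, case) over n samples."""
    table = {}
    for variant_name, transform in variants.items():
        prompt = transform(base_prompt)
        table[variant_name] = {
            case_name: round(
                mean([count_sentences(generate(prompt, ctx)) for _ in range(n)]), 1
            )
            for case_name, ctx in cases.items()
        }
    return table


# Placeholder prompt: one line per rule so a variant can cut or append cleanly.
BASE = "CALIBRATION: <roast rule>\nREGULARS: <regulars rule>\nLENGTH: a take is 1-2 sentences."
variants = {
    "baseline": lambda p: p,
    "cut_regulars": lambda p: p.replace("REGULARS: <regulars rule>\n", ""),
    "add_thread_clause": lambda p: p + " In a back-and-forth, land one comeback and stop.",
}
cases = {"bf_hater": "you're such a hater", "bf_volley": "wtf?"}


def fake_generate(prompt: str, ctx: str) -> str:
    # Stub, NOT the model: short only when the thread clause is present.
    return "One line." if "back-and-forth" in prompt else "One. Two. Three."


for name, row in run_ablation(BASE, variants, cases, fake_generate).items():
    print(name, row)
```

Keeping each variant to a single cut or a single add is what lets a row be read as the effect of one lever; a line left present in every variant, like the “full takedown” line here, is not isolated by any row.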

Root cause: cross-layer tension, roast rule is Sonnet-era

LAYER 1 CALIBRATION (constitution.py:50): “Roast freely. The regulars are fair game, and a diss track or A FULL TAKEDOWN of someone who’s in on the bit is the fun…” -> sanctions VOLUME.
LAYER 3 _OPUS_LENGTH: a take is 1-2 sentences.
LAYER 3 has NO roasting rule — _OPUS_REGULARS is about not-villainizing + staying in lane, not roast length. So the length-relevant roast sanction lives ONLY in the un-optimized L1, contradicting _OPUS_LENGTH. Fixing in L3 = stacking a counter-rule on top of the real one.
Also surfaced L1<->L3 duplication: STAY IN YOUR LANE is in both CALIBRATION and _OPUS_REGULARS; DATA INTEGRITY (9 bullets) overlaps the lean _OPUS_GROUNDING.

Fix direction (track 2, owner-merge, NOT in PR #856)

Reword the LAYER 1 roast rule so a roast lands sharp in a line or two and drops the defend-then-soften reflex. constitution.py is a protected path (owner merges) and affects EVERY surface => deliberate, eval-gated.
Ablate the “full takedown” reword on bf_* first to confirm it’s actually the lever (untested).
Run constitution/jailbreak + memory-fence evals to prove the floor didn’t weaken.
Trim the L1<->L3 duplication while there.

Verbatim roast-adjacent rules + proposed rewords (untested as worded)

L1 CALIBRATION (constitution.py:50) — PROTECTED:

BEFORE: “Roast freely. The regulars are fair game, and a diss track or a full takedown of someone who’s in on the bit is the fun, not a violation, that’s the whole vibe of the room. The only hard lines: never punch at identity, appearance, or anything someone can’t change, and read the room so play doesn’t curdle into a real pile-on.”
AFTER (proposed): “Roast freely. The regulars are fair game and a sharp diss is the fun, not a violation, that’s the whole vibe of the room. Land it in a line or two, the cut is in the aim not the word count, and when someone fires back you hit the next clean line, you don’t defend yourself or smooth it over with a warm wind-down. The only hard lines: never punch at identity, appearance, or anything someone can’t change, and read the room so play doesn’t curdle into a real pile-on.”

L2 _OPUS_VOICE (claude_client.py:2780):

BEFORE: “Bartender voice: sharp, not mean. No em dashes ever (commas, periods, parens). No emoji unless they used one first.”
AFTER (proposed): “Bartender voice: sharp and short, not mean, one clean cut over three. No em dashes ever (commas, periods, parens). No emoji unless they used one first.”

L3 _OPUS_REGULARS (claude_client.py:2855):

BEFORE: “REGULARS: name someone in the room and it’s a playful jab from their bartender, never the villain. A verdict lands on the subject (the take, the song, the team), not the person. Don’t trash an absent regular to side with whoever you’re talking to.”
AFTER (proposed): adds “…it’s ONE playful jab…land it and stop. When they fire back (‘you’re such a hater’, ‘wtf’) you don’t defend whether you’re a hater or tack on a warm wind-down, you just hit the next clean line or let it ride. …”

L3 _OPUS_FACT_TAKE (claude_client.py:2812):

BEFORE: “…have it, commit, say what you think, don’t qualify it into mush (‘id say’, …).”
AFTER (proposed): “…say what you think in a line…and don’t prop it up with a second supporting jab.”

NOTE: _VOICE_REMINDER (claude_client.py:560, incl. a second REGULARS RULE at :589) is NOT in Opus’s prompt — the lean ask_core replaces the legacy wall. It applies to Sonnet/legacy + other surfaces only.

What’s shipped / tracked

Branch: claude/opus-ask-length-rules-t04u40
PR #856 (eval harness ONLY, no prompt change):
- scripts/dryrun_opus_ask.py: added bf_volley/bf_hater repro + softexplain/softlist/ranking over-cut guards; per-case ctx/asker threading.
- scripts/eval_opus_length.py: NEW registered, trend-only length-judge benchmark + GOLDEN (verbatim 5-sentence prod roast must fail). Verified all-clear 5/5 + golden caught.
- registered in scripts/run_evals.py; wiring tests in tests/test_eval_wiring.py.
Issue #858 (bug, p1) under epic #849 — same findings.
answer_length telemetry (#676) = the production signal to confirm a fix lands (p90 drop).
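One way to read "p90 drop" from the answer_length stream, sketched with a nearest-rank percentile. `p90` and `fix_landed` are hypothetical helpers, not existing telemetry code; the sample is the ten pre-fix sentence counts from the production evidence above.

```python
import math


def p90(values: list[int]) -> int:
    """Nearest-rank 90th percentile: the value at rank ceil(0.9 * n)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(90 * len(ordered) / 100)
    return ordered[rank - 1]


def fix_landed(before: list[int], after: list[int]) -> bool:
    """True when the post-fix p90 is strictly below the pre-fix p90."""
    return p90(after) < p90(before)


pre_fix = [2, 2, 3, 3, 3, 4, 5, 5, 5, 6]
print("pre-fix p90 sentences:", p90(pre_fix))  # 5
```

A p90 is used rather than the median because the drift lives in the tail: thread-continuation roasts are a minority of asks, so a median can look fine while the long replies persist.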

RESOLUTION (shipped) — and a correction to the prescription above

The “fix direction” above (reword the Layer-1 roast line) turned out to be WRONG, and the correction is the real lesson here. Measured across THREE ablation rounds on the real ask pipeline (n=5), every doctrine-clean lever washed:

| lever (n=5, bare bf_hater) | result |
| --- | --- |
| cut “full takedown” (constitution) | 3.4 -> 2.8, still >2 |
| reword the roast line (constitution) | 3.4 -> 3.0, still >2 |
| cut “not mean” (voice) | 3.6 -> 3.0, still >2 |
| reword/fold tightness into voice | 3.6 -> 2.6, still >2 |
| explicit thread-tightness clause in `_OPUS_LENGTH` | -> 1.3-1.7, <=2 ✓ |

So: cutting/rewording the roast line is a near-wash on the back-and-forth length, AND the voice layer washes too. The obvious cause (the roast rule) was not the lever. The only thing that moved the metric was an EXPLICIT length clause, i.e. an add.

This is not a doctrine violation, it’s the doctrine’s own method working. The rule is “don’t instinctively add” — ablate clean cuts/rewords first. We did, exhaustively; they failed; so a measured add is the justified call. The shipped fix (_OPUS_LENGTH: “in a back-and-forth, land ONE comeback and stop, no defending whether you’re a hater, no warm wind-down”) is that add, validated: bf_* flip to <=2, over-cut guards (explainer/list/ranking) stay unlocked, roast stays sharp, lane holds.
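The decision rule in that paragraph, sketched as code: prefer the least additive lever that clears the bar, and fall through to an add only when no cut or reword does. The numbers are the "after" values from the lever table (the add is entered at 1.7, the top of its 1.3-1.7 range); `LeverResult`, `choose_lever` and the cut < reword < add ordering are an illustrative encoding, not repo code.

```python
from dataclasses import dataclass
from typing import Optional

BAR = 2.0  # max average sentences on bare bf_hater
PREFERENCE = {"cut": 0, "reword": 1, "add": 2}  # try clean levers first


@dataclass(frozen=True)
class LeverResult:
    name: str
    kind: str     # "cut", "reword", or "add"
    after: float  # average sentences after applying the lever


def choose_lever(results: list[LeverResult]) -> Optional[LeverResult]:
    """Pick the least additive lever that clears the bar, if any does."""
    passing = [r for r in results if r.after <= BAR]
    if not passing:
        return None
    return min(passing, key=lambda r: PREFERENCE[r.kind])


results = [
    LeverResult("cut 'full takedown' (constitution)", "cut", 2.8),
    LeverResult("reword the roast line (constitution)", "reword", 3.0),
    LeverResult("cut 'not mean' (voice)", "cut", 3.0),
    LeverResult("reword/fold tightness into voice", "reword", 2.6),
    LeverResult("thread-tightness clause in _OPUS_LENGTH", "add", 1.7),
]

chosen = choose_lever(results)
print(chosen.name if chosen else "no lever clears the bar")
```

With these values every cut and reword fails the bar, so the add is selected by elimination rather than by default.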

Shipped:

The “full takedown” cut + the whole constitution Opus pass: #866 (OPUS_CONSTITUTION), on prose-hygiene grounds, NOT as the length fix.
The length fix (the _OPUS_LENGTH thread clause) + the STAY-IN-YOUR-LANE L1<->L3 dedup: the #858 close-out PR.

For docs/PROMPT_OPTIMIZATION.md: its #858 worked example currently prescribes “reword the Layer-1 roast line” as the fix, which is the disproven prescription. It should be updated to: clean cuts/rewords (constitution AND voice) all washed across 3 rounds; the evidence-justified fix was a measured _OPUS_LENGTH add after the clean levers were proven to fail. The deeper lesson stands and is sharpened: ablate cut/reword/add separately, and “don’t add” means “don’t add first,” not “never add.”