tootsies

a discord bot for the tootsies server. ask, recap, discuss, ship features by typing.


Project maintained by mejasonmejason

Handoff: Opus /ask length drift (roast pile-on) — investigation findings

Context for the per-model prompt rebuild epic (#849). Pairs with issue #858 and PR #856.

TL;DR

How Opus’s ask prompt is assembled (verified from code)

The 4 layers Opus sees for an ask (~13.4k chars total, dumped from the real path)

Production evidence (Railway, deploy 596a290a, post-#851)

Dry-run baseline (scripts/dryrun_opus_ask.py, model=OPUS)

Reproduction + ablation (the core findings)

Added back-and-forth cases (her own prior jab + a user volley as context), strict <=2 bar:

Anatomy of an over-long reply: core take is 1 sentence (“-600 isn’t a comeback, it’s a cover charge”), wrapped in (1) a self-defense vs the accusation (“hate’s a strong word, i clapped for the jersey retirement”) + (2) a warm softener/offer (“the rafters miss you”, “find me a number and i’ll cheer”). That defend-then-soften wrapper is the amplifier.

Ablation (bf_* avg sentences, want <=2):

| variant | bf_hater | bf_volley |
| --- | --- | --- |
| baseline | 3.0 | 3.0 |
| cut _OPUS_REGULARS (don’t-punch-down) | 3.3 | 2.3 |
| add a thread-tightness clause | 1.3 | 1.7 |
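
For readers who want to reproduce the "avg sentences, want <=2" numbers, here is a minimal sketch of how such a metric could be computed. It is not the code in scripts/dryrun_opus_ask.py; the function names and the punctuation-based sentence splitter are assumptions made for illustration.

```python
import re
from statistics import mean

# Assumption: a "sentence" is any non-empty chunk ending in ., ! or ?
# (a trailing chunk with no closing punctuation also counts as one).
SENTENCE_END = re.compile(r"[.!?]+(?:\s+|$)")


def count_sentences(reply: str) -> int:
    """Count sentence-like chunks in one reply."""
    chunks = SENTENCE_END.split(reply.strip())
    return len([c for c in chunks if c.strip()])


def avg_sentences(replies: list[str]) -> float:
    """Average sentence count over n sampled replies, rounded to one decimal."""
    return round(float(mean(count_sentences(r) for r in replies)), 1)


def passes_bar(replies: list[str], bar: float = 2.0) -> bool:
    """True when the average reply length meets the strict <=2 bar."""
    return avg_sentences(replies) <= bar
```

A decimal such as 3.4 inside a reply is not split, because the splitter only breaks on punctuation followed by whitespace or the end of the text.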

Root cause: cross-layer tension, roast rule is Sonnet-era

Fix direction (track 2, owner-merge, NOT in PR #856)

  1. Reword the LAYER 1 roast rule so a roast lands sharp in a line or two and drops the defend-then-soften reflex. constitution.py is a protected path (owner merges) and affects EVERY surface => deliberate, eval-gated.
  2. Ablate the “full takedown” reword on bf_* first to confirm it’s actually the lever (untested).
  3. Run constitution/jailbreak + memory-fence evals to prove the floor didn’t weaken.
  4. Trim the L1<->L3 duplication while there.
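
Step 4 (trimming the L1<->L3 duplication) needs a list of overlapping rules before anything can be cut. One way to surface candidates is a near-duplicate scan across two layer texts, sketched below with the standard-library difflib. The function name, the 0.8 threshold, and the line-by-line comparison are assumptions, not the project's tooling.

```python
from difflib import SequenceMatcher


def near_duplicates(layer_a: str, layer_b: str, threshold: float = 0.8):
    """Return (line_a, line_b, ratio) for line pairs that look duplicated.

    Compares every non-empty line of layer_a against every non-empty line
    of layer_b, case-insensitively. The threshold is an arbitrary default.
    """
    lines_a = [s.strip() for s in layer_a.splitlines() if s.strip()]
    lines_b = [s.strip() for s in layer_b.splitlines() if s.strip()]
    pairs = []
    for a in lines_a:
        for b in lines_b:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((a, b, round(ratio, 2)))
    return pairs
```

The output is only a candidate list. Whether a flagged pair is redundant or deliberately repeated is still a judgment call for the owner of the protected path.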

Verbatim roast-adjacent rules + proposed rewords (untested as worded)

L1 CALIBRATION (constitution.py:50) — PROTECTED:

L2 _OPUS_VOICE (claude_client.py:2780):

L3 _OPUS_REGULARS (claude_client.py:2855):

L3 _OPUS_FACT_TAKE (claude_client.py:2812):

NOTE: _VOICE_REMINDER (claude_client.py:560, incl. a second REGULARS RULE at :589) is NOT in Opus’s prompt — the lean ask_core replaces the legacy wall. It applies to Sonnet/legacy + other surfaces only.

What’s shipped / tracked

RESOLUTION (shipped) — and a correction to the prescription above

The “fix direction” above (reword the Layer-1 roast line) turned out to be WRONG, and the correction is the real lesson here. Measured across THREE ablation rounds on the real ask pipeline (n=5), every doctrine-clean lever washed:

| lever (n=5, bare bf_hater) | result |
| --- | --- |
| cut “full takedown” (constitution) | 3.4 -> 2.8, still >2 |
| reword the roast line (constitution) | 3.4 -> 3.0, still >2 |
| cut “not mean” (voice) | 3.6 -> 3.0, still >2 |
| reword/fold tightness into voice | 3.6 -> 2.6, still >2 |
| explicit thread-tightness clause in _OPUS_LENGTH | -> 1.3-1.7, <=2 |

So: cutting/rewording the roast line is a near-wash on the back-and-forth length, AND the voice layer washes too. The obvious cause (the roast rule) was not the lever. The only thing that moved the metric was an EXPLICIT length clause, i.e. an add.
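
The reading of the table can be checked mechanically. The sketch below copies the post-change averages from the table and filters them against the <=2 bar. The variable and function names are illustrative, and the last row uses 1.7, the top of the reported 1.3-1.7 range, so the check stays conservative.

```python
BAR = 2.0  # "want <=2" from the ablation tables

# (lever, avg sentences after the change), copied from the table above.
RESULTS = [
    ('cut "full takedown" (constitution)', 2.8),
    ("reword the roast line (constitution)", 3.0),
    ('cut "not mean" (voice)', 3.0),
    ("reword/fold tightness into voice", 2.6),
    ("explicit thread-tightness clause in _OPUS_LENGTH", 1.7),
]


def levers_under_bar(results, bar=BAR):
    """Return the levers whose post-change average meets the bar."""
    return [name for name, after in results if after <= bar]


print(levers_under_bar(RESULTS))
```

Only the explicit length clause survives the filter, which matches the conclusion above.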

This is not a doctrine violation, it’s the doctrine’s own method working. The rule is “don’t instinctively add” — ablate clean cuts/rewords first. We did, exhaustively; they failed; so a measured add is the justified call. The shipped fix (_OPUS_LENGTH: “in a back-and-forth, land ONE comeback and stop, no defending whether you’re a hater, no warm wind-down”) is that add, validated: bf_* flip to <=2, over-cut guards (explainer/list/ranking) stay unlocked, roast stays sharp, lane holds.
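
The validation described above has two sides: the bf_* cases must drop to <=2, and the guard cases must not be squeezed short as a side effect. A minimal sketch of such a two-sided gate follows. The function name, the guard floor of 3.0, and the guard averages in the usage example are hypothetical; the source does not state what threshold the over-cut guards use.

```python
def length_gate(bf_avgs, guard_avgs, bar=2.0, guard_floor=3.0):
    """Return a list of failure messages; an empty list means the gate passes.

    bf_avgs:    avg sentences per back-and-forth case (should be <= bar)
    guard_avgs: avg sentences per guard case, e.g. explainer/list/ranking
                (should stay >= guard_floor; the floor is an assumption)
    """
    failures = []
    for case, avg in bf_avgs.items():
        if avg > bar:
            failures.append(f"{case}: {avg} > {bar} (still too long)")
    for case, avg in guard_avgs.items():
        if avg < guard_floor:
            failures.append(f"{case}: {avg} < {guard_floor} (over-cut)")
    return failures


# bf_* values are from the ablation table; guard values are placeholders.
print(length_gate({"bf_hater": 1.3, "bf_volley": 1.7}, {"explainer": 5.0}))
```

Checking both directions in one gate is the point of the design: an add that fixes the back-and-forth length but flattens explainers would fail it.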

Shipped:

For docs/PROMPT_OPTIMIZATION.md: its #858 worked example currently prescribes “reword the Layer-1 roast line” as the fix, which is the disproven prescription. It should be updated to: clean cuts/rewords (constitution AND voice) all washed across 3 rounds; the evidence-justified fix was a measured _OPUS_LENGTH add after the clean levers were proven to fail. The deeper lesson stands and is sharpened: ablate cut/reword/add separately, and “don’t add” means “don’t add first,” not “never add.”
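
To make "ablate cut/reword/add separately" concrete, the sketch below builds one variant per lever from the same baseline, so each variant differs from the baseline in exactly one layer. The helper names, layer keys, and prompt strings are placeholders for illustration and are not the real layer text.

```python
def make_variants(base, target, phrase, reworded, add_to, clause):
    """Build baseline/cut/reword/add variants, each changing one layer only.

    base is a dict of layer name -> prompt text. Each variant is derived
    from the baseline independently, never from another variant.
    """
    if phrase not in base[target]:
        raise ValueError(f"{phrase!r} not found in layer {target!r}")
    # Note: the whitespace cleanup after a cut also collapses newlines.
    cut_text = " ".join(base[target].replace(phrase, "").split())
    return {
        "baseline": dict(base),
        "cut": {**base, target: cut_text},
        "reword": {**base, target: base[target].replace(phrase, reworded)},
        "add": {**base, add_to: f"{base[add_to]} {clause}".strip()},
    }


def changed_layers(base, variant):
    """Name the layers where a variant differs from the baseline."""
    return [name for name in base if base[name] != variant[name]]


# Placeholder layers, not the real prompt text.
base = {
    "roast": "go for the full takedown when roasting.",
    "length": "keep it short.",
}
variants = make_variants(
    base, "roast", "the full takedown", "a sharp line",
    "length", "in a back-and-forth, land one comeback and stop.",
)
for name, layers in variants.items():
    print(name, changed_layers(base, layers))
```

Because every variant touches a single layer, a change in the metric can be attributed to that one lever rather than to a mix of edits.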