tootsies

a discord bot for the tootsies server. ask, recap, discuss, ship features by typing.


Project maintained by mejasonmejason Hosted on GitHub Pages — Theme by mattgraham

Handoff: Perplexity Sonar API — search-control parameters not being used

RESOLVED. Per-surface search controls are now set in utils/perplexity.py (_SEARCH_CONFIG). Every surface gets search_context_size="medium", and the trend/news surfaces also get a recency window: "week" for music/discourse, "day" for recap/chimein. ask stays unfiltered so evergreen fact-verification pages survive. search() accepts search_context_size / recency overrides for the eval harness at scripts/eval_perplexity_params.py. search_context_size="high" was deliberately NOT shipped by default because the cost increase applies to every surface; flip it via the eval if a surface still hedges. The rest of this doc is the original investigation.

TL;DR

utils/perplexity.py calls Perplexity Sonar with only model + messages, so it inherits every default — including search_context_size="low" (shallowest retrieval). Result: low-signal, evergreen filler. Symptom (found while evaluating music posts): every Perplexity response opens with “I can’t verify live trends — results are mostly YouTube mixes / playlist pages.” We were encoding intent as prose in the query string (“last few hours”, “check Twitter/X”) and assuming the API acts on it; those are actually structured API parameters we never set. Fix = set the search-control params (per-surface), eval before/after, ship as its own PR.

Repo context

Tootsies = Discord bot (“Toots”). utils/perplexity.py wraps Perplexity Sonar to inject real-time web context into 5 surfaces (ask / discourse / recap / chimein / music). Python 3.11, asyncpg, aiohttp. CI gates on ruff check . && mypy . && pytest (all three must pass). Never push to main — branch + PR.

Root cause (confirmed: docs + live A/B)

Payload at utils/perplexity.py:74-79:

payload = {"model": _MODEL, "messages": [{"role": "user", "content": query}]}

No search params → all defaults inherited. Biggest culprit is search_context_size="low".

Parameters available (not used)

Docs: https://docs.perplexity.ai/api-reference/chat-completions-post and https://docs.perplexity.ai/docs/sonar/filters

| Parameter | Current (default) | Consider | Notes |
| --- | --- | --- | --- |
| web_search_options.search_context_size | "low" | "medium"/"high" | Likely dominant fix. Low = shallow retrieval → “can’t verify” filler. Cost: sonar ~$5/1k req (low) → ~$12/1k (high). |
| search_recency_filter | none | "week" (music) / "day" (news/chimein) | Values: hour/day/week/month/year. Cannot combine with date filters. |
| search_domain_filter | none | allowlist Billboard/Pitchfork; or denylist -youtube.com | Max 20 domains. Allowlist (no prefix) OR denylist (- prefix), not both. |
| model | sonar | maybe sonar-pro | Higher quality + cost. |
| search_mode | web | per-surface | web/academic/sec. Date filters silently ignored in academic. |
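The search_domain_filter constraints in the table (max 20 domains, allowlist or denylist but not both) are easy to violate when building per-surface lists, so a small guard may help. This is a sketch only: validate_domain_filter is a hypothetical helper, and the domain strings in the usage lines are illustrative.

```python
MAX_DOMAINS = 20  # limit as stated in the parameter table above


def validate_domain_filter(domains: list[str]) -> list[str]:
    """Return domains unchanged if they satisfy the documented constraints."""
    if len(domains) > MAX_DOMAINS:
        raise ValueError(f"search_domain_filter allows at most {MAX_DOMAINS} domains")
    denied = [d for d in domains if d.startswith("-")]
    # Either every entry is a denylist entry ("-" prefix) or none is.
    if denied and len(denied) != len(domains):
        raise ValueError("search_domain_filter cannot mix allowlist and denylist entries")
    return domains


# Usage (illustrative domains, not a vetted list):
allow = validate_domain_filter(["billboard.com", "pitchfork.com"])
deny = validate_domain_filter(["-youtube.com"])
```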

Key design decision: make it per-surface

Note: the query text in build_search_query already says “last few hours” for some surfaces — too tight for music, and prose ≠ parameter regardless.

Live A/B already run (prod key, R&B discourse query)

Hedging was intermittent across runs → recency alone isn’t a guaranteed fix; docs point to search_context_size="low" as the bigger lever. Test both.

Files & exact references

Suggested approach

  1. Add search params to the payload in search() — per-surface config (recency + context_size, maybe domains) threaded via a new arg or a purpose→config map.
  2. Cost is cross-surface (5 callers): a search_context_size bump ~doubles per-call cost. Get user sign-off before going high.
  3. Eval before/after: measure “hedging rate” (responses containing “can’t verify” / “cannot” / “do not have”) across surfaces × {low,medium,high} × {recency on/off}. Copy the pattern in scripts/eval_music_post.py.
  4. Its own PR (NOT folded into music PR #155).
  5. Gotcha: search_recency_filter can’t be combined with search_after_date_filter / search_before_date_filter.
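The hedging-rate metric from step 3 can be sketched as a substring check over collected responses, plus an enumeration of the eval grid. All names here are hypothetical and this does not reproduce scripts/eval_music_post.py. The marker list is the one from step 3; note that “cannot” is broad and may over-count.

```python
from itertools import product

SURFACES = ("ask", "discourse", "recap", "chimein", "music")
CONTEXT_SIZES = ("low", "medium", "high")
HEDGE_MARKERS = ("can't verify", "cannot", "do not have")


def is_hedged(response: str) -> bool:
    # Normalize curly apostrophes so both spellings of "can't" match.
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in HEDGE_MARKERS)


def hedging_rate(responses: list[str]) -> float:
    """Fraction of responses containing at least one hedge marker."""
    if not responses:
        return 0.0
    return sum(is_hedged(r) for r in responses) / len(responses)


def eval_grid() -> list[tuple[str, str, bool]]:
    """Every (surface, context_size, recency_on) cell to evaluate."""
    return list(product(SURFACES, CONTEXT_SIZES, (False, True)))
```

Because hedging was intermittent across runs, each grid cell probably needs several samples before its rate means much.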

Testing live (key is on Railway, not in session env)

Pull PERPLEXITY_API_KEY via Railway GraphQL (pattern in CLAUDE.md “Debugging Railway deploys”):

Checks before commit

ruff check . && mypy . && pytest. No em dashes in string literals (enforced: tests/test_persona.py::test_no_em_dashes_anywhere_in_repo). Branch off main, open PR.
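To catch em dashes locally before CI does, a check along these lines could be run over changed files. This is a guess at the idea only: how tests/test_persona.py actually implements the rule is not shown in this doc, and em_dash_literals is a hypothetical helper.

```python
import ast

EM_DASH = "\u2014"


def em_dash_literals(source: str) -> list[tuple[int, str]]:
    """Return (line number, value) for each string literal containing an em dash."""
    tree = ast.parse(source)
    return [
        (node.lineno, node.value)
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and EM_DASH in node.value
    ]
```

This inspects string constants only (docstrings included), so em dashes in comments would pass; the repo test may be stricter.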

OUT OF SCOPE (separate follow-up, don’t conflate)

iTunes reference verification: utils/apple_music.py appends whatever iTunes returns for a fuzzy query with zero check that resolved artist/title matches what the model named (wrong-track false-positive risk on the links-only music channel). Separate PR on the music side.