Root-Cause-First Engineering
Purpose
Do not start with protective backend logic when the failure may originate from missing, conflicting, or ignored model instructions. First prove what the model was asked to do, what it returned, and why.
Required workflow
- Capture the exact failure.
- Capture raw model request/response evidence:
  - selected agent
  - system prompt or cached prompt provenance
  - current user prompt
  - raw LLM response
  - state before and after the turn
  - whether the agent selected for the failing turn actually loaded the prompt file you plan to edit
- Classify the root cause:
  - missing instruction
  - contradictory instruction
  - instruction present but too weak or ambiguous
  - instruction followed but backend transformed the result incorrectly
  - backend/state invariant violation independent of model judgment
- Fix upstream first:
  - prompt file
  - agent instruction builder
  - agent routing/precedence when the wrong agent handled the modal/state
  - JSON schema
  - planning protocol
  - tool/result feedback wording
- Test the upstream fix on the real path that failed.
- For level-up work, do not add backend enforcement unless the human explicitly approves enforcement in-thread.
- Add backend protection only when one of these is true:
  - user safety or data integrity requires fail-closed enforcement
  - external/model nondeterminism can still violate a hard invariant
  - backend already owns a deterministic execution rule
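
The evidence-capture and classification steps above can be recorded in a small structure. This is a minimal sketch, assuming Python; `RootCause`, `TurnEvidence`, `prompt_file_was_loaded`, and the prompt file paths are hypothetical names for illustration and do not refer to any existing code.

```python
from dataclasses import dataclass, field
from enum import Enum


class RootCause(Enum):
    """The five root-cause categories from the workflow (member names are illustrative)."""
    MISSING_INSTRUCTION = "missing instruction"
    CONTRADICTORY_INSTRUCTION = "contradictory instruction"
    WEAK_OR_AMBIGUOUS_INSTRUCTION = "instruction present but too weak or ambiguous"
    BACKEND_TRANSFORMATION = "instruction followed but backend transformed the result incorrectly"
    BACKEND_INVARIANT_VIOLATION = "backend/state invariant violation independent of model judgment"


@dataclass
class TurnEvidence:
    """Raw evidence for one failing turn (hypothetical structure)."""
    selected_agent: str
    system_prompt_provenance: str
    user_prompt: str
    raw_llm_response: str
    state_before: dict
    state_after: dict
    loaded_prompt_files: list = field(default_factory=list)


def prompt_file_was_loaded(evidence: TurnEvidence, prompt_file: str) -> bool:
    """True only if the agent that handled the failing turn loaded this prompt file."""
    return prompt_file in evidence.loaded_prompt_files


# Usage with placeholder values: the turn was handled by RewardsAgent,
# so editing a LevelUpAgent prompt file would not touch the failing path.
evidence = TurnEvidence(
    selected_agent="RewardsAgent",
    system_prompt_provenance="cached",
    user_prompt="<user prompt>",
    raw_llm_response="<raw response>",
    state_before={},
    state_after={},
    loaded_prompt_files=["prompts/rewards.md"],  # hypothetical path
)
print(prompt_file_was_loaded(evidence, "prompts/level_up.md"))  # False
```

Checking the loaded prompt files before editing is what prevents a fix from landing on a prompt the failing agent never read.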
Backend protection rules
If backend enforcement is still needed:
- Keep it narrow.
- Log when it fires.
- Do not make it the primary semantic decision-maker.
- Document why prompt/schema correction was insufficient.
- Prefer correction-only invariants over broad fallback behavior.
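
A guard that follows these rules might look like the sketch below. It is an illustration only: the `level` field, the `MAX_LEVEL` value, and the `clamp_level` name are assumptions made for the example, not rules taken from any real backend.

```python
import logging

logger = logging.getLogger("backend_guard")

MAX_LEVEL = 20  # hypothetical hard invariant, chosen only for illustration


def clamp_level(payload: dict) -> dict:
    """Correction-only guard: repairs one invariant and leaves every other field untouched."""
    level = payload.get("level")
    if isinstance(level, int) and level > MAX_LEVEL:
        # Log every time the guard fires so the upstream cause stays visible.
        logger.warning("clamp_level fired: level=%s exceeds max=%s", level, MAX_LEVEL)
        return {**payload, "level": MAX_LEVEL}
    # No semantic decision is made here; a valid payload passes through unchanged.
    return payload
```

The guard corrects a single numeric bound and nothing else. It does not reinterpret the model's intent or substitute a fallback payload, and a frequent log entry is a signal to revisit the prompt or schema.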
Proof taxonomy for backend logic
When reviewing or proposing non-prompt logic, classify every backend behavior
with one of these proof states:
- Server-owned invariant: the backend owns the operation by design, such as
routing before the model call, persistence after the model call, locks,
request/session IDs, tool execution metadata, or schema normalization.
- Prompt/schema-insufficient, proven: raw request/response evidence shows
the intended agent received the intended prompt/schema and still produced the
bad payload on the real path after an upstream fix attempt.
- Backend-transformation bug: the model produced an acceptable payload, but
backend parsing, fallback, persistence, or UI projection changed or discarded
it incorrectly.
- Unproven fallback: tests show the fallback works, but there is no raw
evidence that the model/prompt/schema path cannot handle the behavior.
- ZFC violation candidate: backend logic performs semantic judgment,
classification, routing, or choice generation that belongs to the model.
Do not call backend logic "needed" unless it is either a server-owned invariant
or has prompt/schema-insufficient proof from the real path. Synthetic tests are
supporting evidence only; they do not prove the model cannot handle a semantic
decision.
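
The rule about when backend logic may be called "needed" can be expressed as a simple membership check. This is a sketch under the assumption that proof states are tracked as an enum; `ProofState`, `NEEDED_STATES`, and `may_call_needed` are hypothetical names.

```python
from enum import Enum


class ProofState(Enum):
    """The five proof states from the taxonomy above."""
    SERVER_OWNED_INVARIANT = "server-owned invariant"
    PROMPT_SCHEMA_INSUFFICIENT_PROVEN = "prompt/schema-insufficient, proven"
    BACKEND_TRANSFORMATION_BUG = "backend-transformation bug"
    UNPROVEN_FALLBACK = "unproven fallback"
    ZFC_VIOLATION_CANDIDATE = "ZFC violation candidate"


# Only these two states justify describing backend logic as "needed".
NEEDED_STATES = frozenset({
    ProofState.SERVER_OWNED_INVARIANT,
    ProofState.PROMPT_SCHEMA_INSUFFICIENT_PROVEN,
})


def may_call_needed(state: ProofState) -> bool:
    """True if backend logic in this proof state may be described as needed."""
    return state in NEEDED_STATES


print(may_call_needed(ProofState.UNPROVEN_FALLBACK))  # False
```

A fallback backed only by synthetic tests stays in `UNPROVEN_FALLBACK` until raw evidence from the real path moves it to another state.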
Anti-patterns
- Adding a sanitizer before checking whether the prompt omitted the rule.
- Editing LevelUpAgent prompts when logs prove RewardsAgent or another agent handled the failing turn.
- Adding a fallback that masks an LLM schema failure.
- Adding keyword or regex intent routing instead of model instruction/schema repair.
- Treating a guard as the fix when the LLM is still being asked the wrong thing.
Output checklist
When reporting the fix, include:
- Root cause category.
- Prompt/schema/agent change attempted first.
- Evidence that the real path now receives the intended instruction.
- Evidence that the selected agent/routing path is the intended one.
- Whether backend protection was added, and why.
- Test/evidence path or exact command.
- A component table for each backend guard/fallback/scrubber with columns:
component, non-prompt behavior, proof state, evidence, verdict.
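
One way to produce the component table is to keep one row per guard and render the rows with the five required columns. The sketch below assumes a pipe-delimited plain-text layout; `GuardRow`, `render_table`, and the placeholder row values are hypothetical and stand in for real findings.

```python
from dataclasses import dataclass, astuple

COLUMNS = ("component", "non-prompt behavior", "proof state", "evidence", "verdict")


@dataclass
class GuardRow:
    """One backend guard, fallback, or scrubber in the report."""
    component: str
    non_prompt_behavior: str
    proof_state: str
    evidence: str
    verdict: str


def render_table(rows):
    """Render rows as a pipe-delimited plain-text table with a header line."""
    lines = [" | ".join(COLUMNS)]
    lines.extend(" | ".join(astuple(row)) for row in rows)
    return "\n".join(lines)


# Placeholder row, not a real finding.
rows = [
    GuardRow(
        component="<guard name>",
        non_prompt_behavior="<what it does outside the prompt>",
        proof_state="unproven fallback",
        evidence="<log or test path>",
        verdict="<keep, narrow, or remove>",
    )
]
print(render_table(rows))
```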