From m2ai-skills-pack
Score an agent system design against the Bitter Lesson principle — how much "how" is encoded vs "what", how much bets on model improvement vs locks in current limitations. Flags procedural lock-in, hardcoded orchestration, and domain hacks. Produces a simplification roadmap. Use when designing new agent systems, reviewing agent architecture, or deciding what to simplify. Trigger on "bitter lesson", "score architecture", "agent complexity audit", "simplification roadmap", "how vs what ratio", "are we fighting the model".
Slash command: `/m2ai-skills-pack:bitter-lesson-scorecard`
Score an agent system against the Bitter Lesson: computation and learning beat hand-engineering. Systems that encode "how" instead of "what" get worse as models improve.
Ask the user to provide ONE of:
If given a codebase path, look for: orchestration files, state machines, routing logic, prompt templates, tool definitions, multi-agent coordination code.
Map every architectural component into one of these categories:
| Category | Bitter Lesson Alignment | Examples |
|---|---|---|
| Outcome Spec | ALIGNED — says "what" | Goal definitions, success criteria, quality thresholds |
| Tool Interface | ALIGNED — extends capability | API wrappers, file access, search tools |
| Hard Constraint | NEUTRAL — business necessity | Auth, rate limits, compliance rules, safety gates |
| Procedural Orchestration | MISALIGNED — encodes "how" | State machines, fixed step sequences, hardcoded agent routing |
| Model Compensation | MISALIGNED — bets against improvement | Chunking strategies, re-ranking, format enforcement, retry heuristics |
| Domain Hack | MISALIGNED — freezes current knowledge | Hardcoded few-shot examples, domain-specific parsing, manual entity extraction |
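The classification table above can be held as a small lookup so that every component gets exactly one category. This is a minimal sketch, not part of the skill itself: the `CATEGORY_ALIGNMENT` mapping mirrors the table, while the `misaligned` helper and the file names in `inventory` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Alignment(Enum):
    ALIGNED = "ALIGNED"
    NEUTRAL = "NEUTRAL"
    MISALIGNED = "MISALIGNED"


# Mirrors the category table: category name -> Bitter Lesson alignment.
CATEGORY_ALIGNMENT = {
    "Outcome Spec": Alignment.ALIGNED,
    "Tool Interface": Alignment.ALIGNED,
    "Hard Constraint": Alignment.NEUTRAL,
    "Procedural Orchestration": Alignment.MISALIGNED,
    "Model Compensation": Alignment.MISALIGNED,
    "Domain Hack": Alignment.MISALIGNED,
}


def misaligned(components: dict[str, str]) -> list[str]:
    """Return the names of components whose category is MISALIGNED.

    Raises KeyError if a component uses a category that is not in the table.
    """
    return [
        name
        for name, category in components.items()
        if CATEGORY_ALIGNMENT[category] is Alignment.MISALIGNED
    ]


# Hypothetical inventory: component name -> assigned category.
inventory = {
    "success_criteria.md": "Outcome Spec",
    "search_tool.py": "Tool Interface",
    "auth_gate.py": "Hard Constraint",
    "router_state_machine.py": "Procedural Orchestration",
    "chunker.py": "Model Compensation",
}

print(misaligned(inventory))  # ['router_state_machine.py', 'chunker.py']
```

Assigning the category is still a judgment call made during the review; the lookup only keeps the labels consistent once that call has been made.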
Calculate three scores (0-100):
Alignment: (outcome_specs + tool_interfaces + hard_constraints) / total_components * 100
Higher = more aligned with the Bitter Lesson.
Lock-In: (procedural_orchestration + model_compensation + domain_hacks) / total_components * 100
Higher = more locked into current model limitations. This is the number to REDUCE.
Improvement Leverage: estimate what percentage of the system becomes unnecessary if the underlying model improves 2x in capability.
High leverage = the system will naturally simplify with better models. Low leverage = the system fights improvement.
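The two count-based formulas above can be sketched as follows. The function name `score` and the example component list are hypothetical; only the two formulas come from the text. Two things follow directly from those formulas: hard constraints count toward the first score even though the table rates them NEUTRAL, and because every component falls into exactly one of the six categories, the two scores always sum to 100. The third score is an estimate made by the reviewer, so it is deliberately not computed here.

```python
from collections import Counter

# Categories counted in the numerator of each formula.
ALIGNED_SIDE = {"Outcome Spec", "Tool Interface", "Hard Constraint"}
LOCKED_SIDE = {"Procedural Orchestration", "Model Compensation", "Domain Hack"}


def score(categories: list[str]) -> tuple[float, float]:
    """Return (alignment, lock_in), each in the range 0-100.

    `categories` holds one category label per architectural component.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no components to score")
    unknown = set(counts) - ALIGNED_SIDE - LOCKED_SIDE
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")

    aligned = sum(counts[c] for c in ALIGNED_SIDE)
    locked = sum(counts[c] for c in LOCKED_SIDE)
    return 100 * aligned / total, 100 * locked / total


# Hypothetical system with 10 components.
example = (
    ["Outcome Spec"] * 2
    + ["Tool Interface"] * 3
    + ["Hard Constraint"]
    + ["Procedural Orchestration"] * 2
    + ["Model Compensation"]
    + ["Domain Hack"]
)

print(score(example))  # (60.0, 40.0)
```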
## Bitter Lesson Scorecard: [System Name]
| Metric | Score | Rating |
|--------|-------|--------|
| Alignment | XX/100 | [STRONG/MODERATE/WEAK] |
| Lock-In | XX/100 | [LOW/MODERATE/HIGH] |
| Improvement Leverage | XX% | [HIGH/MODERATE/LOW] |
## Component Breakdown
| Category | Count | % of System |
|----------|-------|-------------|
| Outcome Specs | N | X% |
| Tool Interfaces | N | X% |
| Hard Constraints | N | X% |
| Procedural Orchestration | N | X% |
| Model Compensation | N | X% |
| Domain Hacks | N | X% |
## Top Bitter Lesson Violations
1. [Component] — [Why it bets against improvement]
2. ...
3. ...
## Simplification Roadmap
### Quick wins (delete now, test)
- ...
### Medium-term (replace orchestration with outcome specs)
- ...
### Strategic (requires model capability validation)
- ...
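As an illustration of filling in the Component Breakdown part of the template above, the sketch below renders its rows from a list of category labels. The function name `breakdown_table` and the example list are hypothetical. Percentages are rounded to whole numbers, which is an assumption (the template only shows `X%`), so the rows may not sum to exactly 100 for some inputs.

```python
from collections import Counter

# (category label used during classification, row label used in the template)
ROWS = [
    ("Outcome Spec", "Outcome Specs"),
    ("Tool Interface", "Tool Interfaces"),
    ("Hard Constraint", "Hard Constraints"),
    ("Procedural Orchestration", "Procedural Orchestration"),
    ("Model Compensation", "Model Compensation"),
    ("Domain Hack", "Domain Hacks"),
]


def breakdown_table(categories: list[str]) -> str:
    """Render the Component Breakdown table as markdown text."""
    counts = Counter(categories)
    total = len(categories)
    lines = [
        "| Category | Count | % of System |",
        "|----------|-------|-------------|",
    ]
    for key, label in ROWS:
        n = counts[key]
        pct = round(100 * n / total) if total else 0
        lines.append(f"| {label} | {n} | {pct}% |")
    return "\n".join(lines)


# Hypothetical system with 10 components.
example = (
    ["Outcome Spec"] * 2
    + ["Tool Interface"] * 3
    + ["Hard Constraint"]
    + ["Procedural Orchestration"] * 2
    + ["Model Compensation"]
    + ["Domain Hack"]
)

print(breakdown_table(example))
```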
For each MISALIGNED component, suggest:
Nate's Newsletter (2026-04-01): The Bitter Lesson applied as a practical audit tool for agent architectures — scoring systems on "how" vs "what" encoding and producing simplification roadmaps.
npx claudepluginhub m2ai-mcp-servers/claude-skills --plugin m2ai-skills-pack