Karpathy Guidelines For Codex
karpathy-guidelines is a Codex-first skill for high-frequency work conversations that need strong execution with tight scope control.
It is derived from Andrej Karpathy's observations about common LLM coding failures:
- silently assuming requirements
- overengineering simple changes
- making unrelated edits
- claiming success without verification
This repository packages those ideas as a reusable skill instead of a loose manifesto.
It is designed to feel closer to a background operating layer than a one-off prompt trick.
What The Skill Does
The skill keeps four operating rules:
- Think Before Coding
- Simplicity First
- Surgical Changes
- Goal-Driven Execution
In practice, that means Codex should:
- move forward autonomously when the task is clear
- stop and narrow the problem when the request is materially ambiguous
- choose the smallest change that solves the requested problem
- avoid opportunistic refactors and speculative abstractions
- verify results before reporting completion
High-Frequency Activation
This skill is intended to activate in most non-casual work conversations, not only when you are already editing code.
It should behave in two layers:
- Baseline guardrails: active in general work, analysis, planning, and technical discussion
- Escalation mode: becomes stricter when the task turns into implementation, debugging, review, or completion reporting
This keeps the skill present throughout a session, similar to the way superpowers stays active, while still keeping its own scope narrow.
When To Use It
Use this skill for:
- most non-casual work conversations
- feature work
- bug fixes
- analysis and planning
- refactors
- code review
- implementation tasks where ambiguity or scope creep is likely
Do not use it for:
- casual chat
- pure translation
- pure copy editing
- non-code creative writing
Why This Works Well With Codex
Codex is strong at independently driving multi-step work once the target is clear. The failure mode is not lack of capability; it is drift.
This skill is designed to preserve Codex's strengths while constraining the common failure modes:
- it allows autonomous execution on straightforward tasks
- it provides light guardrails even before code changes start
- it forces explicit narrowing only when ambiguity or risk is real
- it keeps the model inside the requested scope
- it makes verification part of the definition of done
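To make the last point concrete, here is a minimal sketch of "verification as part of done" expressed as a shell wrapper. The name `verify_then_report` is hypothetical and is not part of Codex or of this skill. It only illustrates the idea that a success message is printed when, and only when, the verification command exits cleanly.

```shell
# Hypothetical helper: run a verification command and report success
# only if it actually passed. Not part of Codex or this skill.
verify_then_report() {
  if "$@"; then
    echo "verified: $*"
  else
    status=$?  # exit status of the failed verification command
    echo "not verified (exit $status): $*" >&2
    return "$status"
  fi
}
```

For example, `verify_then_report npm test` would print a "verified" line only when the test command succeeds. The test command here is just a placeholder for whatever check fits your project.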
Install For Codex
Install the skill into your Codex skills directory.
If CODEX_HOME is set:
mkdir -p "$CODEX_HOME/skills/karpathy-guidelines"
cp -R skills/karpathy-guidelines/. "$CODEX_HOME/skills/karpathy-guidelines"
If CODEX_HOME is not set:
mkdir -p ~/.codex/skills/karpathy-guidelines
cp -R skills/karpathy-guidelines/. ~/.codex/skills/karpathy-guidelines
After that, Codex can discover the skill by name or trigger it when the request matches the description.
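The two install cases can also be folded into one step. The sketch below assumes you run it from the repository root, and it falls back to `~/.codex` when `CODEX_HOME` is unset or empty. The function name `install_skill` is hypothetical and is used only for illustration.

```shell
# Hypothetical helper: install the skill whether or not CODEX_HOME is set.
# The optional first argument is the source directory (defaults to the repo path).
install_skill() {
  src="${1:-skills/karpathy-guidelines}"
  # Use CODEX_HOME when set and non-empty, otherwise ~/.codex.
  dest="${CODEX_HOME:-$HOME/.codex}/skills/karpathy-guidelines"

  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi

  mkdir -p "$dest" && cp -R "$src/." "$dest" && echo "installed to $dest"
}
```

Calling `install_skill` with no arguments performs the same copy as the commands above and prints the destination it used.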
Recommended Codex Usage
Because the trigger is intentionally broad, it may activate automatically in many work conversations. Explicitly naming it is still the strongest signal.
Use the skill name directly when you want the behavior to be explicit:
Use karpathy-guidelines for this task.
Fix the failing auth tests without refactoring unrelated modules.
If the failure could be interpreted in multiple ways, list the interpretations before editing.
Before saying it's fixed, run the strongest relevant verification command.
For broad implementation tasks, combine the skill with a scope constraint:
Use karpathy-guidelines.
Implement the requested feature, but stay within the existing architecture.
No speculative abstractions, no drive-by cleanup, and stop to clarify if the API contract is ambiguous.
For code review:
Use karpathy-guidelines and review this diff.
Prioritize correctness risks, scope creep, overengineering, and missing verification.
For ordinary work conversations where you want the light version:
Use karpathy-guidelines.
Help me think through this implementation approach, but keep assumptions explicit and do not over-design it.
What It Constrains
With this skill active, Codex should:
- make assumptions explicit instead of silently committing to one
- prefer one concise clarifying question over a large speculative implementation
- keep diffs small and task-shaped
- treat tests, builds, or repro steps as completion evidence
It should not: