Karpathy Guidelines For Codex


Follow me on X: https://x.com/jiayuan_jy

karpathy-guidelines is a Codex-first skill for high-frequency work conversations that need strong execution with tight scope control.

It is derived from Andrej Karpathy's observations about common LLM coding failures:

silently assuming requirements
overengineering simple changes
making unrelated edits
claiming success without verification

This repository packages those ideas as a reusable skill instead of a loose manifesto. It is designed to feel closer to a background operating layer than a one-off prompt trick.

What The Skill Does

The skill keeps four operating rules:

Think Before Coding
Simplicity First
Surgical Changes
Goal-Driven Execution

In practice, that means Codex should:

move forward autonomously when the task is clear
stop and narrow the problem when the request is materially ambiguous
choose the smallest change that solves the requested problem
avoid opportunistic refactors and speculative abstractions
verify results before reporting completion

High-Frequency Activation

This skill is intended to activate in most non-casual work conversations, not only when you are already editing code.

It should behave in two layers:

Baseline guardrails: active in general work, analysis, planning, and technical discussion
Escalation mode: stricter rules once the task turns into implementation, debugging, review, or completion reporting

That makes it closer to the way superpowers stays present across a session, while still keeping its own scope narrow.

When To Use It

Use this skill for:

most non-casual work conversations
feature work
bug fixes
analysis and planning
refactors
code review
implementation tasks where ambiguity or scope creep is likely

Do not use it for:

casual chat
pure translation
pure copy editing
non-code creative writing

Why This Works Well With Codex

Codex is strong at independently driving multi-step work once the target is clear. Its typical failure mode is not a lack of capability; it is drift away from the requested scope.

This skill is designed to preserve Codex's strengths while constraining the common failure modes:

it allows autonomous execution on straightforward tasks
it provides light guardrails even before code changes start
it forces explicit narrowing only when ambiguity or risk is real
it keeps the model inside the requested scope
it makes verification part of the definition of done
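
One way to picture "verification is part of done" is a small wrapper that runs a check and reports success only when the check exits cleanly. This is a minimal sketch, not part of the skill itself: the function name `verify_then_report` is hypothetical, and the command you pass in is whatever verification your project already uses.

```shell
# Hypothetical helper: run a verification command and report completion
# only when it exits with status 0.
verify_then_report() {
  if "$@"; then
    echo "verified: $*"
  else
    status=$?   # exit status of the verification command
    echo "not verified (exit $status): $*" >&2
    return "$status"
  fi
}
```

For instance, `verify_then_report make test` would print a "verified" line only if `make test` succeeds, assuming your project has such a target.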

Install For Codex

Install the skill into your Codex skills directory.

If CODEX_HOME is set:

mkdir -p "$CODEX_HOME/skills/karpathy-guidelines"
cp -R skills/karpathy-guidelines/. "$CODEX_HOME/skills/karpathy-guidelines"

If CODEX_HOME is not set:

mkdir -p ~/.codex/skills/karpathy-guidelines
cp -R skills/karpathy-guidelines/. ~/.codex/skills/karpathy-guidelines

Once the files are copied, Codex can load the skill when you name it, or trigger it automatically when a request matches the skill's description.
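
The two install cases above can be folded into one function with a default-value parameter expansion. This is a sketch under the same assumptions as the commands above (run from the repository root, skill files under `skills/karpathy-guidelines`). The function name `install_karpathy_guidelines` is hypothetical, and `${CODEX_HOME:-...}` also falls back to `~/.codex` when `CODEX_HOME` is set but empty.

```shell
# Sketch: install the skill into $CODEX_HOME/skills when CODEX_HOME is set,
# otherwise into ~/.codex/skills. The function name is hypothetical.
install_karpathy_guidelines() {
  src="${1:-skills/karpathy-guidelines}"
  dest="${CODEX_HOME:-$HOME/.codex}/skills/karpathy-guidelines"
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  # Copy the directory contents (including dotfiles) into the destination.
  mkdir -p "$dest" && cp -R "$src/." "$dest" && echo "installed to $dest"
}
```

Calling `install_karpathy_guidelines` with no arguments from the repository root should have the same effect as the manual commands.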

Recommended Codex Usage

Because the trigger is intentionally broad, it may activate automatically in many work conversations. Explicitly naming it is still the strongest signal.

Use the skill name directly when you want the behavior to be explicit:

Use karpathy-guidelines for this task.
Fix the failing auth tests without refactoring unrelated modules.
If the failure could be interpreted in multiple ways, list the interpretations before editing.
Before saying it's fixed, run the strongest relevant verification command.

For broad implementation tasks, combine the skill with a scope constraint:

Use karpathy-guidelines.
Implement the requested feature, but stay within the existing architecture.
No speculative abstractions, no drive-by cleanup, and stop to clarify if the API contract is ambiguous.

For code review:

Use karpathy-guidelines and review this diff.
Prioritize correctness risks, scope creep, overengineering, and missing verification.

For ordinary work conversations where you want the light version:

Use karpathy-guidelines.
Help me think through this implementation approach, but keep assumptions explicit and do not over-design it.

What It Constrains

With this skill active, Codex should:

make assumptions explicit instead of silently committing to one
prefer one concise clarifying question over a large speculative implementation
keep diffs small and task-shaped
treat tests, builds, or repro steps as completion evidence
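
If you want to check "small and task-shaped" yourself after a session, one rough approach is to list the changed files and flag anything outside the area the task was supposed to touch. The sketch below is illustrative and not part of the skill: `out_of_scope` is a hypothetical helper, and the path prefix is whatever directory your task was scoped to.

```shell
# Hypothetical helper: read changed file paths on stdin and print any that
# fall outside the allowed path prefix. Returns non-zero if any are found.
out_of_scope() {
  prefix="$1"
  found=0
  while IFS= read -r path; do
    case "$path" in
      "$prefix"*) ;;                 # inside the requested scope
      *) echo "$path"; found=1 ;;    # outside the scope: flag it
    esac
  done
  return "$found"
}
```

A typical use would be `git diff --name-only | out_of_scope src/auth/`, where `src/auth/` is a placeholder for the directory the task was meant to change.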

It should not:
