Yalla

Give Claude Code a one-line task. Get back a tested, reviewed pull request.

Yalla is an autonomous coding pipeline for Claude Code. It turns a one-line description into a planned, built, tested, reviewed, and shipped PR, using specialized agents and adaptive ceremony. Every run is grounded in a project knowledge base (your gotchas, risk checks, and architecture) and held to a Proof Contract that a built-in eval harness grades. A run only ships when its evidence artifacts say it's proven. GitHub Issues are the task store; no database or external service is required.

/yalla add rate limiting to the public API

                        ┌──────────────────── KNOWLEDGE BASE ────────────────────┐
                        │  gotchas · minimum diff · risk gates · review checks · │  grounds
                        │  architecture · test seams · task classification       │  every phase
                        └────────────────────────────┬───────────────────────────┘
                                                     ▼
     minimum-diff ─▶ classify ─▶ track ─▶ plan ──────▶ work ───▶ test ──▶ review ──▶ compound ─▶ ship
                     │                    │            │         │        │          │           │
                     pick                 adversarial  build     write &  binary     learnings   PR only
                     ceremony             + diagnosis  vertical  run      pass/fail  fed back    if verdict
                     + gates              plan         slices    tests    gates      into the    is PROVEN
                                                                                     knowledge
                                                     ▲
                        ┌────────────────────────────┴───────────────────────────┐
                        │  PROOF CONTRACT  →  evidence artifacts (.pipeline/*)   │  grades
                        │  verdict: PROVEN / NOT_PROVEN / INCONCLUSIVE           │  every run
                        │  graded by the eval harness  (npm run eval:yalla:smoke)│
                        └────────────────────────────────────────────────────────┘

Two things wrap the linear pipeline and make it more than a prompt: the knowledge base feeds project-specific constraints into every phase, and compound routes each run's learnings back into it, so the pipeline gets sharper the more you run it. The Proof Contract and the eval harness sit underneath, turning "looks done" into a graded, artifact-backed verdict.

What it does

| Phase | What happens |
| --- | --- |
| 0 · Minimum Diff + Classify | Runs the minimum-diff ladder, then picks a `task_type`, risk tier, ceremony mode, evidence mode, and gates. A no-build/docs/config answer, one-line fix, and payment-flow change get different ceremony. |
| 1 · Track | Creates (or resumes) a GitHub issue and a worktree branch. |
| 2 · Plan | Researches the codebase, designs an approach, and adversarially challenges it. Bugs run a diagnosis gate first. You approve before any code is written. |
| 3 · Work | Builds in vertical slices (each one a thin, demoable end-to-end behavior), writing a failing behavior test at the highest correct seam before the implementation that passes it. |
| 4 · Test | Runs the suite until green, verifies every acceptance criterion maps to evidence, and records falsifiable verification (`VERIFIED` / `NOT VERIFIED` / `INCONCLUSIVE`). |
| 5 · Review | Independent reviewers each answer one binary question (security? complexity? correctness?). Any Fail blocks the ship. The author never reviews their own code. |
| 6 · Compound | Captures actionable learnings to their smallest lasting home so the same mistake isn't repeated. |
| 7 · Ship | Writes `.pipeline/outcome-evaluation.json`, commits specific files, and opens a PR. PR-only by default; never auto-merges unless you asked in this run. |
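The Review rule in the table (each reviewer answers one binary question, any Fail blocks the ship, and the author never reviews their own code) can be pictured as a small gate function. This is only a sketch under stated assumptions, not Yalla's implementation: the `ReviewResult` shape, the `reviewGate` name, and the choice to block when no reviews exist are all hypothetical.

```typescript
// Hypothetical shape of one reviewer's answer; not taken from Yalla's source.
interface ReviewResult {
  reviewer: string;
  question: string; // e.g. "security?", "complexity?", "correctness?"
  answer: "Pass" | "Fail";
}

interface GateOutcome {
  blocked: boolean;
  reasons: string[];
}

// Blocks the ship if any review failed, if the author reviewed their own
// code, or (an assumption of this sketch) if no review was recorded at all.
function reviewGate(author: string, results: ReviewResult[]): GateOutcome {
  const reasons: string[] = [];
  if (results.length === 0) {
    reasons.push("no independent review recorded");
  }
  for (const result of results) {
    if (result.reviewer === author) {
      reasons.push(`self-review by ${author} on "${result.question}"`);
    } else if (result.answer === "Fail") {
      reasons.push(`${result.reviewer} answered Fail on "${result.question}"`);
    }
  }
  return { blocked: reasons.length > 0, reasons };
}

const outcome = reviewGate("author-agent", [
  { reviewer: "security-reviewer", question: "security?", answer: "Pass" },
  { reviewer: "complexity-reviewer", question: "complexity?", answer: "Fail" },
]);
console.log(outcome.blocked, outcome.reasons);
```

A single Fail is enough to set `blocked`, which mirrors the binary pass/fail gates in the diagram.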

The core idea: keep the universal pipeline small, and activate risk-specific gates only when the diff touches that subsystem. A docs typo doesn't get dragged through payment, migration, and auth review. A change to your billing code does.
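One way to picture "activate risk-specific gates only when the diff touches that subsystem" is a lookup from changed file paths to gates. The sketch below is illustrative only: the gate names, the path patterns, and the `activeGates` function are assumptions, not Yalla's actual configuration or API.

```typescript
// Hypothetical gate names and path patterns, chosen for illustration.
type Gate = "payments" | "migrations" | "auth";

const GATE_PATTERNS: Record<Gate, RegExp[]> = {
  payments: [/^src\/billing\//],
  migrations: [/^db\/migrations\//],
  auth: [/^src\/auth\//],
};

// Returns only the gates whose patterns match at least one changed file.
function activeGates(changedFiles: string[]): Gate[] {
  return (Object.keys(GATE_PATTERNS) as Gate[]).filter((gate) =>
    GATE_PATTERNS[gate].some((pattern) =>
      changedFiles.some((file) => pattern.test(file))
    )
  );
}

// A docs typo activates no risk gates; a billing change activates "payments".
console.log(activeGates(["docs/README.md"]));
console.log(activeGates(["src/billing/invoice.ts", "docs/README.md"]));
```

Keeping the registry as data means a new subsystem adds one entry rather than another branch in the universal pipeline.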

The Proof Contract

A run is "done" only when its verdict is PROVEN — and PROVEN is backed by artifacts, not prose. Before shipping, Yalla writes .pipeline/outcome-evaluation.json with a verdict of exactly one of:

PROVEN — every acceptance criterion is covered by valid evidence (a passing test, a static check, a browser/API probe, a smoke run), all required review checks pass, and no remaining delta exists. Only PROVEN may be called done, complete, or ready to merge.

NOT_PROVEN — evidence or review shows the promise isn't satisfied. An honest outcome, not a failure to hide.

INCONCLUSIVE — local proof is blocked or external evidence is unavailable. Can still open a PR, but the PR clearly says human review or external evidence is needed.

Missing evidence never becomes PROVEN. Deterministic proof is preferred — Yalla won't lean on a model judge when a concrete test or check can verify the behavior. This is what stops "looks done" from masquerading as "is done."
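The three verdicts can be read as a small decision function over per-criterion evidence. The following is a minimal sketch, assuming a hypothetical `RunSummary` shape; the field names, the `decideVerdict` function, and the choice to treat a remaining delta as NOT_PROVEN and an empty criteria list as INCONCLUSIVE are assumptions of this example, and the real schema of `.pipeline/outcome-evaluation.json` may differ.

```typescript
type Verdict = "PROVEN" | "NOT_PROVEN" | "INCONCLUSIVE";

// "missing" covers blocked local proof or unavailable external evidence.
type EvidenceStatus = "pass" | "fail" | "missing";

// Hypothetical summary of a run; not the actual artifact schema.
interface RunSummary {
  criteria: { criterion: string; evidence: EvidenceStatus }[];
  reviewChecksPassed: boolean;
  remainingDelta: boolean;
}

function decideVerdict(run: RunSummary): Verdict {
  // Evidence or review showing the promise is not satisfied wins first.
  const anyFailed = run.criteria.some((c) => c.evidence === "fail");
  if (anyFailed || !run.reviewChecksPassed || run.remainingDelta) {
    return "NOT_PROVEN";
  }
  // Missing evidence never becomes PROVEN; neither does an empty list.
  const anyMissing = run.criteria.some((c) => c.evidence === "missing");
  if (anyMissing || run.criteria.length === 0) {
    return "INCONCLUSIVE";
  }
  return "PROVEN";
}

const verdict = decideVerdict({
  criteria: [
    { criterion: "returns 429 after the limit is hit", evidence: "pass" },
    { criterion: "limit is configurable", evidence: "missing" },
  ],
  reviewChecksPassed: true,
  remainingDelta: false,
});
console.log(JSON.stringify({ verdict })); // {"verdict":"INCONCLUSIVE"}
```

The ordering matters: a known failure is reported as NOT_PROVEN even when other evidence is merely missing, so an unproven run is never softened into INCONCLUSIVE.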

Adaptive classification
