By duthaho
Execute structured, evidence-driven engineering workflows in Claude Code across 5 phases (Investigate, Design, Implement, Verify, Ship) using 15 skills and 8 specialist agents that enforce TDD, root-cause analysis, architecture/UX/security reviews, dependency audits, PR preparation, and incremental shipping with pasted test outputs, logs, and diffs as proof before advancing.
Use when reviewing the architecture dimension of a written plan. Dispatched primarily by plan-review-architecture (via plan-review). Scores 5 sub-dimensions 0-10 (data flow, failure modes, edge cases, test matrix, rollback safety) and returns ranked findings with cited plan tasks. <example> Context: A plan has been written and is about to be implemented. user: "Run plan-review on the cache-invalidation plan." assistant: "Dispatching the architect agent to score the architecture dimension while the experience-reviewer runs in parallel." </example> <example> Context: A migration plan needs an architecture-only pass. user: "I just need an arch review on this — skip the UX review." assistant: "Dispatching the architect agent directly." </example>
Use when reviewing a diff or PR for structural issues, error handling, edge cases, complexity, and style. Dispatched primarily by code-review-loop. Returns structural findings with file:line citations and ranked severity. Pairs with security-auditor for sensitive paths. <example> Context: A PR is ready for first-pass review. user: "Review my charge-endpoint PR before I tag humans." assistant: "Dispatching the code-reviewer agent to find structural issues, error-handling gaps, and complexity hotspots." </example> <example> Context: A refactor PR needs a sanity check. user: "Sanity-check this refactor PR." assistant: "Dispatching the code-reviewer to confirm behavior preservation and look for unintended changes." </example>
Use when reviewing the experience dimension of a written plan (UX + DX). Dispatched primarily by plan-review-experience (via plan-review). Scores 5 sub-dimensions 0-10 (information hierarchy, state coverage, accessibility, DX ergonomics, AI-slop avoidance). <example> Context: A plan with both UI and API changes needs review. user: "Run plan-review on the dashboard plan." assistant: "Dispatching the experience-reviewer agent in parallel with the architect to cover UX and DX in one pass." </example> <example> Context: A new public API surface is being added. user: "Review the DX of the new webhook API plan." assistant: "Dispatching the experience-reviewer to score DX ergonomics, error copy, and discoverability." </example>
Use when investigating bugs, errors, test failures, or unexpected behavior. Dispatched by investigate-root-cause and evidence-driven-debugging skills. Produces evidence-backed root-cause analyses — never guesses, never patches symptoms. <example> Context: An API endpoint is returning intermittent 500s. user: "The /api/users endpoint is throwing 500s sometimes." assistant: "Dispatching the investigator agent to gather evidence, write a hypothesis, and prove or refute it before any fix." </example> <example> Context: Tests passed locally but fail in CI. user: "My tests pass locally but CI is red." assistant: "Dispatching the investigator to find the env diff between local and CI and produce a hypothesis." </example>
Use when decomposing a spec into an executable plan. Dispatched primarily by the write-plan skill. Produces a numbered task list with file paths, exact test commands, dependency annotations, acceptance criteria per task, and a Risks section. <example> Context: An approved spec exists; implementation hasn't started. user: "Turn the auth-rotation spec into a plan we can execute." assistant: "Dispatching the planner agent to produce a numbered task list with file paths, test commands, and rollback notes." </example> <example> Context: A previous plan was rejected during plan-review for being too vague. user: "Re-plan the migration; the reviewers said it had no acceptance criteria." assistant: "Dispatching the planner agent to rebuild the plan with falsifiable acceptance lines per task." </example>
Use when investigating dependency bloat, security advisories, supply-chain risk, upgrade planning, or before adding a new third-party package. Activate for keywords like "deps", "dependencies", "package.json", "requirements.txt", "Cargo.toml", "audit", "CVE", "stale package", "do we use", "what depends on", "transitive dep". Produces a written audit with import-graph evidence — never trust scanner output without verifying call sites.
Use when opening a PR for review or when receiving review feedback. Activate for keywords like "code review", "PR review", "request review", "review feedback", "address comments", "reviewer said". Covers both ends of the loop: preparing a reviewable PR and acting on feedback rigorously. Always engage with every comment -- never dismiss feedback by silently ignoring it.
Use during active debugging when you have a hypothesis to test or need to instrument a running system. Activate for keywords like "debug", "instrument", "log", "trace", "breakpoint", "what's happening at runtime", "production behavior". Pair to investigate-root-cause for the doing-it phase. Always record what you observed -- never debug entirely "in your head" without leaving evidence behind.
Use when implementing a non-trivial feature, migration, or refactor that would otherwise be a single large change. Activate for keywords like "feature flag", "incremental", "vertical slice", "migration", "rollout", "behind a flag", "ship small". Enforces vertical slicing, feature-flagged rollout, and refactor-with- evidence (behavior-preserving changes proved by test/perf deltas). Always ship the smallest reversible change -- never bundle unrelated improvements.
Interactive setup wizard for claudekit. Scaffolds rules, hooks, and MCP server configs into the user's project. Run /claudekit:init to configure. Use when setting up a new project with claudekit or reconfiguring an existing one.
Creative exploration mode — divergent thinking, multiple alternatives, structured trade-offs before any code
Critical analysis mode — find issues first, severity-tagged findings, actionable suggestions
Thorough investigation mode — completeness over speed, evidence-cited, confidence levels named
Code-focused execution mode — minimal prose, action-oriented updates, follow established patterns
Compressed output mode — minimal prose, code-first, no preambles
Uses power tools
Uses Bash, Write, or Edit tools
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
A verification-first engineering toolkit for Claude Code. Built for senior ICs and tech leads who already know how to ship production code — and want a workflow that keeps the discipline tight without getting in the way.
15 skills, 8 agents, one philosophy: every claim has evidence. No tests pass — trust me. No it works in my IDE. No I think the cache is stale. Skills produce artifacts you could paste into a code review.
verification-gate runs before any "done" claim — runs the tests, checks the negative path, exercises the change in a non-IDE environment, cross-checks the original ask./plugin marketplace add duthaho/claudekit-marketplace
/plugin install claudekit
/claudekit:init
/claudekit:init interactively scaffolds rules, hooks, and MCP server configs into your project's .claude/ directory. Output styles ship with the plugin and are auto-discovered by Claude Code (no init step required).
| Phase | Skills | What's enforced |
|---|---|---|
| Investigate | investigate-root-cause, map-codebase, audit-dependencies | Every claim about the system has a <file:line> citation. No memory-based assertions. |
| Design | shape-spec, write-plan, plan-review, plan-review-architecture, plan-review-experience | Plans have file paths, exact test commands, falsifiable acceptance criteria, named rollbacks. Reviewed before implementation. |
| Implement | test-first, incremental-shipping | Red-green-refactor with pasted runner output. Vertical slices behind feature flags. Refactors prove behavior preservation with test/perf deltas. |
| Verify | verification-gate, evidence-driven-debugging | Mandatory pre-completion gate. Active debugging keeps a paper trail. |
| Ship | code-review-loop, release-and-changelog | Reviewable PRs with verification evidence pasted. Atomic releases with diff-built changelogs. |
| Setup (off-spine) | init | One-time scaffolding wizard for project-level config. |
All 15 skills are user-invocable as /claudekit:<name>.
Five Claude Code output styles ship with the plugin. They're auto-discovered by Claude Code — no init step required. Switch via /config or by setting outputStyle in .claude/settings.local.json.
| Style | When to use |
|---|---|
| Brainstorm | Creative exploration — divergent thinking, multiple alternatives, structured trade-offs before any code |
| Deep Research | Thorough investigation — completeness over speed, evidence-cited findings with confidence levels |
| Implementation | Code-focused execution — minimal prose, action-oriented updates, follow established patterns |
| Review | Critical analysis — find issues first, severity-tagged findings, actionable suggestions |
| Token Efficient | Compressed output — minimal prose, code-first, no preambles |
All styles use keep-coding-instructions: true, so Claude's default coding/testing/verification discipline still applies underneath.
Each agent has a single dispatcher and a clear job. No agent-bloat.
| Agent | Job | Dispatched by |
|---|---|---|
claudekit:planner | Decompose specs into executable plans | write-plan |
claudekit:architect | Score architecture dimension of a plan | plan-review-architecture |
claudekit:experience-reviewer | Score UX + DX dimension of a plan | plan-review-experience |
claudekit:investigator | Root-cause investigation with evidence chain | investigate-root-cause, evidence-driven-debugging |
claudekit:tester | Design and write tests with red-green discipline | test-first |
claudekit:code-reviewer | Pre-merge structural review of diffs | code-review-loop |
claudekit:security-auditor | OWASP-aligned review of sensitive paths | code-review-loop (sensitive paths) |
claudekit:scout | Codebase mapping and dependency audits | map-codebase, audit-dependencies |
/claudekit:init configuresnpx claudepluginhub duthaho/claudekit --plugin claudekitCode transformation: Dev SDLC orchestrator (code-shipping pipeline), plan, assert, audit, review, test, refactor, debug, for-sure. Hosts engineering agents.
Corca Workflow Framework — consolidated hooks and skill orchestration for structured development sessions
AI-powered development workflow automation - Phase-based planning, implementation orchestration, preflight code quality checks with security scanning, ship-it workflow, and development principles generator for CLAUDE.md
PROJECT.md-first autonomous development with hybrid auto-fix documentation. 8-agent pipeline, auto-orchestration, docs auto-update on commit (true vibe coding). Knowledge base system with 90% faster repeat research. Strict mode enforces SDLC best practices automatically. Works for ANY Python/JavaScript/TypeScript/Go project.
Production-ready Claude Code configuration with role-based workflows (PM→Lead→Designer→Dev→QA), safety hooks, 44 commands, 19 skills, 8 agents, 43 rules, 30 hook scripts across 19 events, auto-learning pipeline, hook profiles, and multi-language coding standards
Persona-driven AI development team: orchestrator, team agents, review agents, skills, slash commands, and advisory hooks for Claude Code