From code-craftsmanship
Applies Feathers' legacy code change algorithm: break dependencies, write characterization tests, then safely modify untested codebases. Use for untestable classes, tangled modules, or code you're afraid to change.
How this skill is triggered — by the user, by Claude, or both
Slash command
/code-craftsmanship:working-with-legacy-codeThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A field manual for changing code that has no tests, distilled from Michael C. Feathers' *Working Effectively with Legacy Code*. Use it to get untestable classes into a harness, pin down current behavior with characterization tests, and make changes one safe, verifiable step at a time — without resorting to a rewrite.
A field manual for changing code that has no tests, distilled from Michael C. Feathers' Working Effectively with Legacy Code. Use it to get untestable classes into a harness, pin down current behavior with characterization tests, and make changes one safe, verifiable step at a time — without resorting to a rewrite.
Legacy code is simply code without tests. Not old code, not ugly code — untested code: without tests you cannot know whether a change preserves behavior, so every edit is a gamble. The craft is breaking dependencies just enough to get tests in place before changing anything — cover and modify, never edit and pray.
Goal: 10/10. Rate changes to untested code 0-10 against the principles below. Report the current score and the specific steps needed to reach 10/10.
Core concept: The dilemma: to change code safely we need tests, but to get tests in place we have to change code. The way out is a fixed sequence — identify change points, find test points, break dependencies, write tests, then make changes and refactor — where the pre-test edits are conservative and mechanical, and the real change happens only inside the safety net.
Why it works: Edit-and-pray substitutes care for feedback, and care doesn't scale to code you don't fully understand. Cover-and-modify clamps existing behavior in a vise of tests, so any unintended change announces itself immediately on your machine instead of later in production.
Key insights:
Applications:
| Context | Application | Example |
|---|---|---|
| Bug fix in an untested module | Run the five steps before touching the bug | Pin parseInvoice() with tests, then fix the rounding error |
| PR mixing cleanup and a feature | Split into structure-only and behavior-only commits | Extract and rename first, tests green, then add the discount rule |
| "It's just a one-line change" | Find the nearest test point first | One pin test at the public method that calls the private one you edit |
See: references/change-algorithm.md
Core concept: A seam is a place where you can alter behavior in your program without editing in that place. Every seam has an enabling point — where you decide which behavior runs. Getting legacy code under test is largely a hunt for seams: spots where a test can substitute a slow, global, or external dependency while the production source stays untouched.
Why it works: If you must edit code to test it, you risk changing the very behavior you are trying to pin down. Seams move the substitution to a distance — a subclass, an import, a build flag — so the code under test runs exactly as in production while the test controls its dependencies from the enabling point.
Key insights:
jest.mock and unittest.mock.patch are link seams in modern clothingnew Database() inside a method body is a seam that never got built — constructors doing real work, globals, statics, and hard-wired I/O are where seams dieApplications:
| Context | Application | Example |
|---|---|---|
| Class constructs its own DB client | Object seam via constructor parameter | constructor(db: Db = new ProdDb()) — tests pass a fake |
Module calls a top-level send_email() | Import/link seam | mocker.patch("billing.send_email") or jest.mock("./mailer") |
| Logic reads the wall clock directly | Seam at the clock | Inject a now() provider; tests freeze time |
See: references/seams.md
Core concept: A characterization test documents what the code actually does right now — not what the spec, the comments, or anyone's memory says it should do. Write a probe you know will fail, let the failure message reveal the real behavior, then change the assertion to pin that behavior in place.
Why it works: In legacy systems the actual behavior is the de facto spec: callers, reports, and customers may depend on it, quirks included. Tests written from imagined requirements fail for reasons that tell you nothing, while characterization tests fail during refactoring precisely when — and where — you changed existing behavior.
Key insights:
expect(total).toBe(-1)), read the failure, pin the observed valueApplications:
| Context | Application | Example |
|---|---|---|
| Refactoring a tax calculator | Pin outputs for representative inputs | Run 20 cases through, assert each recorded result |
| Legacy report generator | Golden master diff | Generate the report, compare to a checked-in master file |
| Off-by-one found while pinning | Pin the wrong value, document it | assert days == 30 # BUG? expected 31 — TICKET-482 |
See: references/characterization-tests.md
Core concept: When you genuinely cannot get the area under test today, don't weave new logic into the untested mass. Sprout Method or Sprout Class: write the new behavior as fresh, fully tested code and call it from a single line in the legacy spot. Wrap Method or Wrap Class: rename the old code aside and add behavior before or after the call to it, decorator-style.
Why it works: New code in a fresh method or class can be test-driven even when its host can't be instantiated in a harness. The untested host changes by one call site — old behavior remains byte-for-byte intact, new behavior arrives fully verified, and the risk stays contained to a single line.
Key insights:
pay() to rawPay(), recreate pay() as the wrapperApplications:
| Context | Application | Example |
|---|---|---|
Late-fee rule in a 400-line process() | Sprout Method, one call line | total += lateFee(order) — lateFee() written test-first |
Audit logging around legacy pay() | Wrap Method | New pay() logs, calls rawPay(), logs again |
| New validation, class won't instantiate | Sprout Class | new OrderValidator().validate(data) called from legacy code |
Core concept: A catalog of mechanical, low-risk moves that sever whatever blocks instantiation or sensing: Extract Interface, Parameterize Constructor, Parameterize Method, Extract and Override Factory Method or Getter, Introduce Instance Delegator, Adapt Parameter, Break Out Method Object, Subclass and Override Method. Because they run before tests exist, always pick the least invasive technique that unblocks you.
Why it works: Code resists testing for a small set of recurring reasons — constructors doing real work, statics and singletons, parameters you can't construct, monster methods. Each blocker has a named, practiced counter-move, so you execute a known maneuver instead of improvising surgery on code that has no safety net.
Key insights:
HttpServletRequest in your own narrow interface and test against thatunittest.mock.patch or jest.mock can stand in for several techniques, but parameterizing leaves better design behindApplications:
| Context | Application | Example |
|---|---|---|
| Constructor opens a DB connection | Parameterize Constructor | def __init__(self, conn=None): self.conn = conn or connect() |
Static Billing.charge() called everywhere | Introduce Instance Delegator | Instance charge() delegates to the static; tests override it |
| 900-line method hoarding locals | Break Out Method Object | new RateCalculation(order, rates).run() — locals become fields |
See: references/dependency-breaking.md
Core concept: Before changing code you don't understand, invest in cheap comprehension: effect sketches trace what a change can affect, feature sketches show how methods and fields cluster inside a god class, scratch refactoring means refactoring recklessly to learn and then throwing the edits away, and telling the story of the system forces a simplifying summary. The payoff is finding pinch points — narrow places where a few tests cover wide behavior.
Why it works: In legacy code the bottleneck is comprehension, not typing. An effect sketch turns "what could this break?" from anxiety into a finite list, and a pinch point lets a handful of tests act as a vise over an entire cluster of methods — often revealing where a hidden class boundary wants to be drawn.
Key insights:
Applications:
| Context | Application | Example |
|---|---|---|
| "What breaks if I change this field?" | Effect sketch from the field outward | Three readers found; two pinch-point tests cover them |
| Feature due in a 5,000-line class | Pinch-point tests, then sprout | Cover postInvoice(), sprout the new rule as a class |
| Code nobody on the team understands | Scratch refactor on a branch | Extract and rename to learn, revert, plan the real moves |
See: references/case-studies.md
| Mistake | Why It Fails | Fix |
|---|---|---|
| Refactoring and changing behavior in one step | When something breaks, you can't tell which edit did it | Separate commits; tests green between each step |
| Writing "should" tests on legacy code | Imagined specs fail noisily and you "fix" load-bearing behavior | Characterize what the code does; file bugs separately |
| Mocking everything in sight | Tests pin the implementation, so every refactor breaks them | Fake only what blocks instantiation or sensing |
| Big-bang rewrite instead of incremental coverage | The old system keeps moving; rewrites ship late and miss years of edge cases | Cover and modify piece by piece |
| Silently fixing bugs found while characterizing | Callers and reports may depend on the wrong behavior | Pin it, document it, fix it as a separate deliberate change |
| Invasive cleanup before any tests exist | Every manual edit risks behavior with no net underneath | Least invasive technique; preserve signatures; lean on the compiler |
| Sprouting forever without payback | The host stays untested and sprouts ossify into the next legacy layer | Track sprout debt; cover hot spots on the next touch |
| Waiting for a dedicated "testing project" | That project never gets funded; coverage never appears | Grow coverage along every change you ship |
| Question | If No | Action |
|---|---|---|
| Do tests cover the code you're about to change? | You're editing and praying | Run the change algorithm; pin behavior before editing |
| Can you construct the class in a test harness? | Dependencies block separation | Parameterize Constructor, Extract Interface, or Sprout Class |
| Can a test sense the effect of your change? | Effects are invisible to assertions | Find a sensing point; Extract and Override Getter |
| Is this commit behavior-only or structure-only? | Mixed changes are unverifiable | Split it; run the tests between the two |
| Do you know everything this change can affect? | Unknown blast radius | Draw an effect sketch; test at the pinch points |
| Do your assertions state observed behavior? | You're testing wishes, not code | Probe, read the failure, pin the actual value |
| Is the seam you chose the cheapest one available? | Needless surgery risks behavior | Prefer constructor parameters and import seams first |
| Will the code be better covered after this change? | The next change costs as much as this one | Leave at least one pin test at the nearest test point |
Michael C. Feathers is the founder of R7K Research & Conveyance, a consultancy focused on software design and the rehabilitation of aging systems. A long-time consultant and conference speaker on legacy code, he wrote Working Effectively with Legacy Code (2004) and gave the field its working definition: legacy code is simply code without tests.
npx claudepluginhub wondelai/skills --plugin systems-architectureSafely modifies untested legacy code by getting it under test first using characterization tests, seam identification, sprout/wrap techniques, and pinch-point testing before making production changes.
Guides safe refactoring of untested legacy code using RGR workflow and characterization tests. Use when modifying code without test coverage.
Provides structured workflow packs for 7 common Claude Code tasks: codebase exploration, bug fixing, safe refactoring, TDD, repo review before merge, CLAUDE.md generation, and migration planning.