From Progressive Dynamic Research
Multi-tier agent funnel for research and architecture decisions — cheap models discover breadth, an adversarial tier kills weak options, the strong model only synthesizes the few survivors. Use this whenever a question needs broad, fact-checked comparison across many options or sources and you're about to fan out subagents or author a dynamic Workflow: "compare these N approaches", "what's the best way to X", "research and recommend", "evaluate this architecture", "audit / decide between candidates". Also use it to decide WHETHER to fan out at all (vs. just reason), how to budget-cap a sweep, and how to avoid the classic failure modes (stale constraints, re-opening filtered sets, trusting a surface search, fanning out for thinking you're avoiding). Pulls in the canonical workflow template, JSON schemas, and the pall() helper that avoids the common `parallel(...).filter is not a function` paren bug.
How this skill is triggered — by the user, by Claude, or both
Slash command
/progressive-dynamic-research:progressive-dynamic-researchThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A progressively-narrowing (pyramid-shaped) funnel for turning a vague "research
A progressively-narrowing (pyramid-shaped) funnel for turning a vague "research this and decide" into a deterministic, cost-bounded, parallel sweep whose output is a structured object you can act on. The governing idea: breadth is bought with the cheapest tokens; depth is spent only on what survived an adversary.
▲ APEX 1 strong agent (opus) — deep synthesis over survivors only
███ KILL N skeptics (sonnet) — try to REFUTE each survivor
█████ FIT N scorers (haiku) — score vs fixed priorities, drop misfits
███████ DEDUP plain code (no agent) — collapse near-duplicates
█████████ BASE ~5 cheap agents (haiku)— wide, high-recall option discovery
⚠️ READ THIS BEFORE YOU WRITE ANY WORKFLOW — the one bug everyone hits.
await parallel(x.map(...)).filter(...)binds.filterto the Promise, not the array, so you get.filter is not a function. It cost one author three failed launches from the same line copy-pasted into four tiers. Always route through thepall()helper and assign in two steps — never inline the await-wrap — and runnode --checkbefore launching:const pall = async (thunks) => (await parallel(thunks)) || [] // always an array const raw = await pall(items.map(it => () => agent(/* ... */))) const kept = raw.filter(Boolean) // safeThe bundled
scripts/pyramid-template.jsalready does this correctly — start from it and you avoid the trap. Full list of harness gotchas:references/harness-notes.md.
A workflow is for breadth you cannot hold in one context — not for thinking you're avoiding. Run this check first; it's the highest-leverage decision here.
If it passes all three, fan out. Otherwise, reason inline.
A pyramid spawns dozens of agents and can spend a lot; decide the spend before it runs, not after it overshoots. Skipping this is the single most expensive mistake — a real run sized a budget for reasoning agents, then ran web-grounded ones, blew the cap mid-flight, and the load-bearing tier silently never ran.
Default (a user is in the loop): present a PLAN, then put the launch decision in
an AskUserQuestion MENU and WAIT. Do not launch in the same turn. The decision
must be tappable, not a "reply to adjust" paragraph the user has to reverse-engineer
into prose. Two parts, in this order:
AskUserQuestion with a SINGLE question — so a user happy with the
defaults approves the whole run in one tap. The levers do NOT each get their
own question up front (that forces 4 taps on someone who just wants to go). The
one question is the accept/change choice:
Exception — explicitly autonomous runs only (a cron/headless/loop context, or the user said "just run it / don't ask"): announce the same four items in one line and proceed without blocking. If a user is present and has not waived approval, the default gate applies — "they can always veto later" is not a substitute for waiting (the run is already spending by then).
If the user already gave a token target (a +Nk directive), use it as the budget
and say so in the plan instead of proposing your own.
Pick the cap from the run's shape, not a habit:
HARD ≈ 220k, WARN ≈ 150k.references/harness-notes.md → "The dominant cost is tool-use
LOOPS on a wide tier."Three rules that keep the cap from backfiring:
budget.spent(). It is a shared,
session-cumulative, output-only counter — a raw gate trips on whatever the
session spent before you launched. Capture START = budget.spent() at the top
and gate on budget.spent() - START. (A real run tripped a 150k gate at "500k"
when the workflow itself had spent ~90k — the rest was prior session output.)log() where it stopped
(HARD CAP at post-verify: 812k) so you know the result is complete enough to
use — or exactly which tier got starved.If a token target was set for the turn (a +Nk directive), read it off the global
budget and scale the fan-out to it. Mechanics + the gate() helper:
references/harness-notes.md → "Budget guard". Both bundled templates ship a
guard already sized for their funnel — the option-selection one for reasoning, the
claim-verification one for web-grounded.
Map the funnel to the task; don't run all five tiers mechanically.
references/schemas.md.Use the bundled template as the starting point — it encodes the hard-won mechanics so you don't rediscover them:
scripts/pyramid-template.js — the canonical option-selection funnel:
BASE → DEDUP → FIT → KILL → CRITIC → APEX, with the pall() helper,
dedup-in-code, per-tier schemas, a multi-vote KILL quorum, a coverage critic,
and a delta-based budget guard. It is args-driven: don't edit the file —
invoke it with Workflow({ scriptPath, args: { problem, priorities, angles, webTiers, killVotes, maxFit, maxKill, hard, warn } }). webTiers is a
per-tier list — ['base','kill'] lets BASE fetch for discovery recall
(single-pass — it must not loop) and KILL fetch to verify facts, while FIT
(the wide, loop-prone tier) never fetches. webGrounded:true is a back-compat
alias for ['kill']. Use when the task is
"choose among N candidate approaches". Read the header block for the full args.scripts/pyramid-template.test.mjs — the $0 test suite (node:test, no
deps). Runs the workflow body under mocked agent/parallel/budget (no LLM
calls) across many scenarios — args validation, funnel shape + return contract,
model tiering, kill-quorum math, per-tier web wiring (FIT never fetches),
FIT-width cap, and the budget gate firing on the delta. Each asserts an
invariant whose failure would burn real tokens. Run node --test scripts/*.test.mjs
after any edit and before spending on a real run — it catches structural bugs the
way node --check catches syntax ones. (scripts/harness-lib.mjs is the shared
runner; scripts/stub-harness.mjs is a quick single-scenario visual check.)scripts/claim-verification-template.js — the claim-verification funnel:
DISCOVER → DEDUP → web-grounded VERIFY → cited SYNTH, no FIT tier. Use when
the task is "research these N questions and give me verified, cited facts /
hard numbers / prior art" rather than down-selecting options. Encodes the two
fixes a real run cost us: a fetch-sized budget (web-grounded tiers cost
5–10× reasoning tiers) and a per-bucket verify cap (a global slice silently
starves some questions → false "evidence gaps" at the apex). See
references/harness-notes.md → "Web-grounded tiers" and "Per-bucket caps".references/schemas.md — the JSON schemas for each tier (candidate, fit,
verdict, deep-dive) and a minimal triage variant.references/harness-notes.md — runtime gotchas of the dynamic-workflow
harness: the pall() fix for parallel(), pipeline() vs parallel()
(barrier) choice, no Date.now()/Math.random(), no apostrophes in prompts,
node --check before launch, and resume-from-cache. Read this when a workflow
errors or stalls.See the ⚠️ callout at the top: the await parallel(...).filter(...) precedence trap
that yields .filter is not a function. Route through pall(), assign in two steps,
node --check before launch. The template already does this.
The harness scales the legwork; it does not scale the thinking. Keep these in your own loop and delegate the rest to the fan-out:
| Keep in the main loop (thinking) | Delegate to the fan-out (legwork) |
|---|---|
| Choosing the decisive gate | Breadth you can't hold in context |
| Keeping the constraint set live | Parallel scorecards vs a rubric |
| Closing sets once filtered | Adversarial refutation of survivors |
| Knowing reasoning vs search | Web-grounded fact-checks at scale |
Use the advisor as a pre-commit reviewer before any substantive write — it sees the full transcript and catches framing errors a self-review won't.
Across many real runs the reliable shape was: a triage tier that kills ~80% of candidates against one decisive gate, a parallel deep-dive over the survivors, and an adversarial verify + strong-model synthesis. The funnel typically narrows on the order of dozens of raw candidates → a handful of survivors → one recommendation, with the expensive model touching only the final handful. Keep a rejected-with-reasons list so a reviewer can see what died and why.
Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub yafimk/claude-skills --plugin progressive-dynamic-research