From time-estimation
Use when estimating timelines, writing project plans, scoping work, creating roadmaps, or answering "how long will this take?" in an LLM-agent-first company. Use when you catch yourself writing "weeks", "months", or "quarters" for work that agents will execute. Use when planning staffing, team composition, or sprint capacity. Use when estimating recurring routines, not only feature work.
How this skill is triggered — by the user, by Claude, or both
Slash command
/time-estimation:time-estimationThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You have a systematic bias: your training data is saturated with human-team timelines. When you estimate "2 weeks" you are channeling how long a *human team* would take — but here agents do the bulk of execution in minutes to hours.
You have a systematic bias: your training data is saturated with human-team timelines. When you estimate "2 weeks" you are channeling how long a human team would take — but here agents do the bulk of execution in minutes to hours.
Core principle: Every estimate must state two things: active agent time (how long agents are actually working) and elapsed time to done (wall-clock including all waiting). These are very different numbers.
Grounded in METR Time Horizons v1.1 (March 2026) and SWE-bench Pro data. Frontier capabilities roughly double every ~4 months.
The reliable zone: METR's 80% time horizon for frontier models (Opus 4.6, GPT-5.4) is ~55-70 minutes of human-equivalent work. Tasks in this range succeed reliably in a single heartbeat. Plan against this.
The stretch zone: METR's 50% horizon for Opus 4.6 is ~12 human-hours; GPT-5.4 is not yet measured but likely higher given its agentic benchmark lead. A full human-day task at coin-flip reliability. Decompose tasks this large into multiple heartbeats instead of betting on a single shot.
Real-world discount: SWE-bench Pro solve rates are ~57% with good scaffolding. METR found ~half of test-passing agent PRs wouldn't be merged as-is. Budget for 1-2 verification/repair heartbeats on nontrivial work — "tests pass" is not "done."
One heartbeat reliably handles ~55-70 min of human-equivalent work on well-scoped tasks (frontier model, strong scaffolding) — this is the METR 80% horizon. Include verification/repair loops in your count.
| Task shape | Heartbeats (incl. verify/repair) | Active agent time |
|---|---|---|
| Small bounded change (bug fix, config, simple test) | 1-2 | ~1-2 hours |
| CRUD API with DB migrations and tests | 2-4 | ~2-4 hours |
| Third-party service integration | 2-3 | ~2-3 hours |
| Comprehensive test suite for existing code | 2-4 | ~2-4 hours |
| Bulk refactor across many files | 2-5 | ~2-5 hours |
| New microservice from scratch | 4-8 | ~4-8 hours |
| Full application (frontend + backend + tests) | 8-20 | ~1-2.5 days |
| Service extraction from monolith | 5-12 per service | ~5-12 hours |
For poorly-understood codebases or ambiguous requirements, multiply by 2-3x.
Multiple agents can work in parallel, but don't assume perfect scaling. Integration points, review queues, merge conflicts, and shared state create contention.
State your parallelism assumption explicitly. "N heartbeats across M agents" should include what blocks parallel execution.
| Step | Typical wait |
|---|---|
| Human code review / sign-off | 2 hours - 1 day |
| Human answers a clarifying question | 1 hour - 1 day |
| Human makes a strategic/architecture decision | 1-3 days |
| External vendor/API access provisioning | 1-5 days |
| Legal/compliance review | 2-5 days |
| Staged production rollout observation | 3-7 days (parallelizable across services) |
| SOC 2 audit (the audit itself, not writing docs) | 4-8 weeks |
| Enterprise customer procurement/contract | 2-8 weeks |
Assume humans batch reviews, not that they respond instantly.
Not all estimation is for feature work. For recurring tasks (daily reports, weekly reviews, periodic migrations, scheduled maintenance):
Example: "Weekly dependency audit: 1 heartbeat (~1 hour active) per run, ~2 hours elapsed including review. Setup: 2-3 heartbeats one-time."
When you catch yourself writing a human-scale number, check whether you've decomposed it:
| If your instinct says | Ask: where is the time actually going? |
|---|---|
| 1-2 weeks | Agent finishes in hours. Review cycles → likely 1-3 days |
| 1 month | A few rounds of human feedback → likely 3-7 days. External gates → could be 2-4 weeks |
| 1 quarter | Agents parallelize (with contention). Humans serialize. → likely 2-4 weeks. Heavy external deps → could be a quarter |
| 6+ months | Only if regulatory gates, enterprise sales, or multi-month audits are on the critical path |
These are guidelines. The actual answer depends on the decomposition.
Not everything compresses. Be honest:
State why it's long (which gates), not just a big number.
Every estimate must include:
Example:
Active agent time: ~8 heartbeats (~8 hours) Elapsed time: ~2 days
- Agent: 8 heartbeats, 3 parallel agents (2-3x speedup, same-repo contention), ~4 hours elapsed compute
- Human wait: 2 review rounds, ~1 day each assuming batch review
- Critical path: human review cycles
- Assumes: reviews within same business day, requirements are clear
After completing work, compare estimates to actuals. Track:
| Field | Purpose |
|---|---|
| Estimated active agent time | What you predicted |
| Actual active agent time | What it took |
| Estimated elapsed time | Wall-clock prediction |
| Actual elapsed time | Wall-clock reality |
| Miss cause | What drove the gap (rework, human delay, scope change, discovery, integration issues) |
Common miss patterns:
Use misses to update your per-task heartbeat anchors over time. The calibration table in this skill is a starting point, not ground truth for your specific codebase and team.
If you find yourself writing any of these, pause and decompose:
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub ustas-eth/skills --plugin time-estimation