agent-estimate

Know before you build.
PERT estimates for AI-agent tasks — how long, which model's reliable enough, and the human-equivalent cost. In one command.
Website · Compare · PyPI
Why
AI agents can write the code — but how long will the task actually take? Manual estimation is slow and biased toward optimism; no estimate means scope creep and missed deadlines. The gap between "agents can do it" and "we know when it'll be done" is where projects break down.
agent-estimate closes that gap in one command: a three-point PERT timeline calibrated on real agent runs, plus a human-speed comparison so you see the compression before you spend the compute. It sizes the task, picks a tier, routes it to a model, and flags when the work runs past that model's reliability horizon — calibrated forecasts in seconds, not meetings.
Multi-model matters because the models aren't interchangeable. Opus 4.7, GPT-5.5, and Gemini 3.1 have different reliability horizons (METR p80) and different costs per turn. A safe 40-minute job for one model is a coin flip for another. agent-estimate models the whole fleet, not a single agent — so the number reflects who actually runs the work.
Quick Start
First estimate: 30 seconds to install. Every one after: instant.
With your agent (recommended)
Paste this into your Claude Code or Codex session:
Install the agent-estimate plugin (https://github.com/kiloloop/agent-estimate) and
estimate this task for me: "Implement OAuth 2.0 flow (Google + GitHub)". Tell me the
expected time, the human-speed equivalent, and the compression ratio.
Your agent installs the tool, runs the estimate, and reads back the numbers. Nothing to memorize — describe the task in plain English and let the agent translate to flags.
For a whole backlog:
Estimate every open issue in this repo with agent-estimate, group them into parallel
waves, and tell me the total wall-clock time for a 3-agent fleet versus doing them
sequentially myself.
Manual
pip install agent-estimate
agent-estimate estimate "your task description here"
No config required — sensible defaults for a 3-agent fleet (Claude, Codex, Gemini). Point it at a file or GitHub issues when you're ready:
agent-estimate estimate --file tasks.txt
agent-estimate estimate --repo myorg/myrepo --issues 11,12,14
agent-estimate session --agents 3 --rounds 2 --type review
How It Works
agent-estimate produces three-point PERT estimates calibrated for agents, not humans:
- Tier classification — auto-sizes tasks XS→XL from complexity signals
- PERT math — optimistic / most-likely / pessimistic, weighted to an expected value
- Human comparison — a per-task-type multiplier, so you see the compression
- METR thresholds — warns when an estimate exceeds a model's p80 reliability horizon
- Wave planning — schedules independent tasks in parallel across the fleet
- Review overhead — models review cycles as additive cost (
standard, complex, 3-round)
- Modifiers —
--spec-clarity, --warm-context, --agent-fit tune the estimate
Task types
| Type | Flag | Models |
|---|
| Coding | (default) | Feature work, fixes, refactors |
| Research | --type research | Audits, investigations, analysis |
| Documentation | --type documentation | API docs, guides, changelogs |
| Brainstorm | --type brainstorm | Ideation, spikes, design exploration |
| Config/SRE | --type config | Deploys, infra, CI/CD |
| Frontend/UI | --type frontend | Content patches vs. component builds |
| App dev | --type app_dev | App shells, desktop/mobile builds |
METR thresholds (defaults)
| Model | p80 threshold |
|---|
| Opus 4.7 | 90 min |
| GPT-5.5 | 90 min |
| GPT-5.4 | 60 min |
| Gemini 3.1 Pro | 45 min |
| Sonnet 4.6 | 30 min |
| Haiku 4.5 | 15 min |
opus_4_x is a forward-compatible alias that resolves to the current Opus threshold. Legacy keys (opus_4_6, GPT-5/5.2/5.3, Gemini 3 Pro, Sonnet) stay supported. Estimates are calibrated against Claude Code (Opus 4.7, high thinking) and Codex (GPT-5.4/5.5, extra-high) — shift with --spec-clarity and --warm-context for other setups.
Examples