agent-estimate

Know before you build.

PERT estimates for AI-agent tasks — how long, which model's reliable enough, and the human-equivalent cost. In one command.


Why

AI agents can write the code — but how long will the task actually take? Manual estimation is slow and biased toward optimism; no estimate means scope creep and missed deadlines. The gap between "agents can do it" and "we know when it'll be done" is where projects break down.

agent-estimate closes that gap in one command: a three-point PERT timeline calibrated on real agent runs, plus a human-speed comparison so you see the compression before you spend the compute. It sizes the task, picks a tier, routes it to a model, and flags when the work runs past that model's reliability horizon — calibrated forecasts in seconds, not meetings.

Multi-model matters because the models aren't interchangeable. Opus 4.7, GPT-5.5, and Gemini 3.1 Pro differ in cost per turn, and their default reliability horizons (METR p80) range from 90 minutes down to 45. A 40-minute job sits inside the default threshold for Gemini 3.1 Pro (45 min) but past it for Sonnet 4.6 (30 min). agent-estimate models the whole fleet, not a single agent, so the number reflects who actually runs the work.

Quick Start

First estimate: 30 seconds to install. Every one after: instant.

With your agent (recommended)

Paste this into your Claude Code or Codex session:

Install the agent-estimate plugin (https://github.com/kiloloop/agent-estimate) and
estimate this task for me: "Implement OAuth 2.0 flow (Google + GitHub)". Tell me the
expected time, the human-speed equivalent, and the compression ratio.

Your agent installs the tool, runs the estimate, and reads back the numbers. Nothing to memorize — describe the task in plain English and let the agent translate to flags.

For a whole backlog:

Estimate every open issue in this repo with agent-estimate, group them into parallel
waves, and tell me the total wall-clock time for a 3-agent fleet versus doing them
sequentially myself.

Manual

pip install agent-estimate
agent-estimate estimate "your task description here"

No config required — sensible defaults for a 3-agent fleet (Claude, Codex, Gemini). Point it at a file or GitHub issues when you're ready:

agent-estimate estimate --file tasks.txt
agent-estimate estimate --repo myorg/myrepo --issues 11,12,14
agent-estimate session --agents 3 --rounds 2 --type review

How It Works

agent-estimate produces three-point PERT estimates calibrated for agents, not humans:

Tier classification — auto-sizes tasks XS→XL from complexity signals
PERT math — optimistic / most-likely / pessimistic, weighted to an expected value
Human comparison — a per-task-type multiplier, so you see the compression
METR thresholds — warns when an estimate exceeds a model's p80 reliability horizon
Wave planning — schedules independent tasks in parallel across the fleet
Review overhead — models review cycles as additive cost (standard, complex, 3-round)
Modifiers — --spec-clarity, --warm-context, --agent-fit tune the estimate
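
The PERT math and wave planning steps above can be sketched in a few lines of Python. This is a minimal illustration, not agent-estimate's internals: it assumes the classic PERT weighting (O + 4M + P) / 6, which the tool may or may not use unchanged, and a simple longest-first grouping for waves. The names ThreePoint, plan_waves, and wall_clock, and all task durations, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Optimistic / most-likely / pessimistic durations in minutes."""
    optimistic: float
    most_likely: float
    pessimistic: float

    def expected(self) -> float:
        # Classic PERT weighting: the most-likely value counts four times.
        # Assumption: the tool's calibrated weights may differ.
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6


def plan_waves(durations: list[float], agents: int) -> list[list[float]]:
    """Group independent tasks into waves of at most `agents` tasks each.

    Tasks are sorted longest-first so similar-length tasks share a wave.
    """
    ordered = sorted(durations, reverse=True)
    return [ordered[i:i + agents] for i in range(0, len(ordered), agents)]


def wall_clock(waves: list[list[float]]) -> float:
    # A wave ends when its slowest task ends; waves run one after another.
    return sum(max(wave) for wave in waves)


# Hypothetical three-point inputs in minutes, for illustration only.
tasks = [
    ThreePoint(10, 20, 60),
    ThreePoint(5, 10, 15),
    ThreePoint(20, 30, 70),
    ThreePoint(2, 5, 8),
]
expected = [t.expected() for t in tasks]   # [25.0, 10.0, 35.0, 5.0]
waves = plan_waves(expected, agents=3)     # [[35.0, 25.0, 10.0], [5.0]]

print(f"sequential: {sum(expected):.1f} min")          # sequential: 75.0 min
print(f"3-agent waves: {wall_clock(waves):.1f} min")   # 3-agent waves: 40.0 min
```

With these made-up inputs, four tasks that would take 75 minutes back to back finish in 40 minutes of wall-clock time on three agents, because each wave lasts only as long as its slowest task.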

Task types

Type	Flag	Covers
Coding	(default)	Feature work, fixes, refactors
Research	`--type research`	Audits, investigations, analysis
Documentation	`--type documentation`	API docs, guides, changelogs
Brainstorm	`--type brainstorm`	Ideation, spikes, design exploration
Config/SRE	`--type config`	Deploys, infra, CI/CD
Frontend/UI	`--type frontend`	Content patches vs. component builds
App dev	`--type app_dev`	App shells, desktop/mobile builds

METR thresholds (defaults)

Model	p80 threshold
Opus 4.7	90 min
GPT-5.5	90 min
GPT-5.4	60 min
Gemini 3.1 Pro	45 min
Sonnet 4.6	30 min
Haiku 4.5	15 min

opus_4_x is a forward-compatible alias that resolves to the current Opus threshold. Legacy keys (opus_4_6, GPT-5/5.2/5.3, Gemini 3 Pro, Sonnet) stay supported. Estimates are calibrated against Claude Code (Opus 4.7, high thinking) and Codex (GPT-5.4/5.5, extra-high) — shift with --spec-clarity and --warm-context for other setups.
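
As a rough sketch of how a threshold warning could work, the snippet below checks an estimate against the default p80 values from the table. It is an illustration under assumptions, not the tool's code: the dictionary keys are display names rather than the tool's config keys, the function names are hypothetical, and treating an estimate equal to the threshold as still safe is a guess.

```python
# Default p80 thresholds in minutes, copied from the table above.
# Keys are display names for illustration, not the tool's config keys.
P80_MINUTES = {
    "Opus 4.7": 90,
    "GPT-5.5": 90,
    "GPT-5.4": 60,
    "Gemini 3.1 Pro": 45,
    "Sonnet 4.6": 30,
    "Haiku 4.5": 15,
}


def exceeds_horizon(model: str, estimate_minutes: float) -> bool:
    """True when the estimate runs past the model's p80 threshold.

    Assumption: an estimate exactly at the threshold does not trigger a warning.
    """
    return estimate_minutes > P80_MINUTES[model]


def models_within_horizon(estimate_minutes: float) -> list[str]:
    """Return the models whose p80 threshold covers the estimate."""
    return [m for m in P80_MINUTES if not exceeds_horizon(m, estimate_minutes)]


# A 40-minute estimate fits four of the six default models.
print(models_within_horizon(40))
# ['Opus 4.7', 'GPT-5.5', 'GPT-5.4', 'Gemini 3.1 Pro']
print(exceeds_horizon("Sonnet 4.6", 40))  # True
```

Under these defaults, the same 40-minute task would pass without a warning on Gemini 3.1 Pro but be flagged on Sonnet 4.6 or Haiku 4.5.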
