From scott-cc
Design or audit CLI tools for agent compatibility. Covers stdout/stderr separation, --json flag, exit codes, NDJSON streaming, TOON format, composability, and idempotency. Use when building a new CLI or reviewing an existing one for LLM/agent use.
Slash command: /scott-cc:cli-design
Design or audit a CLI tool so that AI agents can invoke it reliably, parse its output without brittle text scraping, and compose it into pipelines.
Use this skill when:
Ask (or infer from context):
Run through each item. For existing CLIs, check the source. For new CLIs, use this as the design spec.
Rule: Primary output and all machine-readable data → stdout. Logs, warnings, errors, status messages → stderr.
Why it matters: when a command is piped (cmd | jq), stderr is shown to the
user while stdout flows to the next program. Mixing them breaks parsers.
# Good
echo '{"status":"ok"}' >&1 # data to stdout
echo "Processing..." >&2 # status to stderr
# Bad — breaks piping
echo "Processing..." # status to stdout pollutes parsers
echo '{"status":"ok"}'
Check: does the CLI ever write log lines, progress, or warnings to stdout? If yes, flag as a defect.
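A minimal Python sketch of the same rule, assuming hypothetical helper names (emit_result, log_status, run, demo) that are not part of any library: data goes through one function bound to stdout, everything else through another bound to stderr.

```python
import io
import json
import sys


def emit_result(payload, stream=None):
    """Write machine-readable data to stdout (or to a given stream)."""
    stream = stream if stream is not None else sys.stdout
    stream.write(json.dumps(payload) + "\n")


def log_status(message, stream=None):
    """Write human-facing status to stderr (or to a given stream)."""
    stream = stream if stream is not None else sys.stderr
    stream.write(message + "\n")


def run(out=None, err=None):
    """Hypothetical command body: status on stderr, data on stdout."""
    log_status("Processing...", err)
    emit_result({"status": "ok"}, out)
    return 0


def demo():
    """Capture both streams in memory to show they stay separate."""
    out, err = io.StringIO(), io.StringIO()
    run(out, err)
    return out.getvalue(), err.getvalue()
```

Because the two streams never mix, a parser reading stdout only ever sees the JSON line.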
Rule: Expose --json (or --output json) that switches output to structured JSON. Keep human-readable output as the default.
Adopted, in varying spellings, by: GitHub CLI (gh --json <fields>), AWS CLI (--output json), npm, yarn, and pnpm (--json), and eslint (--format json).
# Human default
$ my-tool list
item-a active
item-b inactive
# Machine mode
$ my-tool list --json
[{"name":"item-a","status":"active"},{"name":"item-b","status":"inactive"}]
Design rules for --json output:
- my-tool list --json | jq '.[].name' must work
- Do NOT auto-detect TTY to switch formats. The --json flag must be explicit. (TTY auto-detection was refuted as a best practice; use explicit flags.)
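One way to wire an explicit flag is sketched below with Python's argparse. The ITEMS data, the as_json attribute name, and the render helper are assumptions made for illustration; the point is only that the output format depends on the flag and never on whether stdout is a terminal.

```python
import argparse
import json

# Hypothetical data standing in for whatever the tool lists.
ITEMS = [
    {"name": "item-a", "status": "active"},
    {"name": "item-b", "status": "inactive"},
]


def build_parser():
    parser = argparse.ArgumentParser(prog="my-tool")
    sub = parser.add_subparsers(dest="command", required=True)
    list_cmd = sub.add_parser("list")
    # Explicit opt-in: the format never depends on isatty().
    list_cmd.add_argument("--json", action="store_true", dest="as_json")
    return parser


def render(items, as_json):
    """Return the text to print on stdout for the list command."""
    if as_json:
        return json.dumps(items, separators=(",", ":"))
    return "\n".join(f"{item['name']} {item['status']}" for item in items)


def main(argv):
    args = build_parser().parse_args(argv)
    print(render(ITEMS, args.as_json))
    return 0
```

Calling main(["list", "--json"]) prints a single JSON array that jq can consume; main(["list"]) prints the human-readable rows.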
Rule: Exit 0 on success, non-zero on any failure.
This is the mechanism scripts and agents use to detect failure. An agent that
calls subprocess.run() checks returncode, not stdout content.
import subprocess

class DeployError(Exception): pass  # hypothetical error type for this example

# Good: agent can check exit code
result = subprocess.run(["my-tool", "deploy"], capture_output=True)
if result.returncode != 0:
    raise DeployError(result.stderr.decode())
Common conventions:
- 0: success
- 1: general error (safe default for all failures)
- 2: misuse / bad arguments (some tools)

For partial success (some items succeeded, some failed): encode per-item status in structured output (--json), and still exit non-zero if any item failed. Let the agent parse which items failed from structured output.
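The partial-success convention could look roughly like this. The summarize helper and the payload field names (items, ok, error) are hypothetical choices for the sketch, not a prescribed schema.

```python
import json


def summarize(results):
    """Build the --json payload and exit code from per-item outcomes.

    `results` maps item name -> error message, or None on success.
    """
    items = [
        {"name": name, "ok": error is None, "error": error}
        for name, error in results.items()
    ]
    # Any failed item makes the whole invocation non-zero.
    exit_code = 0 if all(item["ok"] for item in items) else 1
    return json.dumps({"items": items}), exit_code
```

An agent first checks the exit code, then reads the payload to see which items need a retry.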
For long-running commands (builds, deploys, watches), use NDJSON (newline-delimited JSON): one JSON object per line to stdout.
Terraform's terraform apply -json is the canonical implementation:
{"@level":"info","@message":"Initializing...","@module":"terraform","@timestamp":"2026-06-17T10:00:00Z","type":"log"}
{"@level":"info","@message":"Plan: 3 to add","@module":"terraform.ui","@timestamp":"2026-06-17T10:00:01Z","type":"planned_changes","changes":{"add":3,"change":0,"remove":0}}
Standard NDJSON message fields (adopt these):
| Field | Type | Notes |
|---|---|---|
| @level | string | debug, info, warn, error |
| @message | string | Human-readable summary |
| @module | string | Component that emitted this |
| @timestamp | string | RFC3339 (2026-06-17T10:00:00Z) |
| type | string | Machine-readable event type |
Schema versioning rules (semver):
# Agent consuming NDJSON
my-tool build --json 2>/dev/null | while IFS= read -r line; do
  echo "$line" | jq -r 'select(.type == "error") | .["@message"]'
done
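For a tool written in Python, the producing and consuming sides might be sketched as follows. The function names make_event and error_messages are assumptions for illustration; the field names are the ones listed in the table above.

```python
import json
from datetime import datetime, timezone


def make_event(level, message, module, event_type, **extra):
    """Build one NDJSON line carrying the standard message fields."""
    event = {
        "@level": level,
        "@message": message,
        "@module": module,
        "@timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "type": event_type,
    }
    event.update(extra)  # event-specific payload, e.g. counts or codes
    return json.dumps(event)  # json.dumps never emits a raw newline


def error_messages(lines):
    """Yield @message from every line whose type is "error"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "error":
            yield event["@message"]
```

In a real command each line would be printed to stdout with flush=True so that consumers see events as they happen rather than when the buffer fills.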
Rule: Accept input from stdin when it makes sense. Output to stdout. Support piping as a first-class workflow.
# Composable pipeline
cat items.json | my-tool process --json | jq '.results[]'
# File or stdin interchangeably
my-tool process input.json # file arg
my-tool process < input.json # stdin
cat input.json | my-tool process # pipe
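The three invocations above can share one input path. This is a sketch under the assumption that a missing argument or a literal "-" means stdin; read_input is a hypothetical helper name.

```python
import io
import sys


def read_input(path=None, stdin=None):
    """Return input text from a file argument, or stdin if absent or "-"."""
    stdin = stdin if stdin is not None else sys.stdin
    if path is None or path == "-":
        return stdin.read()
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Simulate `cat input.json | my-tool process` with an in-memory stream.
sample = read_input(stdin=io.StringIO('{"items": []}'))
```

Keeping the stdin fallback inside one function means every subcommand gets piping support for free.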
Design checklist:
- Reads stdin when no file arg is given (- as explicit stdin)
- No color codes in --json output (color is for TTY humans)
- --no-color / NO_COLOR env var respected for human output

For non-interactive / agent invocation, destructive operations should either:
- Accept a --yes / --force flag (preferred)

Rule: Running the same command twice should produce the same result. Agents retry on failure, and a non-idempotent CLI causes double-creates.
Design patterns:
- --force to overwrite if already exists (don't error on re-run)

# Idempotent: safe to retry
$ my-tool create-bucket my-bucket
Created: my-bucket
$ my-tool create-bucket my-bucket
Already exists: my-bucket (no change) # exit 0, not exit 1
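The behavior shown in that transcript can be sketched as a check-then-create function. Here a plain Python set stands in for real remote state, and create_bucket is a hypothetical function, not any cloud SDK call.

```python
def create_bucket(existing, name):
    """Create `name` unless it already exists; succeed either way.

    `existing` is a set standing in for real remote state.
    Returns (message, exit_code).
    """
    if name in existing:
        # Desired state already holds, so a retry is a no-op success.
        return f"Already exists: {name} (no change)", 0
    existing.add(name)
    return f"Created: {name}", 0
```

Against a real backend the existence check and the create should be a single atomic operation where the API allows it; otherwise two concurrent retries can still race.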
TOON (@toon-format/cli) is a tabular encoding
for JSON arrays that uses explicit count/field headers and CSV-style rows.
Designed to improve LLM parsing reliability by giving models a clear schema.
When to consider it: You're passing large arrays of uniform objects to an LLM and token count matters.
When to skip it: Standard tooling (jq, curl, etc.) doesn't understand
TOON. Use JSON as your primary interchange format; TOON is an LLM-layer
optimization only.
# JSON
{"users":[{"name":"alice","role":"admin"},{"name":"bob","role":"user"}]}
# TOON
users[2]{name,role}:
  alice,admin
  bob,user
Syntax: arrayName[N]{field1,field2,...}: followed by N CSV rows.
The explicit [N] count and {fields} header give models a schema to follow.
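To make the header-plus-rows shape concrete, here is a deliberately simplified encoder for a list of flat, uniform dicts. It is not the @toon-format/cli implementation: to_toon_table is a hypothetical name, rows are indented two spaces as the TOON format writes them, and values that would need quoting are rejected instead of escaped.

```python
def to_toon_table(name, rows):
    """Encode a non-empty list of uniform, flat dicts in tabular layout.

    Simplified sketch: values are written with str(), so only plain
    strings and integers round-trip, and anything containing a comma,
    double quote, or newline is rejected rather than quoted.
    """
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0])
    header = "%s[%d]{%s}:" % (name, len(rows), ",".join(fields))
    lines = [header]
    for row in rows:
        if list(row) != fields:
            raise ValueError("rows must share the same fields in the same order")
        cells = [str(row[field]) for field in fields]
        if any(ch in cell for cell in cells for ch in ',"\n'):
            raise ValueError("value needs quoting; use the real encoder")
        lines.append("  " + ",".join(cells))
    return "\n".join(lines)
```

The field names appear once in the header instead of once per object, which is where the token savings for large uniform arrays come from.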
# Install
npm install -g @toon-format/cli
# Encode JSON → TOON (pipe)
cat data.json | npx @toon-format/cli
# Decode TOON → JSON
cat data.toon | npx @toon-format/cli --decode
# Auto-detected by file extension
npx @toon-format/cli -o output.toon input.json # .json → encode
npx @toon-format/cli -o output.json input.toon # .toon → decode
# Estimate token savings (self-reported, use as a guide)
npx @toon-format/cli --stats < data.json
Flags:
- -e / --encode: explicit encode
- -d / --decode: explicit decode
- -o / --output <file>: write to file (default: stdout)
- --stats: print token count comparison

Caveat: TOON token-saving percentages are self-reported. The tool's --stats flag shows an estimate; treat it as a guide, not a guarantee. TOON is young; validate independently before relying on it for production LLM pipelines.
Present a table:
CLI Agent Compatibility Audit
──────────────────────────────────────────────────
✓ stdout/stderr separated
✗ --json flag missing → add --json to list/show subcommands
✓ exit codes correct
○ no NDJSON for build command → consider for long-running ops
✗ list command writes progress to stdout → move to stderr
○ create not idempotent → add --force or silent-success on re-run
──────────────────────────────────────────────────
Critical (break agent parsing): 2
Recommended: 2
Optional: 1
Follow up with concrete diffs or pseudocode for each ✗ item.
Output a checklist contract the implementation must satisfy:
CLI Design Contract
──────────────────────────────────────────────────
[ ] All data to stdout, all status/errors to stderr
[ ] --json flag on every subcommand that produces output
[ ] Exit 0 on success, 1 on failure (document partial-success behavior)
[ ] NDJSON for <list long-running commands>
[ ] Reads stdin when no file arg given
[ ] --yes / --force for destructive operations (no TTY prompting)
[ ] --no-color respected; NO_COLOR env var honored
[ ] Idempotent: re-running succeeds if state already matches
──────────────────────────────────────────────────
Then propose the JSON schema for --json output before any code is written.
From clig.dev and ecosystem conventions:
| Flag | Purpose |
|---|---|
| --json | Structured JSON output |
| --output <format> | Multi-format: json, text, table |
| --no-color | Disable ANSI color codes |
| --quiet / -q | Suppress informational output |
| --verbose / -v | More output (goes to stderr) |
| --yes / -y | Skip confirmation prompts |
| --force / -f | Overwrite / proceed despite warnings |
| --dry-run | Show what would happen without doing it |
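A sketch of how these flags might be declared with argparse, with a hypothetical use_color helper that decides whether ANSI color is allowed. The attribute names and the decision order are assumptions for illustration; --output is omitted to keep the example short.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="my-tool")
    parser.add_argument("--json", action="store_true", dest="as_json")
    parser.add_argument("--no-color", action="store_true")
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-y", "--yes", action="store_true")
    parser.add_argument("-f", "--force", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def use_color(args, env, is_tty):
    """Color only for humans on a TTY, never in --json output."""
    if args.as_json or args.no_color or env.get("NO_COLOR"):
        return False
    return is_tty
```

In a real entry point this would be called as use_color(args, os.environ, sys.stdout.isatty()). The TTY check here only gates color and does not change the output format.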
The minimum viable agent-compatible CLI is three things:
- --json flag

Everything else (NDJSON, TOON, idempotency, schema versioning) adds value as the tool's agent usage grows. Ship the three-thing version first.
npx claudepluginhub citadelgrad/scott-cc --plugin scott-cc