From infrahub
Collects a redacted diagnostic bundle (logs, config, version, state) from a local Infrahub instance for OpsMill support hand-off. Useful when Infrahub is broken or failing.
How this skill is triggered — by the user, by Claude, or both
Slash command
/infrahub:infrahub-collecting-diagnostics
- examples.md
- flag-checks.md
- reference.md
- rules/_sections.md
- rules/_template.md
- rules/bundle-layout.md
- rules/collection-read-only.md
- rules/connection-info-and-token.md
- rules/cross-link-reporting-issues.md
- rules/deployment-detection.md
- rules/flag-checks-deterministic.md
- rules/infrahubctl-only-for-instance.md
- rules/manifest-template.md
- rules/multi-replica-coverage.md
- rules/redaction-tiers.md
- rules/workflow-user-gates.md

When an Infrahub user reports a problem, an expert
(OpsMill support, or a senior engineer) needs a
consistent set of artifacts to triage it: logs,
config, version info, branch state, environment
fingerprints. This skill walks the user through
producing that artifact set as a single
infrahub-diagnostics-YYYYMMDD-HHMMSS/ directory in
their working directory.
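As a minimal sketch of the naming convention above, the bundle directory could be created like this. The function names are hypothetical, and whether the timestamp is local time or UTC is an assumption; the text only fixes the `YYYYMMDD-HHMMSS` shape.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def bundle_dir_name(now: datetime) -> str:
    """Return the bundle directory name for the given timestamp."""
    return f"infrahub-diagnostics-{now:%Y%m%d-%H%M%S}"


def create_bundle_dir(workdir: Path, now: Optional[datetime] = None) -> Path:
    """Create the bundle directory under workdir and return its path."""
    path = workdir / bundle_dir_name(now or datetime.now())
    # exist_ok=False: refuse to reuse a bundle created in the same second.
    path.mkdir(parents=True, exist_ok=False)
    return path
```

Calling `create_bundle_dir(Path.cwd())` would place the bundle in the user's working directory, as described above.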
The skill is a collector. It runs read-only
commands, auto-redacts known secrets, pauses for the
user to review the redaction summary, then finalizes
the bundle. It does not file a GitHub issue (use
infrahub-reporting-issues for that), does not
diagnose root cause beyond a small set of
deterministic flag checks, and does not mutate
Infrahub state.
The bundle is built so that an expert, opening it for
the first time, can answer "what version, what
deployment, what changed recently, what failed" by
reading README.md and manifest.yml alone.
Trigger this skill when the user says things like:
Do not trigger when the request belongs to another skill:
- infrahub-reporting-issues
- infrahub-analyzing-data
- infrahub-auditing-repo

Follow these steps in order. Four user-gates; everything else is automatic.
Ask the user to describe the problem in their own words if they haven't already. Don't probe yet — listen for product names, version numbers, error messages, workflow context.
The skill uses infrahubctl exclusively for instance
state. infrahubctl needs a URL and (for any
non-anonymous deployment) an API token to talk to the
server. Ask the user before any other probing:
Infrahub URL or IP — e.g.,
http://localhost:8000, https://infrahub.example.com.
Required. The skill cannot guess this for any
non-local deployment.
API token — required for any infrahubctl
call against a deployment that has anonymous
access disabled (the typical case). Present this
reassurance verbatim when asking for it:
Your API token is only used locally by
infrahubctl to query state on your behalf. It is never written to the bundle. The skill's redactor masks the token before any bundle file is finalized. The token is not sent anywhere outside your machine.
Optional: Branch name if the user wants to scope the diagnostic to a non-default branch.
Once the user shares the values:
- Set INFRAHUB_ADDRESS=<url> and
  INFRAHUBCTL_TOKEN=<token> in the shell for the
  rest of the workflow.

If the user declines to share a token, do not press them; this is a legitimate concern. Fall back:
- Skip every infrahubctl state query (branches,
  repos, schema, tasks, telemetry).
- Set collected.infrahubctl_state: false in
  manifest.yml so the expert sees the bundle is
  partial and why.

See rules/connection-info-and-token.md for the full rule (including the privacy notice wording).
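The token handling and its fallback can be sketched as follows. This is an illustration only: the helper names are hypothetical, the environment variable names are the ones given in the text, and the nested `collected` / `infrahubctl_state` layout is an assumption about how manifest.yml is structured.

```python
import os
from typing import Dict, Optional


def infrahubctl_env(address: str, token: Optional[str]) -> Optional[Dict[str, str]]:
    """Return the environment for infrahubctl calls, or None if no token was shared."""
    if not token:
        # The user declined: every infrahubctl state query is skipped.
        return None
    env = dict(os.environ)
    env["INFRAHUB_ADDRESS"] = address
    env["INFRAHUBCTL_TOKEN"] = token
    return env


def collected_section(env: Optional[Dict[str, str]]) -> dict:
    """Build the manifest fragment that tells the expert whether state was collected."""
    return {"collected": {"infrahubctl_state": env is not None}}
```

Keeping the token in a per-call environment dictionary rather than in any file is one way to match the guarantee that it is never written to the bundle.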
Detect deployment topology by trying, in order:
1. docker compose ps (Compose)
2. kubectl -n infrahub get pods (Kubernetes)
3. tasks/demo.py plus invoke demo.status (local dev)

Then collect baseline artifacts (see
bundle-layout for the
exact files). The baseline log window is 24 hours
unless the user changes it. Telemetry
(infrahubctl telemetry export) is included unless
the user declined to share a token or
INFRAHUB_TELEMETRY_OPTOUT=true is set on the
server.
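The detection order can be sketched as a list of probes tried in sequence. The labels and function names below are hypothetical, and treating exit status 0 as a match is an assumption; a real detector would likely inspect the command output as well.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

# Probes in the order given above: (hypothetical label, read-only command).
PROBES = [
    ("compose", ["docker", "compose", "ps"]),
    ("kubernetes", ["kubectl", "-n", "infrahub", "get", "pods"]),
    ("local-dev", ["invoke", "demo.status"]),
]


def run_ok(cmd: Sequence[str]) -> bool:
    """Return True if the command exists and exits with status 0."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def detect_deployment(
    cwd: Path, probe: Callable[[Sequence[str]], bool] = run_ok
) -> Optional[str]:
    """Return the label of the first probe that succeeds, or None."""
    for label, cmd in PROBES:
        # The local-dev probe only applies when tasks/demo.py is present.
        if label == "local-dev" and not (cwd / "tasks" / "demo.py").is_file():
            continue
        if probe(cmd):
            return label
    return None
```

The `probe` parameter exists so the ordering logic can be exercised without Docker or kubectl installed.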
Ask the bug-report-template fields verbatim (see manifest-template) and use the answers to assign one of the categories in this skill's catalog. Confirm the classification with the user. The user can override; they can also choose everything mode, which runs every category's depth collection.
The categories are:
| # | Category | When |
|---|---|---|
| 1 | installation-startup | Containers crash on docker compose up; healthchecks loop; port conflicts |
| 2 | upgrade | After infrahub upgrade or helm upgrade; branches stuck in NEED_UPGRADE_REBASE |
| 3 | git-sync | Repo state Error/Unknown; CommitNotFoundError; schemas not loaded from repo |
| 4 | task-worker-pipeline | Tasks stuck RUNNING/MERGING; worker CrashLoopBackOff |
| 5 | schema-load | schema check rejects file; /api/schema/load hangs; schema-load failures |
| 6 | check-generator-transform | Pipeline check red; infrahubctl <kind> raises; Jinja2 transform fails |
| 7 | graphql-api | HTTP 5xx; non-nullable field errors; timeouts |
| 8 | performance | Slow UI, slow diff, OOM kills, browser hangs |
| 9 | auth-permissions | OAuth/OIDC login fails; default role can't create PC; JWT mismatch |
| 10 | branch-merge | Branch stuck MERGING/DELETING; failed merge leaves partial state |
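A small sketch of how the confirmed classification might map to the depth collections that run. The category names come from the table above; the function name and the `"everything"` spelling of the mode are assumptions.

```python
from typing import List

# Category names as listed in the table above, keyed by their number.
CATEGORIES = {
    1: "installation-startup",
    2: "upgrade",
    3: "git-sync",
    4: "task-worker-pipeline",
    5: "schema-load",
    6: "check-generator-transform",
    7: "graphql-api",
    8: "performance",
    9: "auth-permissions",
    10: "branch-merge",
}


def categories_to_collect(choice: str) -> List[str]:
    """Map the user-confirmed choice to the categories whose depth collection runs."""
    names = list(CATEGORIES.values())
    if choice == "everything":
        return names
    if choice in names:
        return [choice]
    raise ValueError(f"unknown category: {choice!r}")
```

Raising on an unknown name keeps a typo in the user's override from silently collecting nothing.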
Run the category-specific commands documented in
reference.md. Always pull logs from every
task-worker replica (see
multi-replica-coverage);
recent race-condition bugs hide root cause when
only one replica is sampled.
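For a Kubernetes deployment, covering every replica might look like the sketch below. It only builds the `kubectl logs` commands; it assumes pod names were obtained from `kubectl -n infrahub get pods -o name`, and the `task-worker` name marker is a hypothetical way of selecting worker pods. reference.md remains the source for the actual commands.

```python
from typing import List, Sequence


def worker_log_commands(
    pod_names: Sequence[str],
    since: str = "24h",
    namespace: str = "infrahub",
    marker: str = "task-worker",  # hypothetical substring identifying worker pods
) -> List[List[str]]:
    """Build one read-only `kubectl logs` command per task-worker pod."""
    workers = sorted(name for name in pod_names if marker in name)
    return [
        ["kubectl", "-n", namespace, "logs", name, f"--since={since}", "--all-containers"]
        for name in workers
    ]
```

Generating one command per pod, rather than picking a single pod, is the point: no replica is left unsampled.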
Run the deterministic flag-check catalog (see
flag-checks.md) against the collected files.
Write hits to bundle/flags.yml. Flag checks are
hints, not diagnoses (see
flag-checks-deterministic).
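A deterministic flag check can be as simple as a fixed pattern scan over the collected files. The sketch below is illustrative only: the three patterns reuse strings from the category table, while the flag identifiers, hit fields, and YAML layout are assumptions rather than the catalog in flag-checks.md.

```python
import json
import re
from pathlib import Path
from typing import Dict, List

# Hypothetical catalog: flag id -> pattern searched in every collected file.
FLAG_PATTERNS = {
    "commit-not-found": re.compile(r"CommitNotFoundError"),
    "need-upgrade-rebase": re.compile(r"NEED_UPGRADE_REBASE"),
    "worker-crashloop": re.compile(r"CrashLoopBackOff"),
}


def run_flag_checks(bundle: Path) -> List[Dict[str, object]]:
    """Scan files in a fixed order so the same bundle always yields the same hits."""
    hits: List[Dict[str, object]] = []
    for path in sorted(p for p in bundle.rglob("*") if p.is_file()):
        if path.name == "flags.yml":
            continue  # never scan our own output
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for flag_id, pattern in sorted(FLAG_PATTERNS.items()):
                if pattern.search(line):
                    rel = path.relative_to(bundle).as_posix()
                    hits.append({"flag": flag_id, "file": rel, "line": line_no})
    return hits


def render_flags_yaml(hits: List[Dict[str, object]]) -> str:
    """Render hits as a YAML list; JSON-quoted strings are valid YAML scalars."""
    lines = []
    for hit in hits:
        lines.append(f"- flag: {json.dumps(hit['flag'])}")
        lines.append(f"  file: {json.dumps(hit['file'])}")
        lines.append(f"  line: {hit['line']}")
    return "\n".join(lines) + "\n" if lines else "[]\n"
```

Each hit records only where a pattern matched, which keeps the output a hint for the expert and not a diagnosis.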
Apply Tier 1 auto-redaction (see
redaction-tiers). Then
print a one-screen summary: counts of replacements,
samples of distinct IPs/hostnames/customer
strings/webhook URLs. For each sample group, ask
the user keep / redact-all / case-by-case.
Apply the choices. Log every replacement to
bundle/redaction-report.txt.
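The redaction pass can be sketched as two pieces: masking known secret values while counting replacements, and sampling distinct values for the review summary. The placeholder format, function names, and the IPv4 pattern are assumptions; the actual Tier 1 patterns are defined in redaction-tiers.

```python
import re
from typing import Dict, List, Tuple

# Loose IPv4 matcher, used only to pick samples for the user to review.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text: str, secrets: Dict[str, str]) -> Tuple[str, Dict[str, int]]:
    """Mask each known secret value and count how often it was replaced."""
    counts: Dict[str, int] = {}
    for label, value in secrets.items():
        if not value:
            continue
        placeholder = f"<REDACTED:{label}>"
        # A function replacement avoids backslash interpretation in the placeholder.
        text, n = re.subn(re.escape(value), lambda _m: placeholder, text)
        counts[label] = n
    return text, counts


def sample_ips(text: str, limit: int = 5) -> List[str]:
    """Return up to `limit` distinct IPv4-looking strings in first-seen order."""
    seen: List[str] = []
    for match in IPV4.findall(text):
        if match not in seen:
            seen.append(match)
    return seen[:limit]
```

The counts feed the one-screen summary, and the samples are what the user answers keep / redact-all / case-by-case for.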
Write manifest.yml and README.md. Print the
tarball command (tar czf infrahub-diagnostics-YYYYMMDD-HHMMSS.tgz infrahub-diagnostics-YYYYMMDD-HHMMSS/, with the bundle's actual timestamp substituted). The bundle is now ready
to hand to an expert.
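The same archive can be produced from Python's standard library. A minimal sketch, assuming the archive should sit next to the bundle directory and share its name; the function name is hypothetical.

```python
import tarfile
from pathlib import Path


def make_tarball(bundle_dir: Path) -> Path:
    """Write <bundle_dir>.tgz next to the bundle and return the archive path."""
    archive = bundle_dir.with_name(bundle_dir.name + ".tgz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative, so the archive unpacks into one directory.
        tar.add(bundle_dir, arcname=bundle_dir.name)
    return archive
```

Naming the archive from the concrete directory avoids relying on a shell glob for the output file name.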
Print the expert-ready short summary (3-5 lines).
If the user then says they also want to file a
public GitHub issue, hand off to
infrahub-reporting-issues — see
cross-link-reporting-issues.
Never duplicate that skill's routing logic here.
| Prefix | Category | Description |
|---|---|---|
| workflow | Workflow | User-gate semantics, step ordering |
| connection | Connection | URL + API token capture with privacy guarantee |
| collection | Collection | Read-only command policy |
| infrahubctl-only | Instance contract | infrahubctl-only probes against the instance |
| multi-replica | Coverage | Multi-worker log collection |
| redaction | Redaction | Two-tier secret/PII masking |
| deployment | Detection | Topology detection order |
| bundle | Bundle | On-disk bundle structure |
| manifest | Manifest | manifest.yml field contract |
| flag-checks | Flag checks | Deterministic-only hint emission |
| cross-link | Cross-linking | Hand-off to reporting-issues |
See rules/_sections.md for the index.
- No infrahubctl schema load, no docker compose down, no kubectl delete. Read-only only.
- Do not probe /api/... or run docker compose exec into
  the database / worker / message-queue containers
  for state. Such probes are speculative and brittle:
  each one couples the skill to internal implementation
  details (a GraphQL field name, a Cypher procedure,
  a Postgres column, an env var inside the
  container) that change between minor versions.
  Use infrahubctl only for instance state. See
  rules/infrahubctl-only-for-instance.md.
- The infrahubctl_state: false path exists for users
  who legitimately decline; never silently assume
  anonymous access.
- Issue filing belongs to infrahub-reporting-issues. Cross-link, don't
  duplicate.