From ansible-skill
Diagnoses failure modes in Ansible playbooks, roles, collections, inventory, Vault, Molecule, execution environments, and CI; ensures idempotency, blast-radius controls, secret safety, variable precedence, and validation plans.
Slash command: /ansible-skill:ansible-skill
Diagnose-first guidance for Ansible and ansible-core. Core file is a workflow; depth lives in reference files loaded on demand.
Every Ansible response must include:
1. Runtime assumptions: ansible-core version, collections in requirements.yml with versions, Python interpreter target (ansible_python_interpreter), connection plugin (ssh/winrm/local), and control node vs execution-environment runtime. State explicitly which of these the user did not provide. When no version is given, assume the lowest supported ansible-core minor with EOL runway from the Version Matrix (currently 2.19+; 2.18 is security-only and near EOL). Recommending an EOL floor produces guidance that is already past its support window.
2. Idempotency: changed=True only when the world actually changed. Name the module's idempotency contract (native modules are idempotent; command/shell requires creates/removes/changed_when).
3. Blast radius: the serial / max_fail_percentage / any_errors_fatal decision, --check + --diff coverage, and whether this is safe to run against prod as-is.
4. Validation plan: ansible-lint, ansible-playbook --syntax-check, --check --diff, Molecule scenario, ansible-test sanity/units/integration.
5. Rollback: … state: absent. What evidence to keep: registered var output, command logs, diff artifacts.

Never recommend running a play against production without --check --diff first and an explicit --limit or a reviewed inventory pattern.
1. ansible-core version, collections, Python interpreter, connection plugin, execution path (local/CI/EE/AWX-free), environment criticality.
2. … (--check --diff / approval gates / rollback).
3. … requirements.yml, Molecule scenario, CI workflow, Vault usage.

| Failure category | Symptoms | Primary references |
|---|---|---|
| Idempotency drift | Tasks report changed=True every run, command/shell without creates/changed_when, non-native modules, handlers firing spuriously | Idempotency Patterns |
| Blast radius | Missing serial, no max_fail_percentage, any_errors_fatal misused, no --limit in CI, fact-gathering against whole fleet | Execution & Runtime, CI/CD Workflows |
| Secret exposure | Plaintext in vars, no_log missing, Vault key handling, secrets in stdout/stderr, ansible-vault vs external secret managers | Security & Vault |
| Variable precedence bugs | 22-level chain surprises, set_fact vs vars vs vars_files, group_vars / host_vars collisions, extra-vars overrides | Inventory & Variables |
| Inventory correctness | Static/dynamic drift, group membership bugs, ansible_host vs inventory_hostname, missing --limit safety | Inventory & Variables |
| Handler/ordering issues | Handlers not firing on failure, meta: flush_handlers, listen topics, notify ordering | Idempotency Patterns |
| Check-mode blind spots | Tasks break under --check, ignore_errors / failed_when hiding real failures, modules that don't support check mode | Idempotency Patterns |
| Collection/role supply chain | Galaxy pinning, requirements.yml hygiene, version drift, private Automation Hub, signature verification | Collections & Supply Chain |
| Execution environment / runtime | EE image pinning, ansible-navigator vs ansible-playbook, Python interpreter discovery, connection plugin, become escalation, forks/pipelining/fact caching | Execution & Runtime |
Activate when: creating or reviewing Ansible playbooks, roles, or collections; setting up or debugging Molecule / ansible-test; structuring multi-environment inventory; implementing Ansible CI/CD; choosing role patterns or collection organization; configuring Vault or external secret backends; building or pinning execution environments.
Don't use for: basic YAML syntax Claude already knows; module API reference (point users at the ansible-doc CLI or docs.ansible.com); AAP / AWX / Tower platform-specific questions (job templates, surveys, RBAC, workflows); cloud-provider SDK questions unrelated to Ansible modules.
| Unit | When to Use | Scope |
|---|---|---|
| Task | One action | Install a package, write a file |
| Role | Reusable bundle of related tasks | Web server config, database setup |
| Playbook | Orchestrates roles across hosts | Full stack deploy, one environment |
| Collection | Distributable unit of roles, modules, plugins | Shared across teams, versioned, on Galaxy or Automation Hub |
Flow: task → role → playbook → collection.
```text
inventories/
  prod/         # hosts, group_vars/, host_vars/
  staging/      # hosts, group_vars/, host_vars/
  dev/
roles/          # local reusable roles
collections/    # requirements.yml, installed collections
playbooks/      # deploy.yml, site.yml, one-off ops plays
molecule/       # per-role scenarios
```
Separate inventories from roles. Keep roles single-responsibility. Keep all group_vars/ and host_vars/ inside each inventories/<env>/ directory, never at the repo root: repo-root group_vars/ and host_vars/ apply to every environment's plays and leak prod values into dev runs.
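- Role names: snake_case (nginx_site, not my_role). Hyphens fail the ansible-lint role-name rule (^[a-z][a-z0-9_]*$) and break collection packaging.
- Variables: prefix with the role name (nginx_site_port, not port).
- Task names: describe the action ("Install nginx", not "nginx").
- Tags: by concern (tags: [config, tls] vs tags: [install_nginx]).
- Task key order: name → module → module args → register → when → loop → notify → tags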
```yaml
- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present
  register: nginx_install
  when: ansible_os_family == 'Debian'
  notify: restart nginx
  tags: [install]
```
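The notify: restart nginx line in the task above only triggers something if a handler with exactly that name, or a listen: topic of that name, exists in the play or role. A minimal sketch of the matching handler, assuming it lives in roles/<role>/handlers/main.yml; the "web config changed" topic is a hypothetical example of a shared listen topic.

```yaml
# roles/<role>/handlers/main.yml (sketch)
# The name must match the notify: string exactly.
- name: restart nginx
  ansible.builtin.service:
    name: nginx
    state: restarted
  # Other tasks can also trigger this handler with: notify: web config changed
  listen: web config changed
```

ansible-lint's name[casing] rule flags names that do not start with an uppercase letter, so renaming both the handler and the notify: string to "Restart nginx" keeps lint clean.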
| Situation | Use | Why |
|---|---|---|
| Native module available | ansible.builtin.<module> / fqcn module | Idempotent by contract |
| Must shell out, stateful output file | ansible.builtin.command with creates: / removes: | Runs (and reports changed) only when the creates: path is missing or the removes: path exists |
| Must shell out, no clear file marker | ansible.builtin.command with changed_when: based on stdout/rc | Explicit change detection |
| Must shell out, needs shell features (pipes, redirects) | ansible.builtin.shell + changed_when | Last resort — harder to make safe |
Never run ansible.builtin.shell without changed_when. If the task is genuinely informational, set changed_when: false explicitly.
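A sketch of the shell-out rows of the table above. The myapp binary, its paths and the "already enabled" output string are hypothetical; the point is where creates:, changed_when:, check_mode: and pipefail go.

```yaml
# Marker file: the task is skipped once /etc/myapp/.initialized exists.
- name: Initialise myapp data directory
  ansible.builtin.command:
    cmd: /usr/local/bin/myapp init --data-dir /var/lib/myapp
    creates: /etc/myapp/.initialized

# No marker file: derive "changed" from the command's own output.
# Assumes (hypothetically) the CLI prints "already enabled" on a no-op.
- name: Enable metrics feature
  ansible.builtin.command:
    cmd: /usr/local/bin/myapp feature enable metrics
  register: myapp_feature
  changed_when: "'already enabled' not in myapp_feature.stdout"

# Informational read: never reports a change; check_mode: false lets it
# run under --check so later tasks can still use the registered value.
- name: Read running kernel release
  ansible.builtin.command:
    cmd: uname -r
  register: kernel_release
  changed_when: false
  check_mode: false

# Shell only when a pipe is needed; pipefail (bash) stops the pipe from
# masking a failure in an earlier command.
- name: Count listening TCP sockets
  ansible.builtin.shell:
    cmd: set -o pipefail && ss -ltn | tail -n +2 | wc -l
    executable: /bin/bash
  register: listen_count
  changed_when: false
  check_mode: false
```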
See Idempotency Patterns for module idempotency contracts, handler patterns, and check-mode coverage.
| Target fleet size | Pattern | Play config |
|---|---|---|
| 1–5 hosts | Serial small batches | serial: 1 or serial: [1, 2] |
| 10–50 hosts, canary first | Canary + rolling | serial: [1, "25%"] with max_fail_percentage: 10 |
| 50+ hosts | Rolling with fail cap | serial: "10%", max_fail_percentage: 5 |
| Critical state-change on many hosts | Fail fast | any_errors_fatal: true |
| Independent tasks, no cascade | Free strategy | strategy: free (hosts run independently) |
Never run against production without an explicit --limit or a reviewed inventory pattern. Never set any_errors_fatal: true on rolling deploys where partial completion is safe — it turns a contained mid-batch failure into a global abort. Use any_errors_fatal: true precisely when partial completion is worse than full failure (schema migrations, bootstrap, replicated state); otherwise pair serial: with max_fail_percentage: and let the rollout continue.
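Two plays sketching the rows above: a canary-then-rolling deploy capped by max_fail_percentage, and a fail-fast play for a step where partial completion is worse than full failure. The group names (webservers, cluster_nodes) and roles (app_deploy, cluster_bootstrap) are hypothetical.

```yaml
# Canary first (1 host), then batches of 25%; the play aborts when more
# than 10% of a batch fails. Preview with an explicit limit first, e.g.:
#   ansible-playbook -i inventories/prod/hosts playbooks/deploy.yml \
#     --limit webservers --check --diff
- name: Rolling deploy of the web tier
  hosts: webservers
  serial: [1, "25%"]
  max_fail_percentage: 10
  roles:
    - role: app_deploy

# Replicated state: one failed host stops the play on every host.
- name: Bootstrap replicated cluster members
  hosts: cluster_nodes
  any_errors_fatal: true
  roles:
    - role: cluster_bootstrap
```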
See Execution & Runtime for serial/max-fail combinations and CI/CD Workflows for CI-level blast-radius gates.
Abbreviated precedence (lowest → highest):
1. Role defaults (roles/<role>/defaults/main.yml)
2. group_vars/all
3. group_vars/<group>
4. host_vars/<host>
5. set_fact
6. --extra-vars (always wins)

Most common bugs:

- set_fact values persist across plays in the same run, which is unexpected when debugging.
- group_vars/all is silently overridden by group_vars/<group> or host_vars/<host>. When a host belongs to several groups at the same depth, their group_vars merge in alphabetical group order unless ansible_group_priority is set, which is confusing when the same host appears in multiple groups.
- --extra-vars (including @file.yml) beats everything, so a stray flag in CI can override protected config.

See Inventory & Variables for the full 22-level ladder and collision examples.
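A minimal sketch of the host_vars-over-group_vars/all rule and the extra-vars override, assuming a dev inventory that lists a host named web1. Three files are shown in one block, separated by ---; web1, app_port and the playbook name are hypothetical.

```yaml
# inventories/dev/group_vars/all.yml
app_port: 8080
---
# inventories/dev/host_vars/web1.yml
app_port: 9090
---
# playbooks/show_port.yml
- name: Show which app_port wins
  hosts: web1
  gather_facts: false
  tasks:
    - name: Print the effective value
      ansible.builtin.debug:
        var: app_port
```

Running ansible-playbook -i inventories/dev/hosts playbooks/show_port.yml should print 9090, because host_vars/<host> beats group_vars/all; adding -e app_port=1234 should print 1234, since --extra-vars always wins.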
| Situation | Approach | Tools | Cost |
|---|---|---|---|
| Syntax check | Static | ansible-playbook --syntax-check, ansible-lint | Free |
| Role unit test | Scenario-based | Molecule + docker/podman driver | Free–Low |
| Collection unit test | Module/plugin tests | ansible-test units | Free |
| Collection sanity | Import + schema | ansible-test sanity | Free |
| Integration — role | Molecule against real target | Molecule + delegated/vagrant driver | Med |
| Integration — collection | Live-run modules | ansible-test integration | Med |
| End-to-end, multi-host | Staged apply | --check --diff against staging | High |
Rules:
- Run ansible-lint on every change: it catches fqcn, no_log, and changed_when misuse at PR time.
- Require --check --diff as the last gate before any production play.
- Use Molecule for roles and ansible-test for collections. They're not interchangeable.

See Testing Frameworks for Molecule scenario structure, ansible-test usage, and argument-specs for role input validation.
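A minimal Molecule scenario for a role, as a sketch. It assumes the Docker driver from the molecule-plugins package (pip install "molecule-plugins[docker]") and a converge.yml next to it that applies the role; the container image is an assumption and must ship Python so Ansible can manage it.

```yaml
# roles/nginx_site/molecule/default/molecule.yml (sketch)
driver:
  name: docker
platforms:
  - name: instance
    image: docker.io/geerlingguy/docker-debian12-ansible:latest
    pre_build_image: true  # use the image as-is, no Dockerfile build
provisioner:
  name: ansible
verifier:
  name: ansible
```

Running molecule test from the role directory executes the default sequence, which includes an idempotence step: it re-runs converge and fails if any task reports changed, the same drift described in the failure table.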
Pipeline stages: lint → syntax-check → Molecule (or ansible-test) → staged --check --diff → gated apply.
Rules:
- Pin ansible-core to a currently supported minor in CI (e.g. ansible-core>=2.20,<2.21); cross-check the Version Matrix before each release cycle and bump off any minor that is past EOL or inside the final security-only window.
- Pin requirements.yml with exact versions for prod branches.
- Treat --check --diff output as an approval artifact, not a replayable plan. The apply job must re-evaluate current state: optionally re-run --check --diff against live infrastructure and compare it to the approved artifact before executing.

See CI/CD Workflows for GitHub Actions + GitLab CI templates and blast-radius approval gates.
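A sketch of the first pipeline stages (lint, then syntax-check) as a GitHub Actions workflow using the pins above. The workflow filename, Python version and inventory/playbook paths are assumptions; the staged --check --diff and gated apply jobs need SSH credentials and an approval environment, so they are left out.

```yaml
# .github/workflows/ansible-ci.yml (sketch)
name: ansible-ci
on: [pull_request]

env:
  # Make the project-local collections directory visible to
  # ansible-lint and ansible-playbook.
  ANSIBLE_COLLECTIONS_PATH: ${{ github.workspace }}/collections

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install pinned ansible-core and ansible-lint
        run: pip install "ansible-core>=2.20,<2.21" ansible-lint
      - name: Install pinned collections
        run: ansible-galaxy collection install -r requirements.yml --collections-path ./collections
      - name: Lint
        run: ansible-lint
      - name: Syntax check
        run: ansible-playbook -i inventories/staging/hosts playbooks/deploy.yml --syntax-check
```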
Don't:
- Store plaintext secrets in vars_files
- Skip no_log: true on tasks that pass secrets as module args
- Use --verbose in CI on tasks handling secrets (output leaks to logs)

Do:

- Use ansible-vault encrypt_string for inline single-value secrets
- Set no_log: true on any task whose module args include secrets, and close the leak paths that bypass it: secrets in task names, debug: output of registered values, related tasks that omit no_log, and any loop: whose item itself is a secret (mark the looping task no_log: true too)

See Security & Vault for vault-id patterns, external-backend lookups, and secrets-in-logs hardening.
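A sketch of the two Do items together, assuming a vault-id labelled prod, a dbservers group, and the community.postgresql collection. Two files are shown, separated by ---; the ciphertext lines are elided, not real output, and the database user name is hypothetical.

```yaml
# Generate the value without putting the secret in shell history
# (ansible-vault reads it from stdin):
#   ansible-vault encrypt_string --vault-id prod@prompt --stdin-name db_password
# Paste the output into inventories/prod/group_vars/dbservers/vault.yml:
db_password: !vault |
  $ANSIBLE_VAULT;1.2;AES256;prod
  ...
---
# Task consuming the secret: no_log: true keeps the module args and
# result out of console and callback output.
- name: Ensure application database user exists
  community.postgresql.postgresql_user:
    name: app
    password: "{{ db_password }}"
  no_log: true
```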
| Pin strategy | Prod | Dev |
|---|---|---|
| requirements.yml collection version | Exact (version: "5.1.2") | Range (version: ">=5.1.0,<6.0.0") |
| Galaxy vs Automation Hub | Automation Hub for certified | Galaxy OK for experimental |
| Signature verification | Required | Optional for dev |
Rules:
Use fully-qualified collection names (ansible.builtin.copy, not copy) — ansible-lint fqcn rule enforces.
Pin all collections in requirements.yml; do not rely on the ansible community package version for prod.
Mirror critical collections internally (private Automation Hub or git) for supply-chain control.
Verify signatures on Automation Hub installs with these flags:
```shell
ansible-galaxy collection verify <coll> \
  --server automation_hub \
  --keyring /etc/pki/ansible/automation-hub-signing.gpg \
  --collections-path ./collections \
  --required-valid-signature-count +1
```
<coll> is the FQCN of an installed collection. To verify everything pinned in a manifest, replace that positional argument with -r requirements.yml. The --collections-path value must match the one used at install time; otherwise verify searches Ansible's default paths and may report "not installed" or check a different copy.
The + prefix on --required-valid-signature-count is what makes verification strict: with +1 (or +all), the absence of any signature is itself an error, so an unsigned or misconfigured collection fails CI. Without the +, a bare 1 only checks "≥1 valid signature when signatures are present" and silently passes a zero-signature response. Pass --server automation_hub (matching a [galaxy_server.automation_hub] block in ansible.cfg) so the verify lookup hits the same registry the collection was installed from. In ansible.cfg, the GPG keyring config key is [galaxy] gpg_keyring. Flags like --signature-count-threshold or config keys like signing_keys do not exist.
Run both install commands separately into project-local paths so CI and the documented layout stay in sync — collection install -r ignores the roles: section, and role install falls back to Ansible's user roles path unless --roles-path is set:
```shell
ansible-galaxy collection install -r requirements.yml --collections-path ./collections
ansible-galaxy role install -r requirements.yml --roles-path ./roles
```
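Both install commands above read the same requirements.yml, each taking its own section. A sketch with exact pins, as the prod column of the table recommends; the version numbers and the git URL for the role are illustrative assumptions, so pin to whatever releases you have actually tested.

```yaml
# requirements.yml (sketch)
collections:
  - name: community.general
    version: "9.5.0"
  - name: ansible.posix
    version: "1.6.2"

roles:
  # Hypothetical internal role mirrored in git and pinned to a tag.
  - name: nginx_site
    src: https://git.example.com/infra/nginx_site.git
    scm: git
    version: v1.4.0
```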
See Collections & Supply Chain for requirements.yml syntax, signature verification, and private-hub auth.
| Situation | Use | Why |
|---|---|---|
| Local dev, fast iteration | Bare ansible-playbook + venv | No image build overhead |
| CI reproducibility | EE image with ansible-navigator | Pinned ansible-core + collections + deps |
| Production run | EE image, pulled by digest | Deterministic runs, rollback by switching to a previously approved digest |
Rules:
- Pull EE images by digest (@sha256:...), not tag, for production.
- Build EEs with ansible-builder from an execution-environment.yml.
- ansible-navigator run is the preferred invocation: it handles the EE lifecycle and streams output cleanly.

See Execution & Runtime for EE build patterns, interpreter discovery, connection/become gotchas, and forks/pipelining/fact-caching.
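A sketch of an ansible-builder definition (schema version 3) that bakes in the pinned ansible-core and the collections from requirements.yml. The Fedora base image and the registry names are assumptions; for production, reference the base image by digest as well.

```yaml
# execution-environment.yml (sketch)
# Build, push, then run by digest (registry path is hypothetical):
#   ansible-builder build --tag registry.example.com/infra/ee:1.0
#   ansible-navigator run playbooks/deploy.yml -i inventories/staging/hosts \
#     --eei registry.example.com/infra/ee@sha256:... --mode stdout
version: 3
images:
  base_image:
    name: quay.io/fedora/fedora:40
dependencies:
  python_interpreter:
    package_system: python3
    python_path: /usr/bin/python3
  ansible_core:
    package_pip: ansible-core>=2.20,<2.21
  ansible_runner:
    package_pip: ansible-runner
  galaxy: requirements.yml
```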
| Component | Strategy | Example |
|---|---|---|
| ansible-core runtime | Pin minor for prod; pick a non-EOL minor from the Version Matrix | ansible-core>=2.20,<2.21 (re-evaluate at each release cycle) |
| Community ansible package | Pin exact or avoid in prod | Prefer pinning ansible-core + collections separately |
| Collections (prod) | Exact version in requirements.yml | version: "5.1.2" |
| Collections (dev) | Allow minor | version: ">=5.1.0,<6.0.0" |
| Python interpreter | Explicit ansible_python_interpreter | /usr/bin/python3 (avoid auto-discovery in prod) |
Keep ansible-core + collection upgrades in a separate PR from functional changes. The community ansible package is a starter bundle; production teams pin collections individually.
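For the Python interpreter row, a minimal YAML-format inventory sketch that pins the interpreter explicitly, so interpreter auto-discovery is not used for any host under it. Host and group names are hypothetical.

```yaml
# inventories/prod/hosts.yml (sketch)
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
    dbservers:
      hosts:
        db1.example.com:
```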
| Feature | Min ansible-core | Common use |
|---|---|---|
| validate: parameter on template/copy | 2.0+ | Run a validation command against the temp file (%s) before it replaces the target |
| import_role / include_role with vars_from | 2.4+ | Load alt var files per import |
| argument_specs in meta/ | 2.11+ | Role input validation |
| Role handler listen topic inheritance | 2.16+ | Cross-role handler topics |
| Structured changed_when with dict returns | 2.17+ | Clean branching on module result |
| ansible-navigator stable workflow | n/a (packaging) | Default for EE-based runs |
Verify the runtime floor before emitting a feature. Version-specific behavior (esp. validate and handler inheritance) is a frequent LLM mistake.
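For the argument_specs row (2.11+), a sketch of role input validation. The option names are hypothetical; Ansible validates them in an automatic task before the role's tasks run and fails on a missing required option or a wrong type.

```yaml
# roles/nginx_site/meta/argument_specs.yml (sketch)
argument_specs:
  main:
    short_description: Configure an nginx site
    options:
      nginx_site_server_name:
        type: str
        required: true
        description: Value written to the server_name directive.
      nginx_site_port:
        type: int
        # Documentation only: the effective default must also be set
        # in roles/nginx_site/defaults/main.yml.
        default: 80
        description: Port the site listens on.
      nginx_site_tls_enabled:
        type: bool
        default: false
        description: Whether to render the TLS server block.
```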
Progressive disclosure — essentials here, depth on demand.
- Idempotency Patterns: command/shell guards, handlers, check mode
- Inventory & Variables: set_fact vs vars
- Security & Vault: no_log, log hardening
- Collections & Supply Chain: requirements.yml, fqcn, signature verification, private hubs
npx claudepluginhub olandodeflexy/ansible-skill --plugin ansible-skill