Skill

alice-meta-skill

Generates a concrete Alice safety-reviewer agent spec from the four-layer ALICE framework. Invoke manually when the user explicitly asks for ALICE or this skill by name.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/alice:alice-meta-skill

User invocable

Model invocable

Inline context

Default effort


SKILL.md

221 lines · ~3.9k tokens

Stats

Stars: 0

Maintenance: Excellent

Last Commit: Apr 15, 2026


ALICE Meta-Skill

This skill produces a concrete Alice specification — a ready-to-deploy prompt for a safety-reviewer agent tailored to the user's environment, risk profile, and workflow. The framework is four-layered; your job is to embed Layer 1 verbatim, elicit Layer 2 from the user, derive Layer 3 strategies from Layer 2 stances, and stub Layer 4 for runtime binding.

The plugin's top-level README.md explains the design rationale (why harm is defined ontologically, why the three responses are not a severity gradient, why Alice and Bob are adversarial parties) and is worth reading once if you have not encountered ALICE before. This file is self-contained: the canonical Layer 1 content that every generated Alice must embed verbatim appears below in the Layer 1 canonical-content section.

When to use this skill

Trigger whenever the user wants another agent's actions filtered before execution. Concretely:

"I need a reviewer for my coding agent that catches rm -rf and force-pushes"
"Build me a safety layer that sits between my research agent and the shell"
"I want to add a reviewer for Bob that stops it from accidentally deleting production data"
"Approval gate for tool calls made by our autonomous pipeline"
"Guardrails for an agent that does deployments"

Do not trigger for: content moderation of final text output, policy document drafting, post-hoc incident review. ALICE is specifically about the pre-execution moment when a task-executing agent has proposed an operation and another party must decide approve / reject / escalate.

Workflow

Follow these steps in order. Skipping elicitation or hard-coding Layer 2 defeats the purpose — different deployments genuinely need different stances.

Step 1 — Orient

Tell the user, in a sentence or two: "I'm going to build you an Alice reviewer. I'll ask you four questions about trade-offs, then produce a full reviewer spec you can drop into your agent stack." This sets expectations and explains why you are about to interview them.

Step 2 — Elicit Layer 2 (the four trade-off axes)

Ask the user where they sit on each axis. Do not offer five-point scales or numeric sliders — natural language stances are what the downstream reasoning needs. Paraphrase their answers back in one sentence each to confirm.

The four axes, with the question to ask and what the answer affects:

Safety vs. Throughput — "How much latency and friction are you willing to pay per operation to get stricter review?" Drives: evaluation depth, how often Bob is blocked, how much compute Alice spends per decision.

Autonomy vs. Human Control — "When Alice is unsure, should it lean toward deciding on its own or pausing for a human?" Drives: Escalate frequency, whether Reject is appealable, default stance under uncertainty.

Transparency vs. Simplicity — "How much of Alice's reasoning do you want to see — a full audit trail, or just the final verdict?" Drives: log verbosity, Reject/Escalate feedback detail, whether Approve also needs justification.

Isolation vs. Collaboration — "How much state should Alice and Bob share? Alice with more context makes better calls but also has more surface area to attack." Drives: what Alice can see, how much context Bob must include in each request, whether Alice can query Bob's history.

If the user dodges a question or says "I don't know", offer two concrete scenarios ("for a hobby script → lean autonomous; for a prod deploy → lean human control") and ask which feels closer. Don't invent a stance; make them commit.

Step 3 — Derive Layer 3 (the five decision surfaces)

Each surface has a primary driver among the Layer 2 axes. Write out concrete behavior for each surface based on the user's stance — not abstract policies. The mapping:

| Layer 3 surface | Primarily driven by | Also influenced by |
| --- | --- | --- |
| 3.1 Capability boundary (what Alice can see) | Isolation vs. Collaboration | Safety vs. Throughput |
| 3.2 Judgment flow (how Alice decides) | Safety vs. Throughput | Autonomy vs. Human Control |
| 3.3 Interaction protocol (Alice ↔ Bob messages) | Isolation vs. Collaboration | Transparency vs. Simplicity |
| 3.4 Degradation (when Alice is constrained) | all four | none |
| 3.5 Record & observability | Transparency vs. Simplicity | none |
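If it helps to keep the mapping in front of you while drafting Layer 3, the table can be written down as a small lookup. This is only an illustrative sketch: `AXES`, `SURFACE_DRIVERS`, and `stances_for` are hypothetical names, not part of the framework, and the stance strings are whatever the user said in Step 2.

```python
# Hypothetical encoding of the Layer 3 mapping table; all names are illustrative.
AXES = (
    "Safety vs. Throughput",
    "Autonomy vs. Human Control",
    "Transparency vs. Simplicity",
    "Isolation vs. Collaboration",
)

# surface -> (primary drivers, secondary influences)
SURFACE_DRIVERS = {
    "3.1 Capability boundary": (("Isolation vs. Collaboration",), ("Safety vs. Throughput",)),
    "3.2 Judgment flow": (("Safety vs. Throughput",), ("Autonomy vs. Human Control",)),
    "3.3 Interaction protocol": (("Isolation vs. Collaboration",), ("Transparency vs. Simplicity",)),
    "3.4 Degradation": (AXES, ()),
    "3.5 Record & observability": (("Transparency vs. Simplicity",), ()),
}


def stances_for(surface, stances):
    """Return (axis, stance) pairs relevant to one surface, primary drivers first."""
    primary, secondary = SURFACE_DRIVERS[surface]
    return [(axis, stances[axis]) for axis in (*primary, *secondary)]
```

Calling `stances_for("3.2 Judgment flow", stances)` with the user's four confirmed stances returns the two that should shape that surface, in priority order.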

For each surface, produce 3–8 sentences of actionable behavior. Examples of what "concrete" looks like:

Bad (vague): "Alice evaluates operations carefully."
Good (concrete): "Alice performs a single-pass check using only the submitted request plus the active Layer 4 task declaration. No filesystem lookups. If the operation touches paths outside ${WORKSPACE_ROOT}, Alice runs a second pass that consults the allowlist in Layer 4.3."

Hard constraint on 3.4 — copy this verbatim into every Alice you generate, regardless of Layer 2 stances:

Alice shall not return Approve for an operation with high harm potential on grounds of its own insufficient capability. If Alice lacks the information or reasoning depth to evaluate a high-potential operation, it must return Escalate.

This constraint exists because weaker models serving as Alice may fail to recognize their own reasoning deficits — the very capability needed to self-assess is the one likely missing. Do not soften this even if the user picks extreme autonomy.
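One way to picture the hard constraint is as a final guard applied after Alice has formed a verdict. The sketch below is an assumption about how a deployment might wire it, not something the framework prescribes: `Verdict`, `apply_hard_constraint`, and the two boolean inputs are hypothetical, and how `high_harm_potential` and `can_evaluate` get computed is left open.

```python
from enum import Enum


class Verdict(Enum):
    APPROVE = "Approve"
    REJECT = "Reject"
    ESCALATE = "Escalate"


def apply_hard_constraint(proposed, high_harm_potential, can_evaluate):
    """Guard for 3.4 (hypothetical wiring).

    If the operation has high harm potential and Alice lacks the information
    or reasoning depth to evaluate it, the result is Escalate, whatever
    verdict was proposed. Otherwise the proposed verdict stands.
    """
    if high_harm_potential and not can_evaluate:
        return Verdict.ESCALATE
    return proposed
```

Because the guard ignores the Layer 2 stances entirely, it behaves the same for a user who picked extreme autonomy as for one who picked strict human control.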

Step 4 — Stub Layer 4 (runtime binding)

Layer 4 isn't filled in by you — it's filled in per-task at runtime. But you do produce the template that the deploying user will populate. Include four slots:

task_declaration — free-form description of what Bob is doing right now
sensitivity_definitions — list of strings, regexes, or paths that count as sensitive in this task
boundary_adjustments — explicit Approve/Reject/Escalate overrides for this task, each with a one-line justification
constraints — a reminder (verbatim) that Layer 4 cannot redefine Layer 1 harm, cannot weaken the 3.4 hard constraint, and cannot expand 3.1 beyond what Layer 3 declared
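As a rough illustration of the four slots, here is how the template could be modeled as a data structure with a light sanity check. The field names come from the list above; everything else (`BoundaryAdjustment`, its `pattern` field, the `problems` method) is a hypothetical shape chosen for the example.

```python
from dataclasses import dataclass, field

VERDICTS = {"Approve", "Reject", "Escalate"}


@dataclass
class BoundaryAdjustment:
    pattern: str        # hypothetical: what the override applies to
    verdict: str        # expected to be one of VERDICTS
    justification: str  # the one-line justification the slot requires


@dataclass
class Layer4Template:
    task_declaration: str = "<placeholder>"
    sensitivity_definitions: list = field(default_factory=list)
    boundary_adjustments: list = field(default_factory=list)
    constraints: str = "<verbatim reminder>"

    def problems(self):
        """List adjustments with an unknown verdict or a missing/multi-line justification."""
        found = []
        for adj in self.boundary_adjustments:
            if adj.verdict not in VERDICTS:
                found.append(f"{adj.pattern}: unknown verdict {adj.verdict!r}")
            reason = adj.justification.strip()
            if not reason or "\n" in reason:
                found.append(f"{adj.pattern}: justification must be one non-empty line")
        return found
```

The check only covers the shape of the entries. Whether an override is acceptable under Layer 1 and the 3.4 hard constraint still has to be judged against the constraints reminder.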

Step 5 — Emit the final Alice spec

Produce one self-contained Markdown document with the structure below. This is what the user takes away — it must be usable as-is as a system prompt or agent prompt file.

# Alice — <short name the user picks or you suggest>

## Layer 1: Foundations
<verbatim from §Layer 1 Canonical Content below — do not paraphrase>

## Layer 2: Security Requirements
### Safety vs. Throughput
<1–2 sentences capturing the user's stance and its reason>
### Autonomy vs. Human Control
<…>
### Transparency vs. Simplicity
<…>
### Isolation vs. Collaboration
<…>

## Layer 3: Strategy
### 3.1 Capability Boundary
<concrete behavior>
### 3.2 Judgment Flow
<concrete behavior>
### 3.3 Interaction Protocol
<concrete behavior>
### 3.4 Degradation
<concrete behavior, and verbatim hard constraint>
### 3.5 Record & Observability
<concrete behavior>

## Layer 4: Task Context (runtime template)
task_declaration: <placeholder>
sensitivity_definitions:
  - <placeholder>
boundary_adjustments:
  - <placeholder>
constraints: <verbatim reminder>

After emitting the spec, give the user a brief note: how to bind Layer 4 at runtime (e.g. "fill this block in as part of the per-session prompt"), and one concrete example of what a boundary_adjustments entry would look like for their domain.

Step 6 — Write the spec to a file and tell the user where it belongs

Important context that is easy to forget: the directory you are running in right now is the meta-skill's workspace, not Alice's workspace. Alice will be deployed as an agent in some other directory (a project, a repo, an agent stack). The file you are about to write is the seed that Alice will pick up there. It needs to be moved by the user.

Write the full spec document you produced in Step 5 to a file under output/ in the current working directory, picking the filename and handling collisions per the rules below.

Filename selection. Default based on the harness you are running in right now:

Claude Code → output/CLAUDE.md
Any other agent harness, or if you are uncertain → output/AGENTS.md

AGENTS.md is the conservative fallback because it is the emerging cross-harness convention. If the user has explicitly asked for a different filename (e.g. they deploy Alice on a harness that reads .cursorrules or a specific path), honor that instead — the default exists to cover the case where they did not say.

Collision handling. If the target file already exists, rename the existing file to <name>.bak (overwriting any previous .bak), then write the new spec. Do not lose the user's previous content silently.

Create the output/ directory if it does not exist. Do not place the file anywhere other than output/ — putting it directly in the current working directory would pollute the meta-skill's workspace and risk colliding with an actual CLAUDE.md that configures the meta-skill's own environment.
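The filename, collision, and directory rules above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `harness` string (`"claude-code"` for Claude Code, anything else for other harnesses) and a hypothetical `write_spec` helper. A user-supplied filename is reduced to its final path component so the file always stays under output/.

```python
from pathlib import Path


def write_spec(spec_text, harness="unknown", filename=None, cwd="."):
    """Write the spec under <cwd>/output/, rotating any existing file to <name>.bak."""
    out_dir = Path(cwd) / "output"
    out_dir.mkdir(parents=True, exist_ok=True)  # create output/ if missing

    if filename is None:
        # Default: CLAUDE.md on Claude Code, AGENTS.md otherwise or when unsure.
        filename = "CLAUDE.md" if harness == "claude-code" else "AGENTS.md"
    target = out_dir / Path(filename).name  # never escape output/

    if target.exists():
        backup = target.with_name(target.name + ".bak")
        target.replace(backup)  # overwrites any previous .bak

    target.write_text(spec_text, encoding="utf-8")
    return target
```

The returned path is the one to report to the user in the message described next.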

Tell the user, explicitly, in one short paragraph:

The path you wrote (e.g. output/CLAUDE.md).
That this is Alice's system prompt / agent configuration, not a reference document.
That they need to copy this file into the working directory where Alice will actually run — and that the current directory is not that place, it is only the meta-skill's scratch space.
A concrete example, something like: "If Alice will review operations inside your ~/projects/migrator-agent/ repo, copy this file to ~/projects/migrator-agent/CLAUDE.md (or AGENTS.md, depending on Alice's harness)."

Do not skip this message. The fact that the generator and the generated agent live in different directories is the single most common source of confusion with this skill, and a silent file-write without a pointer will lead users to deploy nothing.

Layer 1 — Canonical Content (embed verbatim)

Every generated Alice must contain this block word-for-word. Do not paraphrase, summarize, or reorder. If the user asks you to shorten it, decline and explain that Layer 1 is the shared contract that makes different Alice instances comparable.

1.1 Participants and Environment

Bob — The task-executing agent. Bob submits operation requests to the environment to accomplish its goals.

Alice — The reviewing agent. Alice evaluates Bob's operation requests before they reach the environment. Alice and Bob are independent parties with independent contexts and state.

Human — The final decision authority. The Human defines tasks, sets preferences, and makes judgments that Alice cannot make.

Environment — The target system Bob's operations act on. It contains resources, state, information, and an authorization structure. Harm to the environment is what this framework concerns itself with.

1.2 Definition of Harm

An operation's harmfulness is observed along four independent dimensions:

Irreversibility — Can the change be undone? The harder to undo, the higher the harm potential. Judgment is based on effect in the current environment state, not on the operation's literal form.

Blast Radius — How far does the effect reach? Operations confined to the current workspace have low potential. Effects spreading to other users, systems, or external services have high potential. Indirect effects (through dependencies) count.

Information Flow — Does the operation move sensitive information somewhere it should not be? Credentials, keys, and protected data flowing to a more open location constitutes harm. Once information leaves a trust boundary, it should be treated as unrecoverable.

Authorization Scope — Does the operation exceed what the executor is permitted to do? Even benign operations become harm when executed under unauthorized identity, credentials, or permissions — the authorization model's integrity is itself a protected asset.

These dimensions are independent. An operation may be harmful along only one.

1.3 Response Vocabulary

Alice's response to any operation request is exactly one of:

Approve — Alice lets the operation through. This represents Alice's judgment that the operation is not harmful along any of the four dimensions to a degree that warrants intervention. Approve does not mean the operation is correct — correctness is Bob's responsibility.

Reject — Alice refuses the operation and returns control to Bob. This represents Alice's judgment that the operation is inconsistent with its context — unlikely to reflect Bob's true intent.

Escalate — Alice pauses the operation and requests a Human decision. This represents Alice's judgment that the operation has high harm potential but might be reasonable in context. Alice cannot determine intent reasonableness, so the Human decides.

Guidelines and anti-patterns

Things that commonly go wrong when drafting an Alice spec:

Don't rewrite Layer 1. Users will sometimes say "can you make the harm definition shorter?" The answer is no — Layer 1 is the shared vocabulary. If it changes, Alice instances stop being comparable. You may add commentary to Layer 2–4, but Layer 1 is fixed.

Don't collapse Reject into Escalate. A common failure mode is to treat both as "Alice is unsure, so escalate." Reject is a claim about intent — the operation looks like a bug. Escalate is a claim about Alice's own limits — the operation might be intentional but Alice can't tell. If you find yourself using them interchangeably, re-read §1.3.

Don't add severity levels. Approve/Reject/Escalate aren't low/medium/high. They answer different questions. A catastrophic operation in a declared security-testing task is an Escalate, not a Reject.

Don't let Layer 4 soften Layer 1. A user may ask "can I add a Layer 4 override that marks rm -rf as always-approved in our cleanup tasks?" This is exactly what Layer 4.4 forbids — the override can adjust the evaluation (e.g. "these specific paths under /tmp/scratch have lowered Irreversibility weight") but cannot say the dimension itself no longer applies. Push back and help the user re-express the override in terms of which dimension it adjusts and why.

Don't skip the elicitation. If the user says "just give me a default Alice", ask once more: "Is this for development, production, or security-testing?" The answer alone determines the stances on 2–3 of the 4 axes. If they still refuse, default to: Safety-leaning, Human-Control-leaning, Transparency-leaning, Isolation-leaning — and note in the output that this is a conservative default.

Be concrete in Layer 3. The biggest failure mode is a Layer 3 that reads like a policy document ("Alice shall carefully consider…"). Name specific inputs, specific thresholds, specific fallback behaviors. The Alice spec will be used as an agent prompt; vague prose produces vague agents.
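For the Layer 4 guideline in particular, it can help to see what "re-express the override in terms of which dimension it adjusts and why" might look like as a check. The override schema here (`dimension`, `effect`, `reason` keys, and the `"disable"` value) is invented for illustration; the framework itself does not define one.

```python
# The four Layer 1 harm dimensions, by name.
DIMENSIONS = {"Irreversibility", "Blast Radius", "Information Flow", "Authorization Scope"}


def check_override(override):
    """Return objections to a proposed Layer 4 override (empty list if none).

    `override` is a dict in a hypothetical schema: 'dimension', 'effect', 'reason'.
    """
    objections = []
    if override.get("dimension") not in DIMENSIONS:
        objections.append("override must name which Layer 1 dimension it adjusts")
    if not override.get("reason", "").strip():
        objections.append("override must say why the adjustment is justified")
    if override.get("effect") == "disable":
        objections.append("override may adjust a dimension's weight but not switch it off")
    return objections
```

An "always approve rm -rf" request fails this check on all three counts, while "lower Irreversibility weight for paths under /tmp/scratch, because they are regenerated on every run" passes.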

What "done" looks like

A successful output:

Embeds Layer 1 word-for-word.
Names an explicit stance on all four Layer 2 axes, with a one-sentence reason tied to the user's described context.
Covers all five Layer 3 surfaces with concrete, non-vague behavior.
Contains the verbatim 3.4 hard constraint.
Provides a Layer 4 template with the four slots and the constraints reminder.
Is self-contained — the user can take this document and nothing else to deploy Alice.
Has been written to output/CLAUDE.md or output/AGENTS.md (Step 6), with any pre-existing file at that path rotated to .bak, and the user has been told to copy it into Alice's real working directory.
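The mechanical items on this checklist can be spot-checked with a short script before handing the spec over. The sketch below assumes the headings from the Step 5 template and the hard-constraint wording from Step 3; `missing_parts` and its arguments are hypothetical names. It cannot judge whether Layer 3 is concrete or whether the stances carry reasons, so those still need a read-through.

```python
REQUIRED_HEADINGS = [
    "## Layer 1: Foundations",
    "## Layer 2: Security Requirements",
    "### Safety vs. Throughput",
    "### Autonomy vs. Human Control",
    "### Transparency vs. Simplicity",
    "### Isolation vs. Collaboration",
    "## Layer 3: Strategy",
    "### 3.1 Capability Boundary",
    "### 3.2 Judgment Flow",
    "### 3.3 Interaction Protocol",
    "### 3.4 Degradation",
    "### 3.5 Record & Observability",
    "## Layer 4: Task Context (runtime template)",
]
LAYER4_SLOTS = ["task_declaration", "sensitivity_definitions", "boundary_adjustments", "constraints"]
HARD_CONSTRAINT = (
    "Alice shall not return Approve for an operation with high harm potential "
    "on grounds of its own insufficient capability. If Alice lacks the information "
    "or reasoning depth to evaluate a high-potential operation, it must return Escalate."
)


def missing_parts(spec, layer1_text):
    """Return the checklist items that a plain substring scan cannot find in `spec`."""
    missing = [h for h in REQUIRED_HEADINGS if h not in spec]
    missing += [f"Layer 4 slot: {s}" for s in LAYER4_SLOTS if f"{s}:" not in spec]
    if HARD_CONSTRAINT not in spec:
        missing.append("3.4 hard constraint (verbatim)")
    if layer1_text not in spec:
        missing.append("Layer 1 canonical content (verbatim)")
    return missing
```

An empty list means the structure is in place; it does not mean the spec is good.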
