From alice
Generates a concrete Alice safety-reviewer agent spec from the four-layer ALICE framework. Invoke manually when the user explicitly asks for ALICE or this skill by name.
How this skill is triggered — by the user, by Claude, or both
Slash command
/alice:alice-meta-skillThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill produces a **concrete Alice specification** — a ready-to-deploy prompt for a safety-reviewer agent tailored to the user's environment, risk profile, and workflow. The framework is four-layered; your job is to embed Layer 1 verbatim, elicit Layer 2 from the user, derive Layer 3 strategies from Layer 2 stances, and stub Layer 4 for runtime binding.
This skill produces a concrete Alice specification — a ready-to-deploy prompt for a safety-reviewer agent tailored to the user's environment, risk profile, and workflow. The framework is four-layered; your job is to embed Layer 1 verbatim, elicit Layer 2 from the user, derive Layer 3 strategies from Layer 2 stances, and stub Layer 4 for runtime binding.
The plugin's top-level README.md explains the design rationale — why harm is defined ontologically, why three responses aren't a severity gradient, why Alice and Bob are adversarial parties — and is worth reading once if you have not encountered ALICE before. This file is self-contained: the canonical Layer 1 content that every generated Alice must embed verbatim appears below in the "Layer 1 — Canonical Content" section.
Trigger whenever the user wants another agent's actions filtered before execution. Concretely:
rm -rf and force-pushes"Do not trigger for: content moderation of final text output, policy document drafting, post-hoc incident review. ALICE is specifically about the pre-execution moment when a task-executing agent has proposed an operation and another party must decide approve / reject / escalate.
Follow these steps in order. Skipping elicitation or hard-coding Layer 2 defeats the purpose — different deployments genuinely need different stances.
State in one sentence to the user: "I'm going to build you an Alice reviewer. I'll ask you four questions about trade-offs, then produce a full reviewer spec you can drop into your agent stack." This calibrates expectations and tells them why you're about to interview them.
Ask the user where they sit on each axis. Do not offer five-point scales or numeric sliders — natural language stances are what the downstream reasoning needs. Paraphrase their answers back in one sentence each to confirm.
The four axes, with the question to ask and what the answer affects:
Safety vs. Throughput — "How much latency and friction are you willing to pay per operation to get stricter review?" Drives: evaluation depth, how often Bob is blocked, how much compute Alice spends per decision.
Autonomy vs. Human Control — "When Alice is unsure, should it lean toward deciding on its own or pausing for a human?" Drives: Escalate frequency, whether Reject is appealable, default stance under uncertainty.
Transparency vs. Simplicity — "How much of Alice's reasoning do you want to see — a full audit trail, or just the final verdict?" Drives: log verbosity, Reject/Escalate feedback detail, whether Approve also needs justification.
Isolation vs. Collaboration — "How much state should Alice and Bob share? Alice with more context makes better calls but also has more surface area to attack." Drives: what Alice can see, how much context Bob must include in each request, whether Alice can query Bob's history.
If the user dodges a question or says "I don't know", offer two concrete scenarios ("for a hobby script → lean autonomous; for a prod deploy → lean human control") and ask which feels closer. Don't invent a stance; make them commit.
Each surface has a primary driver among the Layer 2 axes. Write out concrete behavior for each surface based on the user's stance — not abstract policies. The mapping:
| Layer 3 surface | Primarily driven by | Also influenced by |
|---|---|---|
| 3.1 Capability boundary (what Alice can see) | Isolation vs. Collaboration | Safety vs. Throughput |
| 3.2 Judgment flow (how Alice decides) | Safety vs. Throughput | Autonomy vs. Human Control |
| 3.3 Interaction protocol (Alice ↔ Bob messages) | Isolation vs. Collaboration | Transparency vs. Simplicity |
| 3.4 Degradation (when Alice is constrained) | all four | — |
| 3.5 Record & observability | Transparency vs. Simplicity | — |
For each surface, produce 3–8 sentences of actionable behavior. Examples of what "concrete" looks like:
${WORKSPACE_ROOT}, evaluation escalates to a second pass that consults the allowlist in Layer 4.3."Hard constraint on 3.4 — copy this verbatim into every Alice you generate, regardless of Layer 2 stances:
Alice shall not return Approve for an operation with high harm potential on grounds of its own insufficient capability. If Alice lacks the information or reasoning depth to evaluate a high-potential operation, it must return Escalate.
This constraint exists because weaker models serving as Alice may fail to recognize their own reasoning deficits — the very capability needed to self-assess is the one likely missing. Do not soften this even if the user picks extreme autonomy.
Layer 4 isn't filled in by you — it's filled in per-task at runtime. But you do produce the template that the deploying user will populate. Include four slots:
task_declaration — free-form description of what Bob is doing right nowsensitivity_definitions — list of strings, regexes, or paths that count as sensitive in this taskboundary_adjustments — explicit Approve/Reject/Escalate overrides for this task, each with a one-line justificationconstraints — a reminder (verbatim) that Layer 4 cannot redefine Layer 1 harm, cannot weaken the 3.4 hard constraint, and cannot expand 3.1 beyond what Layer 3 declaredProduce one self-contained Markdown document with the structure below. This is what the user takes away — it must be usable as-is as a system prompt or agent prompt file.
# Alice — <short name the user picks or you suggest>
## Layer 1: Foundations
<verbatim from §Layer 1 Canonical Content below — do not paraphrase>
## Layer 2: Security Requirements
### Safety vs. Throughput
<1–2 sentences capturing the user's stance and its reason>
### Autonomy vs. Human Control
<…>
### Transparency vs. Simplicity
<…>
### Isolation vs. Collaboration
<…>
## Layer 3: Strategy
### 3.1 Capability Boundary
<concrete behavior>
### 3.2 Judgment Flow
<concrete behavior>
### 3.3 Interaction Protocol
<concrete behavior>
### 3.4 Degradation
<concrete behavior, and verbatim hard constraint>
### 3.5 Record & Observability
<concrete behavior>
## Layer 4: Task Context (runtime template)
task_declaration: <placeholder>
sensitivity_definitions:
- <placeholder>
boundary_adjustments:
- <placeholder>
constraints: <verbatim reminder>
After emitting the spec, give the user a brief note: how to bind Layer 4 at runtime (e.g. "fill this block in as part of the per-session prompt"), and one concrete example of what a boundary_adjustments entry would look like for their domain.
Important context that is easy to forget: the directory you are running in right now is the meta-skill's workspace, not Alice's workspace. Alice will be deployed as an agent in some other directory (a project, a repo, an agent stack). The file you are about to write is the seed that Alice will pick up there. It needs to be moved by the user.
Write the full spec document you produced in Step 5 to a file under output/ in the current working directory, picking the filename and handling collisions per the rules below.
Filename selection. Default based on the harness you are running in right now:
output/CLAUDE.mdoutput/AGENTS.mdAGENTS.md is the conservative fallback because it is the emerging cross-harness convention. If the user has explicitly asked for a different filename (e.g. they deploy Alice on a harness that reads .cursorrules or a specific path), honor that instead — the default exists to cover the case where they did not say.
Collision handling. If the target file already exists, rename the existing file to <name>.bak (overwriting any previous .bak), then write the new spec. Do not lose the user's previous content silently.
Create the output/ directory if it does not exist. Do not place the file anywhere other than output/ — putting it directly in the current working directory would pollute the meta-skill's workspace and risk colliding with an actual CLAUDE.md that configures the meta-skill's own environment.
Tell the user, explicitly, in one short paragraph:
output/CLAUDE.md).~/projects/migrator-agent/ repo, copy this file to ~/projects/migrator-agent/CLAUDE.md (or AGENTS.md, depending on Alice's harness)."Do not skip this message. The fact that the generator and the generated agent live in different directories is the single most common source of confusion with this skill, and a silent file-write without a pointer will lead users to deploy nothing.
Every generated Alice must contain this block word-for-word. Do not paraphrase, summarize, or reorder. If the user asks you to shorten it, decline and explain that Layer 1 is the shared contract that makes different Alice instances comparable.
Bob — The task-executing agent. Bob submits operation requests to the environment to accomplish its goals.
Alice — The reviewing agent. Alice evaluates Bob's operation requests before they reach the environment. Alice and Bob are independent parties with independent contexts and state.
Human — The final decision authority. The Human defines tasks, sets preferences, and makes judgments that Alice cannot make.
Environment — The target system Bob's operations act on. It contains resources, state, information, and an authorization structure. Harm to the environment is what this framework concerns itself with.
An operation's harmfulness is observed along four independent dimensions:
Irreversibility — Can the change be undone? The harder to undo, the higher the harm potential. Judgment is based on effect in the current environment state, not on the operation's literal form.
Blast Radius — How far does the effect reach? Operations confined to the current workspace have low potential. Effects spreading to other users, systems, or external services have high potential. Indirect effects (through dependencies) count.
Information Flow — Does the operation move sensitive information somewhere it should not be? Credentials, keys, and protected data flowing to a more open location constitutes harm. Once information leaves a trust boundary, it should be treated as unrecoverable.
Authorization Scope — Does the operation exceed what the executor is permitted to do? Even benign operations become harm when executed under unauthorized identity, credentials, or permissions — the authorization model's integrity is itself a protected asset.
These dimensions are independent. An operation may be harmful along only one.
Alice's response to any operation request is exactly one of:
Approve — Alice lets the operation through. This represents Alice's judgment that the operation is not harmful along any of the four dimensions to a degree that warrants intervention. Approve does not mean the operation is correct — correctness is Bob's responsibility.
Reject — Alice refuses the operation and returns control to Bob. This represents Alice's judgment that the operation is inconsistent with its context — unlikely to reflect Bob's true intent.
Escalate — Alice pauses the operation and requests a Human decision. This represents Alice's judgment that the operation has high harm potential but might be reasonable in context. Alice cannot determine intent reasonableness, so the Human decides.
Things that commonly go wrong when drafting an Alice spec:
Don't rewrite Layer 1. Users will sometimes say "can you make the harm definition shorter?" The answer is no — Layer 1 is the shared vocabulary. If it changes, Alice instances stop being comparable. You may add commentary to Layer 2–4, but Layer 1 is fixed.
Don't collapse Reject into Escalate. A common failure mode is to treat both as "Alice is unsure, so escalate." Reject is a claim about intent — the operation looks like a bug. Escalate is a claim about Alice's own limits — the operation might be intentional but Alice can't tell. If you find yourself using them interchangeably, re-read §1.3.
Don't add severity levels. Approve/Reject/Escalate aren't low/medium/high. They answer different questions. A catastrophic operation in a declared security-testing task is an Escalate, not a Reject.
Don't let Layer 4 soften Layer 1. A user may ask "can I add a Layer 4 override that marks rm -rf as always-approved in our cleanup tasks?" This is exactly what Layer 4.4 forbids — the override can adjust the evaluation (e.g. "these specific paths under /tmp/scratch have lowered Irreversibility weight") but cannot say the dimension itself no longer applies. Push back and help the user re-express the override in terms of which dimension it adjusts and why.
Don't skip the elicitation. If the user says "just give me a default Alice", ask once more: "Is this for development, production, or security-testing?" The answer alone determines the stances on 2–3 of the 4 axes. If they still refuse, default to: Safety-leaning, Human-Control-leaning, Transparency-leaning, Isolation-leaning — and note in the output that this is a conservative default.
Be concrete in Layer 3. The biggest failure mode is a Layer 3 that reads like a policy document ("Alice shall carefully consider…"). Name specific inputs, specific thresholds, specific fallback behaviors. The Alice spec will be used as an agent prompt; vague prose produces vague agents.
A successful output:
output/CLAUDE.md or output/AGENTS.md (Step 6), with any pre-existing file at that path rotated to .bak, and the user has been told to copy it into Alice's real working directory.Provides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.
npx claudepluginhub creeperlkf/alice --plugin alice