AWAF skills and tools for AI agent architecture assessment
npx claudepluginhub yogiraja/awaf-skill
AWAF v1.3 architectural assessment for AI agent systems
A Claude Code skill that runs an AWAF v1.0 architectural assessment for AI agent systems.
Agent Well-Architected Framework (AWAF) is an open specification defining production-readiness criteria for AI agents. It fills the same gap for agents that the AWS Well-Architected Framework fills for cloud infrastructure: a vendor-neutral, community-owned standard for architectural rigour.
AWAF v1.0 evaluates agents across 10 pillars in 3 tiers:
| Tier | Pillars | Weight |
|---|---|---|
| Tier 0 — Foundation | Vertical Slice & Autonomy | Prerequisite |
| Tier 1 — Cloud WAF Adapted | Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability | 1.0× |
| Tier 2 — Agent-Native | Reasoning Integrity, Controllability, Context Integrity | 1.5× |
Tier 2 pillars carry extra weight because they have no cloud equivalent. Servers don't hallucinate, don't need kill switches in code, and don't accumulate stale reasoning context.
Full spec: github.com/YogirajA/awaf
This skill is a natural-language implementation of the AWAF spec. Unlike awaf-cli (the code-scanning reference implementation), this skill accepts any form of evidence and conducts a dialogue-driven assessment.
An agent with no code in the repo but verified runbooks, SLO docs, eval reports, and IAM exports can score higher than one with clean code and none of those things. Architecture is what the system does and how it is operated, not just what the code says.
Via the Claude Code VSCode extension:
Open Manage Plugins, go to the Marketplaces tab, and add:
https://github.com/YogirajA/awaf-skill
Then install the awaf plugin from the marketplace.
Via CLI:
/plugin marketplace add YogirajA/awaf-skill
/plugin install awaf@awaf-marketplace
/awaf
The skill opens by asking what evidence you can share, then runs the assessment and reports a confidence level with the result (verified / partial / self_reported).
Per-pillar (0–100): each question carries a risk weight (High = 3 pts, Medium = 2 pts, Low = 1 pt):
pillar_score = (implemented_weight / total_weight) × 100
Answering "none of these apply" to any question caps that pillar at 30 and triggers an automatic High Risk flag.
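The per-pillar arithmetic above can be sketched as follows. This is a minimal illustration, not the skill's implementation: the `Answer` type, the `pillar_score` function, and its `none_apply` parameter are hypothetical names, and it assumes each question is either fully implemented or not. The automatic High Risk flag is not modelled here.

```python
from dataclasses import dataclass

# Risk weights as stated above: High = 3 pts, Medium = 2 pts, Low = 1 pt.
RISK_WEIGHTS = {"high": 3, "medium": 2, "low": 1}

# Cap applied when any question is answered "none of these apply".
NONE_APPLY_CAP = 30.0


@dataclass
class Answer:
    """One assessed question (hypothetical structure, for illustration only)."""
    risk: str          # "high", "medium", or "low"
    implemented: bool  # assumption: a practice is either in place or not


def pillar_score(answers: list[Answer], none_apply: bool = False) -> float:
    """Return (implemented_weight / total_weight) * 100, capped at 30 if none_apply."""
    total = sum(RISK_WEIGHTS[a.risk] for a in answers)
    if total == 0:
        raise ValueError("a pillar needs at least one question")
    implemented = sum(RISK_WEIGHTS[a.risk] for a in answers if a.implemented)
    score = implemented / total * 100
    return min(score, NONE_APPLY_CAP) if none_apply else score


# Made-up answers: 6 of 9 weighted points are implemented.
answers = [
    Answer("high", True),
    Answer("high", False),
    Answer("medium", True),
    Answer("low", True),
]
print(pillar_score(answers))                   # 6 / 9 * 100
print(pillar_score(answers, none_apply=True))  # capped at 30
```

Because High-risk questions carry three points each, missing a single one costs as much as missing three Low-risk ones.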
Overall score applies a 1.5× multiplier to Tier 2 pillars:
overall = sum(score * (1.5 if tier == 2 else 1.0) for each pillar) /
sum(1.5 if tier == 2 else 1.0 for each pillar)
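A runnable sketch of the weighted average above, with made-up pillar scores. The function names and the `(tier, score)` tuple layout are assumptions for illustration. The sketch follows the formula literally, so any pillar that is not Tier 2 gets a weight of 1.0; whether the Tier 0 prerequisite pillar enters the average at all is not stated here, so it is left out of the example data.

```python
def tier_weight(tier: int) -> float:
    """Weight from the formula above: 1.5 for Tier 2, 1.0 otherwise."""
    return 1.5 if tier == 2 else 1.0


def overall_score(pillars: dict[str, tuple[int, float]]) -> float:
    """Weighted mean of pillar scores; `pillars` maps name -> (tier, score)."""
    if not pillars:
        raise ValueError("at least one pillar score is required")
    weighted = sum(score * tier_weight(tier) for tier, score in pillars.values())
    total = sum(tier_weight(tier) for tier, _ in pillars.values())
    return weighted / total


# Hypothetical scores for three pillars, not real assessment data.
pillars = {
    "Security": (1, 80.0),
    "Reliability": (1, 70.0),
    "Controllability": (2, 50.0),
}
# (80 * 1.0 + 70 * 1.0 + 50 * 1.5) / (1.0 + 1.0 + 1.5) = 225 / 3.5
print(round(overall_score(pillars), 1))
```

The unweighted mean of these three scores would be about 66.7; the 1.5× multiplier pulls the result down to about 64.3 because the weakest pillar is a Tier 2 one.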
Readiness rating:
| Score | Rating | What It Means |
|---|---|---|
| 90–100 | Production Ready | Architectural patterns are sound across all pillars |
| 75–89 | Near Ready | Minor gaps, addressable before production |
| 50–74 | Needs Work | Meaningful architectural risks present |
| 25–49 | High Risk | Structural problems that will cause incidents |
| 0–24 | Not Ready | Do not ship to production |
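The rating table can be read as a simple threshold lookup. The sketch below uses a hypothetical `readiness_rating` function and assumes a fractional score belongs to a band once it reaches that band's lower bound (so 89.9 is still Near Ready); the table itself only lists whole-number ranges.

```python
# Lower bound of each band, highest first, taken from the table above.
RATING_BANDS = [
    (90, "Production Ready"),
    (75, "Near Ready"),
    (50, "Needs Work"),
    (25, "High Risk"),
    (0, "Not Ready"),
]


def readiness_rating(score: float) -> str:
    """Map an overall score (0-100) to its readiness rating."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, rating in RATING_BANDS:
        if score >= lower_bound:
            return rating
    raise AssertionError("unreachable: the last band starts at 0")


print(readiness_rating(64.3))  # falls in the 50-74 band
print(readiness_rating(92))    # falls in the 90-100 band
```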
Confidence levels:
| Level | Meaning |
|---|---|
| verified | Evidence provided and assessed directly |
| partial | Some evidence provided, meaningful gaps remain |
| self_reported | No evidence provided; score reflects absence only |
A verified 60 is more useful than a self_reported 85. The skill always displays confidence and always explains what drove it down.