Name: awaf
Author: yogiraja

awaf — Claude Code Skill

A Claude Code skill that runs an AWAF v1.3 architectural assessment for AI agent systems.

What is AWAF?

Agent Well-Architected Framework (AWAF) is an open specification defining production-readiness criteria for AI agents. It fills the same gap for agents that the AWS Well-Architected Framework fills for cloud infrastructure, as a vendor-neutral, community-owned standard for architectural rigour.

AWAF v1.3 evaluates agents across 10 pillars in 3 tiers:

Tier	Pillars	Weight
Tier 0 — Foundation	Vertical Slice & Autonomy	Prerequisite
Tier 1 — Cloud WAF Adapted	Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability	1.0×
Tier 2 — Agent-Native	Reasoning Integrity, Controllability, Context Integrity	1.5×

Tier 2 pillars carry extra weight because they have no cloud equivalent. Servers don't hallucinate, don't need kill switches in code, and don't accumulate stale reasoning context.

Full spec: github.com/YogirajA/awaf

What's new in v1.3:

Batching criteria in Performance Efficiency, Cost Optimization, and Sustainability: tool calls and LLM API calls should be batched where possible to cut per-call overhead, latency, and cost.
Context Integrity expansions: active context-size bounding (prune, summarize, or offload before window limits), explicit state persistence for long sessions, and filtering tool responses to relevant fields before they re-enter context.
Pattern-justification advisory in Foundation: if a simpler pattern (workflow, augmented LLM, or prompt) would suffice, the assessment raises a non-scored Caution finding rather than a score penalty.
Band-based scoring: readiness is reported as bands rather than point estimates, because LLM-driven assessments vary from run to run.
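
To make the Context Integrity expansions listed above more concrete, here is a minimal sketch of two of them: filtering a tool response to relevant fields, and bounding context size by pruning the oldest entries. The function names `filter_fields` and `bound_context` are hypothetical and not part of the AWAF spec, and character count stands in for a real token count as a simplifying assumption.

```python
from typing import Any


def filter_fields(response: dict[str, Any], keep: set[str]) -> dict[str, Any]:
    """Keep only the fields the agent needs before the response re-enters context."""
    return {key: value for key, value in response.items() if key in keep}


def bound_context(messages: list[str], max_chars: int) -> list[str]:
    """Prune the oldest messages so the total size stays within max_chars.

    Assumption: characters approximate tokens; a real agent would use its
    model's tokenizer and might summarize or offload instead of dropping.
    """
    kept: list[str] = []
    total = 0
    # Walk from newest to oldest so the most recent context survives.
    for message in reversed(messages):
        if total + len(message) > max_chars:
            break
        kept.append(message)
        total += len(message)
    kept.reverse()  # restore chronological order
    return kept


tool_response = {"id": 1, "status": "ok", "raw_html": "<html>...</html>"}
slim = filter_fields(tool_response, keep={"id", "status"})
recent = bound_context(["aaaa", "bbb", "cc"], max_chars=5)
```

Pruning is only one of the three options the spec names; summarizing or offloading to external storage would replace the `break` with a call into whatever summarizer or store the agent already uses.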

What This Skill Does

This skill is a natural-language implementation of the AWAF spec. Unlike awaf-cli (the code-scanning reference implementation), this skill accepts any form of evidence and conducts a dialogue-driven assessment:

Source code and configuration files
Cloud provider configs (IAM policies, VPC rules, budget alerts)
Observability exports (Datadog, Grafana, CloudWatch, Honeycomb, LangSmith, Langfuse, Arize)
Eval and testing reports (LangSmith, Braintrust, Promptfoo, hallucination rate data)
Infrastructure as code (Terraform plans, CDK stacks, Helm charts)
Architecture docs (ADRs, design docs, C4 models, system diagrams)
Operational artifacts (runbooks, SLO definitions, incident postmortems)
Security reports (Snyk output, AWS Security Hub, pen test results)
CI/CD configs (GitHub Actions, GitLab CI, Jenkins)
Billing and cost data (AWS Cost Explorer, token usage reports)
Verbal or written description of how your system works

An agent with no code in the repo but verified runbooks, SLO docs, eval reports, and IAM exports can score higher than one with clean code and none of those things. Architecture is what the system does and how it is operated, not just what the code says.

Installation

Via the Claude Code VSCode extension:

Open Manage Plugins, go to the Marketplaces tab, and add:

https://github.com/YogirajA/awaf-skill

Then install the awaf plugin from the marketplace.

Via CLI:

/plugin marketplace add YogirajA/awaf-skill
/plugin install awaf@awaf-marketplace

Usage

/awaf

The skill opens by asking what evidence you can share, then:

Gathers evidence — accepts anything you provide across all evidence categories
Scores each pillar — assigns 0–100 with a confidence level (verified / partial / self_reported)
Produces a structured report — overall score, per-pillar breakdown, findings, recommendations
Requests targeted evidence — after the initial report, identifies the 2–3 gaps that would most improve score confidence and asks for them specifically
Re-scores on new evidence — when you provide more artifacts, affected pillars are re-scored and deltas are shown

Scoring

Per-pillar (0–100): each question carries a risk weight (High = 3 pts, Medium = 2 pts, Low = 1 pt):

pillar_score = (implemented_weight / total_weight) × 100

Answering "none of these apply" to any question caps that pillar at 30 and triggers an automatic High Risk flag.
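
The per-pillar rule above can be sketched in a few lines of Python. This is an illustration, not the skill's actual implementation: the `Question` dataclass and its field names are hypothetical, and it assumes a "none of these apply" answer contributes no implemented weight.

```python
from dataclasses import dataclass

RISK_WEIGHTS = {"high": 3, "medium": 2, "low": 1}
NONE_APPLY_CAP = 30  # cap applied when any question is answered "none of these apply"


@dataclass
class Question:
    risk: str                 # "high", "medium", or "low"
    implemented: bool         # True if the practice is in place
    none_apply: bool = False  # True if the answer was "none of these apply"


def pillar_score(questions: list[Question]) -> tuple[float, bool]:
    """Return (score on a 0-100 scale, high_risk_flag) for one pillar."""
    if not questions:
        raise ValueError("a pillar needs at least one question")
    total_weight = sum(RISK_WEIGHTS[q.risk] for q in questions)
    implemented_weight = sum(
        RISK_WEIGHTS[q.risk] for q in questions if q.implemented and not q.none_apply
    )
    score = implemented_weight / total_weight * 100
    high_risk = any(q.none_apply for q in questions)
    if high_risk:
        score = min(score, NONE_APPLY_CAP)
    return score, high_risk


# High and Medium implemented, Low missing: 5 of 6 weight points.
uncapped = pillar_score([
    Question("high", True),
    Question("medium", True),
    Question("low", False),
])

# One "none of these apply" answer caps the pillar at 30 and raises the flag.
capped = pillar_score([
    Question("high", True),
    Question("medium", False, none_apply=True),
    Question("low", True),
])
```

In the first case the score is 5/6 × 100, roughly 83.3. In the second, the raw score would be 4/6 × 100, roughly 66.7, but the cap brings it down to 30 and sets the High Risk flag.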

Overall score applies a 1.5× multiplier to Tier 2 pillars:

overall = sum(score * (1.5 if tier == 2 else 1.0) for each pillar) /
          sum(1.5 if tier == 2 else 1.0 for each pillar)
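
As a runnable version of the formula above, the following sketch computes the weighted mean from a list of `(score, tier)` pairs. The function name `overall_score` and the input shape are assumptions made for illustration. It follows the formula literally, so any pillar outside Tier 2 (including the Tier 0 Foundation pillar) gets weight 1.0; the spec may treat the Foundation prerequisite differently.

```python
TIER_2_MULTIPLIER = 1.5


def overall_score(pillars: list[tuple[float, int]]) -> float:
    """Weighted mean of pillar scores; Tier 2 pillars count 1.5x."""
    if not pillars:
        raise ValueError("at least one pillar score is required")
    weights = [TIER_2_MULTIPLIER if tier == 2 else 1.0 for _, tier in pillars]
    weighted_sum = sum(score * weight for (score, _), weight in zip(pillars, weights))
    return weighted_sum / sum(weights)


# One Tier 1 pillar at 80 and one Tier 2 pillar at 60:
# (80 * 1.0 + 60 * 1.5) / (1.0 + 1.5) = 170 / 2.5
example = overall_score([(80, 1), (60, 2)])
```

Because the divisor is the sum of the weights, the result stays on the same 0–100 scale as the pillar scores, and a weak Tier 2 pillar pulls the overall score down more than an equally weak Tier 1 pillar.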

Readiness bands:
