awaf-marketplace

AWAF skills and tools for AI agent architecture assessment

npx claudepluginhub yogiraja/awaf-skill

1 Plugin

awaf

AWAF v1.3 architectural assessment for AI agent systems

v1.1.0

YogirajA

Stats

Plugins: 1

Stars: 2

Updated: May 30, 2026

awaf — Claude Code Skill

A Claude Code skill that runs an AWAF v1.0 architectural assessment for AI agent systems.

What is AWAF?

Agent Well-Architected Framework (AWAF) is an open specification defining production-readiness criteria for AI agents. It fills the same gap for agents that the AWS Well-Architected Framework fills for cloud infrastructure, and it does so as a vendor-neutral, community-owned standard for architectural rigour.

AWAF v1.0 evaluates agents across 10 pillars in 3 tiers:

Tier	Pillars	Weight
Tier 0 — Foundation	Vertical Slice & Autonomy	Prerequisite
Tier 1 — Cloud WAF Adapted	Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability	1.0×
Tier 2 — Agent-Native	Reasoning Integrity, Controllability, Context Integrity	1.5×

Tier 2 pillars carry extra weight because they have no cloud equivalent. Servers don't hallucinate, don't need kill switches in code, and don't accumulate stale reasoning context.

Full spec: github.com/YogirajA/awaf

What This Skill Does

This skill is a natural-language implementation of the AWAF spec. Unlike awaf-cli (the code-scanning reference implementation), this skill accepts any form of evidence and conducts a dialogue-driven assessment:

Source code and configuration files
Cloud provider configs (IAM policies, VPC rules, budget alerts)
Observability exports (Datadog, Grafana, CloudWatch, Honeycomb, LangSmith, Langfuse, Arize)
Eval and testing reports (LangSmith, Braintrust, Promptfoo, hallucination rate data)
Infrastructure as code (Terraform plans, CDK stacks, Helm charts)
Architecture docs (ADRs, design docs, C4 models, system diagrams)
Operational artifacts (runbooks, SLO definitions, incident postmortems)
Security reports (Snyk output, AWS Security Hub, pen test results)
CI/CD configs (GitHub Actions, GitLab CI, Jenkins)
Billing and cost data (AWS Cost Explorer, token usage reports)
Verbal or written description of how your system works

An agent with no code in the repo but verified runbooks, SLO docs, eval reports, and IAM exports can score higher than one with clean code and none of those things. Architecture is what the system does and how it is operated, not just what the code says.

Installation

Via the Claude Code VSCode extension:

Open Manage Plugins, go to the Marketplaces tab, and add:

https://github.com/YogirajA/awaf-skill

Then install the awaf plugin from the marketplace.

Via CLI:

/plugin marketplace add YogirajA/awaf-skill
/plugin install awaf@awaf-marketplace

Usage

/awaf

The skill opens by asking what evidence you can share, then:

Gathers evidence — accepts anything you provide across all evidence categories
Scores each pillar — assigns 0–100 with a confidence level (verified / partial / self_reported)
Produces a structured report — overall score, per-pillar breakdown, findings, recommendations
Requests targeted evidence — after the initial report, identifies the 2–3 gaps that would most improve score confidence and asks for them specifically
Re-scores on new evidence — when you provide more artifacts, affected pillars are re-scored and deltas are shown

Scoring

Per-pillar (0–100): each question carries a risk weight (High = 3 pts, Medium = 2 pts, Low = 1 pt):

pillar_score = (implemented_weight / total_weight) × 100

Answering "none of these apply" to any question caps that pillar at 30 and triggers an automatic High Risk flag.
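
As a rough illustration of the per-pillar formula and the cap, here is a minimal Python sketch. The `Answer` type, its field names, and the binary implemented/not-implemented model are assumptions made for this example; the skill itself scores through dialogue and may weigh evidence differently.

```python
from dataclasses import dataclass

# Risk weights as stated above: High = 3 pts, Medium = 2 pts, Low = 1 pt.
RISK_WEIGHTS = {"high": 3, "medium": 2, "low": 1}

# Cap applied when any question is answered "none of these apply".
NONE_APPLY_CAP = 30


@dataclass
class Answer:
    """Hypothetical record for one assessment question (not part of the spec)."""
    risk: str                 # "high", "medium", or "low"
    implemented: bool         # assumed binary: the practice is in place or not
    none_apply: bool = False  # True if the answer was "none of these apply"


def pillar_score(answers: list[Answer]) -> tuple[float, bool]:
    """Return (score, high_risk_flag) for a single pillar."""
    total_weight = sum(RISK_WEIGHTS[a.risk] for a in answers)
    if total_weight == 0:
        raise ValueError("a pillar needs at least one question")
    implemented_weight = sum(RISK_WEIGHTS[a.risk] for a in answers if a.implemented)
    score = implemented_weight / total_weight * 100
    # Any "none of these apply" answer caps the pillar and raises the flag.
    flagged = any(a.none_apply for a in answers)
    if flagged:
        score = min(score, NONE_APPLY_CAP)
    return score, flagged
```

For example, a pillar with an implemented High question, an unimplemented Medium question, and an implemented Low question has an implemented weight of 4 out of 6, which gives a score of about 66.7 with no flag.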

Overall score applies a 1.5× multiplier to Tier 2 pillars:

overall = sum(score * (1.5 if tier == 2 else 1.0) for each pillar) /
          sum(1.5 if tier == 2 else 1.0 for each pillar)
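
The weighted average above can be sketched in runnable Python as follows. The function name and the `(score, tier)` tuple layout are assumptions for illustration. The sketch follows the formula literally, so every pillar outside Tier 2 gets a weight of 1.0; how the Tier 0 prerequisite is treated in practice is not specified here.

```python
TIER2_WEIGHT = 1.5    # multiplier for Tier 2 (agent-native) pillars
DEFAULT_WEIGHT = 1.0  # weight for every other pillar, per the formula


def overall_score(pillars: list[tuple[float, int]]) -> float:
    """Weighted mean of pillar scores; `pillars` holds (score, tier) pairs."""
    if not pillars:
        raise ValueError("at least one pillar score is required")
    weights = [TIER2_WEIGHT if tier == 2 else DEFAULT_WEIGHT for _, tier in pillars]
    weighted_sum = sum(score * w for (score, _), w in zip(pillars, weights))
    return weighted_sum / sum(weights)


# One Tier 1 pillar at 80 and one Tier 2 pillar at 60:
# (80 * 1.0 + 60 * 1.5) / (1.0 + 1.5) = 170 / 2.5 = 68.0
print(overall_score([(80, 1), (60, 2)]))
```

Because Tier 2 pillars weigh more, the weak Tier 2 score pulls the result below the plain average of 70.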

Readiness rating:

Score	Rating	What It Means
90–100	Production Ready	Architectural patterns are sound across all pillars
75–89	Near Ready	Minor gaps, addressable before production
50–74	Needs Work	Meaningful architectural risks present
25–49	High Risk	Structural problems that will cause incidents
0–24	Not Ready	Do not ship to production
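
The rating bands can be expressed as a simple threshold lookup. This is a sketch only: the function name is hypothetical, and because the table lists integer ranges, the handling of fractional scores (for example, 89.5 falling into "Near Ready") is an assumption.

```python
# Lower bound of each band, highest first, taken from the table above.
RATING_BANDS = [
    (90, "Production Ready"),
    (75, "Near Ready"),
    (50, "Needs Work"),
    (25, "High Risk"),
]


def readiness_rating(score: float) -> str:
    """Map an overall score in the range 0 to 100 to its readiness rating."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, label in RATING_BANDS:
        if score >= floor:
            return label
    return "Not Ready"  # anything below 25
```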

Confidence levels:

Level	Meaning
`verified`	Evidence provided and assessed directly
`partial`	Some evidence provided, meaningful gaps remain
`self_reported`	No evidence provided; score reflects absence only

A verified 60 is more useful than a self_reported 85. The skill always displays confidence and always explains what drove it down.

Example Output

   _      _  _  _    _      ___
  /_\    | || || |  /_\    | __|
 / _ \   | \/ \/ | / _ \   | _|
/_/ \_\   \_/\_/  /_/ \_\  |_       Agent Well-Architected Framework
