From grimoire
Designs, evaluates, or documents high-level system architecture — requirements, components, data flow, and trade-offs — before writing code.
Slash command: /grimoire:design-system-architecture
Derive requirements, identify components, define data flow, and articulate trade-offs — before writing any code.
Adopted by: Amazon (design documents required before any significant engineering work per Amazon's engineering tenets), Google (design docs are mandatory for projects above a size threshold — described in "Software Engineering at Google", O'Reilly 2020), Stripe (RFC process for architectural decisions).
Impact: Figures widely attributed to the IBM Systems Sciences Institute put the cost of fixing a defect at roughly 6× higher during implementation, 15× higher during testing, and up to 100× higher after release, relative to fixing it during design. Google's internal data (Winters et al., "Software Engineering at Google") shows that teams with upfront design docs ship features with 50% fewer post-launch incidents. The cost of a design document is 1–2 days; the cost of a wrong architecture is months of rework.
Why best: Writing code without a design is the equivalent of building a house without blueprints. Informal design (whiteboard-only) produces undocumented trade-offs that trap future engineers. Formal upfront design (big design up front, BDUF) goes too far and creates documents no one reads. The sweet spot is a lightweight, decision-focused design doc that documents why, not how.
Sources: Kleppmann, "Designing Data-Intensive Applications" (O'Reilly 2017); Winters et al., "Software Engineering at Google" (O'Reilly 2020); Beyer et al., "Site Reliability Engineering" (the Google SRE Book, O'Reilly 2016); IBM Systems Sciences Institute defect cost study.
List what the system must do. Write these as user-facing capabilities, not technical choices:
Distinguish must-have from nice-to-have. Scope creep in requirements is a common source of over-engineered systems.
Quantify quality attributes. Vague targets ("fast", "reliable") are undesignable. Require numbers:
| Attribute | Metric | Target |
|---|---|---|
| Latency | p99 response time | < 200 ms |
| Availability | Uptime per month | 99.9% (43 min downtime) |
| Throughput | Peak requests/sec | 10,000 RPS |
| Durability | Data loss tolerance | Zero (RPO = 0) |
| Storage | Data volume in 3 years | ~10 TB |
Work backwards from these numbers to size components.
Before drawing components, sanity-check the numbers:
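As a rough sketch of that sanity check, the targets in the table can be turned into a downtime budget, an average ingest rate, and an instance count. The function names are made up for this example, and the 500 RPS per-instance capacity and 30% headroom are illustrative assumptions rather than figures from this document; substitute measured values.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte, in bytes


def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime allowed over `days` at the given availability."""
    return (1.0 - availability) * days * 24 * 60


def average_ingest_bytes_per_sec(total_bytes: float, years: float) -> float:
    """Average write rate needed to accumulate `total_bytes` over `years`."""
    return total_bytes / (years * 365 * SECONDS_PER_DAY)


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float) -> int:
    """Instances required at peak, keeping `headroom` (0..1) of each one spare."""
    usable = rps_per_instance * (1.0 - headroom)
    return math.ceil(peak_rps / usable)


# Targets taken from the table above.
print(f"downtime budget: {downtime_budget_minutes(0.999):.1f} min/month")
print(f"average ingest:  {average_ingest_bytes_per_sec(10 * TB, 3) / 1000:.0f} kB/s")
# ASSUMPTION: 500 RPS per instance and 30% headroom are made-up planning inputs.
print(f"instances:       {instances_needed(10_000, 500, 0.30)}")
```

The 43-minute figure in the table corresponds to a 30-day month. The average ingest rate says nothing about bursts, so size the write path against peak traffic, not the average.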
Map the functional requirements to components. Common building blocks:
Only include components that the requirements justify. Every component adds operational complexity.
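One lightweight way to enforce this is to record, for each component, which requirements justify it, then flag anything that traces to nothing. The sketch below is a minimal illustration; the requirement IDs, component names, and helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    justified_by: tuple[str, ...]  # IDs of the requirements this component serves


def unjustified(components: list[Component], requirement_ids: set[str]) -> list[str]:
    """Names of components that serve no listed requirement."""
    return [c.name for c in components if not requirement_ids.intersection(c.justified_by)]


def uncovered(components: list[Component], requirement_ids: set[str]) -> list[str]:
    """Requirement IDs that no component serves, in sorted order."""
    covered = {r for c in components for r in c.justified_by}
    return sorted(requirement_ids - covered)


# HYPOTHETICAL requirements and components, for illustration only.
requirements = {"R1-upload-photo", "R2-view-feed", "R3-search-by-tag"}
design = [
    Component("api-service", ("R1-upload-photo", "R2-view-feed")),
    Component("object-storage", ("R1-upload-photo",)),
    Component("message-queue", ()),  # nothing in the requirements calls for it
]

print("unjustified components:", unjustified(design, requirements))
print("uncovered requirements:", uncovered(design, requirements))
```

A component that shows up as unjustified is a candidate for removal; a requirement that shows up as uncovered means the component list is incomplete.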
For each core user journey (typically 3–5), draw the request path end-to-end:
For each path, identify: latency budget, failure modes, and consistency requirements (is stale data acceptable?).
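The three per-path questions can be captured in a small table-like structure. The following sketch sums per-hop latency budgets against the 200 ms p99 target used earlier; the hop names, budgets, and failure responses are hypothetical, and adding per-hop budgets is a simple planning approximation rather than an exact model of tail latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    budget_ms: float   # latency allotted to this hop
    on_failure: str    # what the caller does if this hop fails
    stale_ok: bool     # whether this hop may serve stale data


def remaining_budget_ms(path: list[Hop], target_ms: float) -> float:
    """Target minus the sum of per-hop budgets; negative means over budget."""
    return target_ms - sum(hop.budget_ms for hop in path)


# HYPOTHETICAL read path; hop names and numbers are illustrative.
read_feed = [
    Hop("load-balancer", 5, "retry on another node", stale_ok=False),
    Hop("api-service", 40, "return 503", stale_ok=False),
    Hop("cache", 10, "fall through to database", stale_ok=True),
    Hop("database", 100, "return 503", stale_ok=False),
]

slack = remaining_budget_ms(read_feed, target_ms=200)
print(f"slack against the 200 ms target: {slack} ms")
for hop in read_feed:
    print(f"{hop.name}: {hop.budget_ms} ms, on failure: {hop.on_failure}, stale ok: {hop.stale_ok}")
```

A negative slack means some hop has to get faster, be removed, or move off the request path.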
The database choice is the highest-leverage and hardest-to-change architectural decision. Apply these heuristics:
| Need | Consider |
|---|---|
| Relational data with transactions | PostgreSQL, MySQL |
| High-write time-series data | ClickHouse, TimescaleDB, InfluxDB |
| Flexible schema, document-oriented | MongoDB, DynamoDB |
| Global distribution, multi-region writes | CockroachDB, Spanner, DynamoDB Global Tables |
| Graph traversals | Neo4j, Amazon Neptune |
| Full-text search | Elasticsearch, OpenSearch |
Avoid polyglot persistence unless the requirements force it — each additional database is a synchronization problem.
Every non-trivial system has one or two hard problems. Name them and state your approach:
For each significant architectural choice, record: the option chosen, the alternatives considered, and the reason for the choice. This is the most valuable part of the design document — it prevents future engineers from re-litigating settled decisions.
Format:
Decision: Use PostgreSQL as the primary store. Alternatives: MongoDB (rejected — relational joins needed), DynamoDB (rejected — complex queries not supported). Reason: Data is relational; referential integrity is required; team has PostgreSQL expertise.
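If decision records are kept alongside the code, a small structure can keep them uniform. This is a minimal sketch, not a prescribed schema: the `Decision` and `Alternative` classes are hypothetical, and `render` produces a line in roughly the format shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    name: str
    rejected_because: str


@dataclass(frozen=True)
class Decision:
    chosen: str
    alternatives: tuple[Alternative, ...]
    reason: str

    def render(self) -> str:
        """One-line record: the choice, each rejected option, and the rationale."""
        alts = ", ".join(
            f"{a.name} (rejected: {a.rejected_because})" for a in self.alternatives
        )
        return f"Decision: {self.chosen}. Alternatives: {alts}. Reason: {self.reason}"


primary_store = Decision(
    chosen="Use PostgreSQL as the primary store",
    alternatives=(
        Alternative("MongoDB", "relational joins needed"),
        Alternative("DynamoDB", "complex queries not supported"),
    ),
    reason="Data is relational; referential integrity is required; "
           "team has PostgreSQL expertise.",
)

print(primary_store.render())
```

Requiring at least one entry in `alternatives` is a reasonable review rule: a decision with no rejected options usually means the options were never compared.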
Back-of-envelope for a photo sharing app:
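A sketch of what such an estimate might look like follows. Every input (user count, upload and view rates, photo size, peak factor, retention) is a hypothetical assumption chosen for round numbers; none comes from this document.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB = 10**6   # decimal megabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes

# ASSUMPTIONS: every input below is hypothetical.
daily_active_users = 1_000_000
uploads_per_user_per_day = 0.5
views_per_user_per_day = 50
avg_photo_bytes = 2 * MB
peak_factor = 3        # peak traffic relative to the daily average
retention_years = 3


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


uploads_per_day = daily_active_users * uploads_per_user_per_day
views_per_day = daily_active_users * views_per_user_per_day

peak_write_rps = per_second(uploads_per_day) * peak_factor
peak_read_rps = per_second(views_per_day) * peak_factor
storage_per_day_tb = uploads_per_day * avg_photo_bytes / TB
storage_total_tb = storage_per_day_tb * 365 * retention_years

print(f"peak writes:   {peak_write_rps:.0f} RPS")
print(f"peak reads:    {peak_read_rps:.0f} RPS")
print(f"storage/day:   {storage_per_day_tb:.1f} TB")
print(f"storage total: {storage_total_tb:.0f} TB over {retention_years} years")
```

With these inputs the workload is read-heavy (100 views per upload) and storage grows by about 1 TB per day, which suggests that serving reads cheaply matters more than scaling writes.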
Trade-off example:
We chose eventual consistency for the "likes" counter. Exact counts are not user-critical; a 1–5 second lag is acceptable. Strong consistency would require distributed coordination that adds 50–100 ms latency per write — not worth it for a display-only metric.
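To make that trade-off concrete, here is a single-process sketch of a buffered counter: writes land in a local buffer with no coordination, and the displayed value only catches up when the buffer is flushed. The `BufferedLikeCounter` class is hypothetical, the in-memory `Counter` stands in for a durable store, and `flush` is called explicitly here where a real system would run it on a timer (for example every few seconds).

```python
from collections import Counter


class BufferedLikeCounter:
    """Eventually consistent counter: reads may lag writes until flush()."""

    def __init__(self) -> None:
        self._store: Counter[str] = Counter()    # stand-in for the durable store
        self._pending: Counter[str] = Counter()  # increments not yet flushed

    def like(self, post_id: str) -> None:
        # Write path: a local increment only, no cross-node coordination.
        self._pending[post_id] += 1

    def displayed_count(self, post_id: str) -> int:
        # Read path: returns the last flushed value, which may be stale.
        return self._store[post_id]

    def flush(self) -> None:
        # Apply all buffered increments in one batch, then clear the buffer.
        self._store.update(self._pending)
        self._pending.clear()


likes = BufferedLikeCounter()
likes.like("post-1")
likes.like("post-1")
print("before flush:", likes.displayed_count("post-1"))
likes.flush()
print("after flush: ", likes.displayed_count("post-1"))
```

The lag a user sees is bounded by the flush interval, which is the knob that corresponds to the 1–5 second window accepted above.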
npx claudepluginhub jeffreytse/grimoire --plugin grimoire