From grimoire
Designs event-driven architectures for loose coupling, async processing, and real-time data propagation. Covers domain event modeling, streaming platforms, schema design, and the outbox pattern.
Slash command
/grimoire:design-event-driven-architecture
Architect systems where components communicate by producing and consuming events, enabling loose coupling, independent scaling, and resilience to partial failures.
Adopted by: LinkedIn (Kafka origin), Uber (event sourcing for trips), Netflix (events for recommendations), Amazon (SNS/SQS for order processing)
Impact: LinkedIn reduced inter-service coupling from 100% synchronous to under 20% after Kafka adoption; event-driven architectures handle 10-100x more throughput than synchronous RPC for the same infrastructure (Stopford 2018); Martin Fowler: "the most important characteristic of microservices is that they are organized around business capabilities"; events encode business capabilities
Why best: Synchronous calls create cascading failures; events decouple producers from consumers, enable fan-out to multiple consumers, and provide a durable audit log of system state changes
Sources: Hohpe & Woolf "Enterprise Integration Patterns" Addison-Wesley (2003); Richardson "Microservices Patterns" Manning (2018); Stopford "Designing Event-Driven Systems" O'Reilly (2018)
Model the domain as events — Identify domain events: "OrderPlaced", "PaymentProcessed", "InventoryReserved". Events are immutable facts in the past tense. They represent what happened, not commands to do something. Name events from the business domain, not technical operations.
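A minimal sketch of a domain event as an immutable, past-tense fact. The field names (order_id, customer_id, total_cents) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class OrderPlaced:
    """Business fact in the past tense: an order was placed."""
    order_id: str
    customer_id: str
    total_cents: int


event = OrderPlaced(order_id="o-1", customer_id="c-9", total_cents=4200)

try:
    # Attempting to change a published fact is rejected by the frozen dataclass.
    event.total_cents = 0  # type: ignore[misc]
except FrozenInstanceError:
    print("events are immutable; record a new event instead")
```

A correction would be modeled as a further event (for example a hypothetical OrderAmended) rather than as a change to the original one.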
Choose an event streaming platform — Apache Kafka for high-throughput, durable, replayable event streams (log-based). AWS SQS/SNS for simpler queue-based messaging with managed infrastructure. RabbitMQ for complex routing patterns and traditional message queuing. Kafka is the standard for event-driven systems requiring replay and event sourcing.
Design event schemas — Define event schema: event type, event ID (UUID), timestamp, source service, version, and payload. Use schema registry (Confluent Schema Registry, AWS Glue Schema Registry) to enforce and evolve schemas. Prefer Avro or Protobuf over JSON for production volume; JSON for developer ergonomics in low-volume cases.
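The envelope fields listed above could be sketched as follows. This uses JSON and the standard library only for readability; it is not a schema-registry integration, and the class and function names are assumptions made for this example.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any


def _now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class EventEnvelope:
    event_type: str                 # e.g. "OrderPlaced"
    source: str                     # producing service
    payload: dict[str, Any]         # event-specific data
    version: int = 1                # schema version of the payload
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=_now_iso)


def to_json(event: EventEnvelope) -> str:
    """Serialize the envelope; sorted keys keep the output stable."""
    return json.dumps(asdict(event), sort_keys=True)


def from_json(raw: str) -> EventEnvelope:
    """Rebuild an envelope from its JSON form."""
    return EventEnvelope(**json.loads(raw))
```

With Avro or Protobuf the same fields would live in a schema file registered with the registry, and serialization would be handled by the generated or library code instead of json.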
Apply the outbox pattern for reliability — Never publish an event and update a database as two separate, uncoordinated writes: either one can fail after the other has succeeded, leaving state inconsistent. Use the transactional outbox: write the event to an outbox table in the same DB transaction as the state change, then have a separate relay process publish from the outbox to the event stream. Because the relay retries until the broker acknowledges each event, this gives at-least-once delivery, which means duplicates are possible.
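The outbox flow can be sketched with SQLite from the standard library. The table layout, the function names, and the publish callback are all assumptions for this example; a real relay would publish to the event stream and might use change data capture instead of polling.

```python
import json
import sqlite3
import uuid


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox (event_id TEXT PRIMARY KEY, event_type TEXT, "
        "payload TEXT, published INTEGER DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: str) -> None:
    # `with conn` commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "OrderPlaced", json.dumps({"order_id": order_id})),
        )


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Publish every unpublished outbox row; return how many were pending."""
    rows = conn.execute(
        "SELECT event_id, event_type, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for event_id, event_type, payload in rows:
        # If publish raises, the row stays unpublished and is retried later.
        publish(event_type, payload)
        with conn:
            conn.execute(
                "UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,)
            )
    return len(rows)
```

A crash between publish and the UPDATE causes the same row to be published again on the next pass, which is why consumers downstream need to tolerate duplicates.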
Design for idempotent consumers — Events are delivered at least once; consumers may process the same event multiple times. Every consumer must be idempotent: processing the same event twice must produce the same result as processing it once. Use event ID deduplication: store processed event IDs and skip duplicates.
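Event ID deduplication might look like the sketch below. The in-memory set stands in for a durable store, and the event shape (event_id, payload with sku and qty) is hypothetical.

```python
class InventoryConsumer:
    """Reserves stock once per event, even if the event is delivered twice."""

    def __init__(self) -> None:
        # Assumption: in production this would be a durable store, updated
        # in the same transaction as the side effect it protects.
        self.processed_ids: set[str] = set()
        self.reserved: dict[str, int] = {}

    def handle(self, event: dict) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event["event_id"] in self.processed_ids:
            return False
        sku = event["payload"]["sku"]
        qty = event["payload"]["qty"]
        self.reserved[sku] = self.reserved.get(sku, 0) + qty
        self.processed_ids.add(event["event_id"])
        return True
```

Delivering the same event twice leaves the reserved quantity unchanged after the first call, which is the property the step above asks for.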
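Define consumer groups and partitioning — In Kafka, partition the event stream by a natural key (order ID, user ID) so that events for the same key land in the same partition and are consumed in order; ordering is not guaranteed across partitions. Give each service its own consumer group so that every service receives all events independently. The partition count caps the parallelism within a group, because each partition is read by at most one consumer in that group.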
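Key-based partitioning can be illustrated with a stable hash. This is a sketch of the idea only: Kafka's default partitioner uses a different hash function (murmur2), and partition_for is a hypothetical helper, not a client API.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (illustrative hash, not Kafka's)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def route(events: list[dict], num_partitions: int) -> dict[int, list[dict]]:
    """Group events by partition, preserving arrival order within each partition."""
    partitions: dict[int, list[dict]] = {p: [] for p in range(num_partitions)}
    for event in events:
        partitions[partition_for(event["order_id"], num_partitions)].append(event)
    return partitions
```

Because every event for a given order ID hashes to the same partition, a single consumer sees that order's events in the sequence they were produced. Changing the partition count changes the mapping, so it is worth choosing the count with future growth in mind.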
Handle failures with dead letter queues — Events that cannot be processed (schema violation, downstream failure) must not be silently dropped or block the consumer. Route failed events to a dead letter topic after N retry attempts. Monitor DLQ depth as a service health metric. Implement automated reprocessing after root cause is fixed.
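The retry-then-dead-letter flow could be sketched like this. The list used as a dead letter queue and the function name are assumptions; with Kafka the failed event would be produced to a separate dead letter topic, and retries would usually include a backoff delay.

```python
def consume_with_dlq(events, handler, dead_letters, max_attempts=3):
    """Try each event up to max_attempts times, then park it in dead_letters.

    Returns the number of events processed successfully. A failing event
    never blocks the events that follow it.
    """
    processed = 0
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                processed += 1
                break
            except Exception as exc:  # broad on purpose: any failure counts
                if attempt == max_attempts:
                    # Keep the error and attempt count for later diagnosis
                    # and reprocessing.
                    dead_letters.append(
                        {"event": event, "error": repr(exc), "attempts": attempt}
                    )
    return processed
```

The length of dead_letters plays the role of the DLQ depth metric mentioned above.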
Implement event versioning — Events are immutable once published but schemas evolve. Strategy: add new fields as optional (backward compatible). Never remove or rename existing fields. Use schema version in the event header. Consumers must handle unknown fields gracefully (ignore, don't fail).
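A consumer that tolerates schema evolution might read events as sketched below. The currency field and its "USD" default are hypothetical, standing in for an optional field added in a later schema version.

```python
def read_order_placed(event: dict) -> dict:
    """Extract only the fields this consumer needs from an OrderPlaced event.

    Unknown fields are ignored, and a field added in a later version falls
    back to a default when an older producer did not send it.
    """
    payload = event["payload"]
    return {
        "order_id": payload["order_id"],
        # Hypothetical optional field introduced in version 2.
        "currency": payload.get("currency", "USD"),
    }
```

The same function handles a version 1 event (no currency) and a version 3 event carrying extra fields it has never seen, without failing on either.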
Provide event replay capability — Kafka retains events for a configurable period (default 7 days; set longer for event sourcing). New consumers can replay from the beginning of the log to rebuild state. Design consumers to handle replay efficiently. This enables: onboarding new services, disaster recovery, and debugging.
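Replay amounts to folding the log into state from a chosen offset. In this sketch a Python list stands in for a retained topic partition, and the event shapes and status values are assumptions.

```python
def rebuild_order_status(log: list[dict], from_offset: int = 0) -> dict[str, str]:
    """Rebuild the latest status per order by replaying events in log order."""
    status: dict[str, str] = {}
    for event in log[from_offset:]:
        if event["event_type"] == "OrderPlaced":
            status[event["order_id"]] = "PLACED"
        elif event["event_type"] == "PaymentProcessed":
            status[event["order_id"]] = "PAID"
        # Event types this consumer does not care about are skipped.
    return status
```

A new service would call this from offset 0 to build its initial view; an existing one could resume from its last committed offset.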
Monitor event pipeline health — Track: consumer lag (events published but not yet processed), DLQ depth, event processing latency (p99), and schema validation error rate. Alert on consumer lag growth (consumer falling behind producer) as a precursor to processing failure. Use Kafka's consumer group offset tracking for lag measurement.
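Consumer lag is the gap between the end of each partition's log and the group's committed offset. The sketch below computes it from plain dictionaries; how those offsets are fetched from the broker is left out, and the alerting rule (strictly rising lag over the last few samples) is only one possible assumption.

```python
def consumer_lag(log_end_offsets: dict[int, int],
                 committed_offsets: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: events published but not yet processed."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


def lag_is_growing(samples: list[int], min_samples: int = 3) -> bool:
    """True when the last min_samples total-lag readings rise strictly."""
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

Feeding lag_is_growing with the sum of consumer_lag values taken at a fixed interval gives an early signal that a consumer is falling behind its producer.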
npx claudepluginhub jeffreytse/grimoire --plugin grimoire