From grimoire
Distributes data ownership to domain teams treating data as a product, with a self-serve platform and federated governance. Useful when centralized data platforms become bottlenecks.
Slash command
/grimoire:design-data-mesh-architecture
Distribute data ownership to domain teams treating data as a product, connected by a self-serve platform and governed by federated standards.
Adopted by: Zalando (data mesh pioneer), JPMorgan Chase, Saxo Bank, Netflix (elements of data mesh), HelloFresh; enterprises with 10+ business domains adopting this pattern.
Impact: Zalando reduced time-to-insight from weeks to days after moving to domain team ownership. Centralized data teams are a bottleneck at scale: average ticket wait time of 3-6 weeks for data access in large enterprises (Dehghani 2019).
Why best: Centralized data lakes fail at scale because domain expertise cannot be centralized; Conway's Law means data architecture must mirror organizational structure.
Sources: Dehghani "Data Mesh" O'Reilly (2022); Dehghani martinfowler.com (2019); Thoughtworks Technology Radar (data mesh as Adopt since 2021)
Identify domain boundaries — Map your organization's business domains (Sales, Inventory, Logistics, Finance). Each domain that generates data and consumes data from others is a candidate data owner. Boundaries should align with Conway's Law: how your teams are organized determines where domain boundaries are.
Define data products for each domain — A data product is the unit of data ownership. Each domain exposes its data as a product: analytical datasets (Parquet/Delta Lake files), APIs (REST/GraphQL), events (Kafka topics), or semantic models (dbt models). A data product has: a clear owner, an SLA, a schema, and a quality contract.
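As a rough illustration of the four required attributes, a data product can be described by a small descriptor object. This is a minimal sketch, not a standard format: the class name `DataProduct`, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class OutputPort(Enum):
    """Ways a domain can expose a data product (taken from the step above)."""
    DATASET = "dataset"          # Parquet / Delta Lake files
    API = "api"                  # REST / GraphQL
    EVENTS = "events"            # Kafka topics
    SEMANTIC_MODEL = "model"     # dbt models


@dataclass
class DataProduct:
    """Hypothetical descriptor; field names are assumptions for illustration."""
    name: str
    domain: str
    owner: str                   # accountable team or person
    output_port: OutputPort
    schema: dict                 # column name -> type name
    freshness_sla_hours: int     # SLA: maximum acceptable age of the data
    quality_checks: tuple        # quality contract, as named checks

    def missing_attributes(self) -> list:
        """Return which of the four required attributes are empty."""
        missing = []
        if not self.owner:
            missing.append("owner")
        if self.freshness_sla_hours <= 0:
            missing.append("sla")
        if not self.schema:
            missing.append("schema")
        if not self.quality_checks:
            missing.append("quality_contract")
        return missing


orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    output_port=OutputPort.DATASET,
    schema={"order_id": "string", "amount": "decimal"},
    freshness_sla_hours=24,
    quality_checks=("order_id_not_null", "amount_non_negative"),
)
```

A descriptor like this can be rejected at publish time if `missing_attributes()` returns anything.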
Apply data product thinking — Every data product must be discoverable (searchable in a catalog), addressable (stable URI/identifier), trustworthy (SLAs on freshness and quality), self-describing (schema, lineage, documentation), interoperable (standard formats), and secure (access-controlled). These are Dehghani's six characteristics of data products.
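The six characteristics can be turned into a publish-readiness checklist. The sketch below assumes a product's metadata is a plain dictionary; the key names (`catalog_url`, `uri`, `docs`, and so on) and the set of accepted formats are illustrative assumptions, not part of any standard.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]

# One predicate per characteristic. Key names are hypothetical.
CHECKS: Dict[str, Callable[[Metadata], bool]] = {
    "discoverable": lambda m: bool(m.get("catalog_url")),
    "addressable": lambda m: bool(m.get("uri")),
    "trustworthy": lambda m: "freshness_sla_hours" in m and "quality_checks" in m,
    "self_describing": lambda m: all(m.get(k) for k in ("schema", "lineage", "docs")),
    "interoperable": lambda m: m.get("format") in {"parquet", "delta", "avro", "json"},
    "secure": lambda m: bool(m.get("access_policy")),
}


def unmet_characteristics(metadata: Metadata) -> List[str]:
    """Return the names of the characteristics the metadata does not satisfy."""
    return [name for name, check in CHECKS.items() if not check(metadata)]


complete = {
    "catalog_url": "https://catalog.example.internal/sales/orders_daily",
    "uri": "dp://sales/orders_daily",
    "freshness_sla_hours": 24,
    "quality_checks": ["order_id_not_null"],
    "schema": {"order_id": "string"},
    "lineage": ["sales.raw_orders"],
    "docs": "Daily snapshot of confirmed orders.",
    "format": "parquet",
    "access_policy": "restricted",
}
```

Running `unmet_characteristics` in CI gives domain teams a concrete list of gaps before a product is published.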
Build a self-serve data platform — Domain teams cannot own data products if they need a central team to provision storage, compute, and pipelines. The platform team provides: infrastructure-as-code templates for common patterns (dbt + Snowflake, Spark + Delta Lake), CI/CD templates for data pipelines, monitoring templates, and a data catalog. The platform is itself a product whose customers are the internal domain teams.
Establish a data catalog — All data products must be discoverable. Implement a data catalog (Apache Atlas, DataHub, Amundsen, AWS Glue Data Catalog) where every data product is registered with: owner, schema, lineage, SLA, access policy, and sample data. Without discoverability, data products are invisible and unused.
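To make the registration requirement concrete, here is a minimal in-memory sketch of a catalog that refuses entries missing any of the listed fields. It does not use the API of Apache Atlas, DataHub, Amundsen, or AWS Glue; the `Catalog` class and its methods are hypothetical stand-ins.

```python
REQUIRED_FIELDS = ("owner", "schema", "lineage", "sla", "access_policy", "sample_data")


class Catalog:
    """Toy catalog: enforces required metadata and supports keyword search."""

    def __init__(self):
        self._entries = {}

    def register(self, name, entry):
        """Register a data product; reject it if required metadata is missing."""
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            raise ValueError(f"{name}: missing catalog fields {missing}")
        self._entries[name] = dict(entry)

    def search(self, term):
        """Return product names whose name or owner contains the term."""
        term = term.lower()
        return sorted(
            name
            for name, entry in self._entries.items()
            if term in name.lower() or term in str(entry["owner"]).lower()
        )


catalog = Catalog()
catalog.register(
    "sales.orders_daily",
    {
        "owner": "sales-data-team",
        "schema": {"order_id": "string"},
        "lineage": ["sales.raw_orders"],
        "sla": {"freshness_hours": 24},
        "access_policy": "restricted",
        "sample_data": [{"order_id": "A1"}],
    },
)
```

Rejecting incomplete registrations at the platform level keeps the catalog useful as the number of products grows.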
Design federated data governance — Governance is not centralized; it is federated. The central data governance team defines: global standards (naming conventions, data classification, privacy requirements), interoperability contracts (common schemas for shared concepts like Customer), and quality thresholds (minimum SLA requirements for published data products). Domain teams implement governance standards in their products.
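One way to federate governance is to publish the global standards as an executable policy that each domain runs against its own products. The sketch below is only an illustration: the naming pattern, the classification labels, and the 48-hour freshness ceiling are assumed values, not recommendations from the source.

```python
import re

# Global standards owned by the central governance team (assumed values).
NAME_PATTERN = re.compile(r"^[a-z]+\.[a-z][a-z0-9_]*$")   # <domain>.<product_name>
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "pii"}
MAX_FRESHNESS_SLA_HOURS = 48                               # minimum SLA for publication


def policy_violations(product):
    """Return a list of global-standard violations for one product descriptor."""
    violations = []
    if not NAME_PATTERN.match(product.get("name", "")):
        violations.append("name must match <domain>.<snake_case_product>")
    if product.get("classification") not in ALLOWED_CLASSIFICATIONS:
        violations.append("classification missing or not in the allowed set")
    sla = product.get("freshness_sla_hours")
    if sla is None or sla > MAX_FRESHNESS_SLA_HOURS:
        violations.append(f"freshness SLA must be at most {MAX_FRESHNESS_SLA_HOURS}h")
    return violations
```

The central team owns the constants; domain teams own fixing whatever `policy_violations` reports for their products.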
Implement data contracts — A data contract is a formal agreement between a data producer and consumer: schema, SLA (freshness, availability, quality metrics), change management process (deprecation notice period), and owner contact. Tools: DataContract CLI, Soda Core, or dbt contracts. Data contracts enforce the same discipline as API contracts for analytical data.
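The elements of a contract can be sketched as a small structure plus a check that a delivered batch honors it. This is a simplified stand-in and does not reproduce the formats used by DataContract CLI, Soda Core, or dbt contracts; `DataContract` and `check_batch` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical contract: schema, freshness SLA, change policy, owner."""
    owner_contact: str
    schema: dict                    # column name -> expected Python type
    max_staleness: timedelta        # freshness SLA
    deprecation_notice_days: int    # change management: notice before breaking changes


def check_batch(contract, rows, last_updated, now):
    """Return human-readable violations of the contract for one batch."""
    violations = []
    if now - last_updated > contract.max_staleness:
        violations.append("freshness: data is older than the SLA allows")
    for i, row in enumerate(rows):
        for column, expected in contract.schema.items():
            if column not in row:
                violations.append(f"row {i}: missing column {column}")
            elif not isinstance(row[column], expected):
                violations.append(f"row {i}: {column} is not {expected.__name__}")
    return violations


contract = DataContract(
    owner_contact="sales-data-team",
    schema={"order_id": str, "amount": float},
    max_staleness=timedelta(hours=24),
    deprecation_notice_days=90,
)
```

A producer would run a check like this before publishing, and a consumer could run the same check on read to detect drift.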
Design for data lineage — Every data product must expose its upstream dependencies and downstream consumers. Implement lineage tracking at the platform level (OpenLineage, Marquez). Lineage enables impact analysis: "Which data products are affected if the Orders table schema changes?" Without lineage, cross-product dependencies are invisible.
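Impact analysis over lineage is a graph traversal: starting from the changed product, follow downstream edges until nothing new is reached. The sketch below uses a hand-written edge map with invented product names; in practice the edges would come from a lineage backend such as the ones named above.

```python
from collections import deque

# producer -> direct downstream consumers (example data, not from a real system)
DOWNSTREAM = {
    "sales.orders": ["finance.revenue", "logistics.shipments"],
    "finance.revenue": ["finance.quarterly_report"],
    "logistics.shipments": [],
    "finance.quarterly_report": [],
}


def affected_products(changed, edges):
    """Breadth-first search for every product downstream of `changed`."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen - {changed}   # the changed product itself is not "affected"
```

Calling `affected_products("sales.orders", DOWNSTREAM)` answers the question in the step above for this example graph, including the indirect consumer `finance.quarterly_report`.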
Define access management per data product — Each data product has its own access policy: public (any internal user), restricted (authenticated domain teams), classified (explicit approval required). Implement these tiers as attribute-based access control at the platform layer. The data governance team audits access policies quarterly.
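The three tiers map naturally onto an attribute-based decision function: the decision depends on attributes of the user and of the product, not on a per-user grant list. In this sketch the `User` attributes are hypothetical, and "restricted" is interpreted as "any member of a domain team", which is an assumption about the policy's intent.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class User:
    """Hypothetical user attributes supplied by the identity provider."""
    name: str
    is_internal: bool
    domain_team: Optional[str] = None          # None if not on a domain team
    approvals: FrozenSet[str] = frozenset()    # products with explicit approval


def can_read(user, product_name, policy):
    """Decide read access from user attributes and the product's policy tier."""
    if not user.is_internal:
        return False
    if policy == "public":
        return True
    if policy == "restricted":
        return user.domain_team is not None
    if policy == "classified":
        return product_name in user.approvals
    raise ValueError(f"unknown access policy: {policy}")
```

Keeping the decision in one platform-level function also gives the quarterly audit a single place to inspect.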
Measure data product health — Each data product owner tracks: freshness SLA compliance (% of time data is updated within the SLA), schema stability (breaking changes per quarter), access request response time, and consumer satisfaction (quarterly survey). Publish metrics in the data catalog. Unhealthy data products lose consumer trust.
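Freshness SLA compliance can be computed from periodic observations of how old the data was at each check. The sketch below assumes such observations are already collected; the health thresholds in `is_healthy` (95% compliance, at most one breaking change per quarter) are illustrative assumptions, not figures from the source.

```python
def freshness_compliance(observed_age_hours, sla_hours):
    """Percentage of observations where the data was within the freshness SLA."""
    if not observed_age_hours:
        raise ValueError("need at least one observation")
    within = sum(1 for age in observed_age_hours if age <= sla_hours)
    return 100.0 * within / len(observed_age_hours)


def is_healthy(compliance_pct, breaking_changes_per_quarter,
               min_compliance_pct=95.0, max_breaking_changes=1):
    """Combine two of the metrics into a single flag (thresholds are assumed)."""
    return (compliance_pct >= min_compliance_pct
            and breaking_changes_per_quarter <= max_breaking_changes)


# Data age in hours at four monitoring checks, against a 24-hour SLA.
ages = [2, 5, 30, 12]
compliance = freshness_compliance(ages, sla_hours=24)   # 3 of 4 checks within SLA
```

Publishing `compliance` and the health flag alongside the catalog entry lets consumers judge a product before depending on it.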
npx claudepluginhub jeffreytse/grimoire --plugin grimoire