By selhorys
Manage a DataSpoke deployment via its public API: configure and verify API access, inspect ingestion sources and trigger extractor runs, manage validation slots and author validation routines for data pipelines, and explore OpenAPI governance and ontology endpoints.
Connect this plugin to a deployed DataSpoke and verify access. Use to point Claude at a DataSpoke deployment (API base URL + dsk_ API token), mint a token from email/password, or check who you are and what role you hold. Prerequisite for every other dataspoke-* skill — run this first, or when calls start returning 401.
Answer questions about DataSpoke Governance metrics (UC5) on a deployed instance and make read calls against its public API. Stub — knows the route prefix and points at the deployment's live OpenAPI reference; not yet a full metric-authoring workflow. Use for "what does governance expose" or basic reads.
Manage DataSpoke ingestion sources (UC1) on a deployed instance — list and inspect sources, create or edit ACTIVE_CUSTOM_MANAGED and PASSIVE sources, trigger dry-run and real extractor runs, and review run history, emitted datasets, and the unmanaged bucket. Use for any "register/check ingestion" or "is this dataset ingested" question. Answers questions and, on request, writes and fires the API calls.
Answer questions about DataSpoke Metadata Generation (UC4) on a deployed instance and make read calls against its public API. Stub — knows the route prefix and points at the deployment's live OpenAPI reference; not yet a full authoring/review workflow. Use for "what does metagen expose" or basic reads.
Answer questions about DataSpoke Ontology Generation (UC3) on a deployed instance and make read calls against its public API. Stub — knows the route prefix and points at the deployment's live OpenAPI reference; not yet a full authoring workflow. Use for "what does ontogen expose" or basic reads.
Note: This project is currently under active development and has not been officially released. APIs, features, and documentation are subject to change without notice.
AI-powered sidecar extension for DataHub, built API-first.
DataSpoke is a loosely coupled sidecar to DataHub. DataHub stores metadata (the Hub); DataSpoke extends it with five baseline features (the Spokes): Ingestion Control, Validation, Ontology Generation, Metadata Generation, and Governance. Both UI and API are organised by feature — one function namespace each under /spoke/.
This repository delivers two artifacts:
spec/API.md is the canonical surface; the frontend is a thin reference UI that consumes those routes verbatim.
Fork or copy this repository to create a data catalog for your organization.
DataSpoke ships as an umbrella Helm chart at helm-charts/dataspoke/. The production profile (values.yaml) enables the application components (frontend, API) and infrastructure (PostgreSQL with pgvector + Apache AGE, Redis, Airflow). The optional event-consumer subchart is shipped disabled — baseline UC1–UC5 are schedule-driven via Airflow rather than event-driven.
docker build -t <registry>/dataspoke/api:latest -f docker-images/api/Dockerfile .
(Frontend image TBD; event-consumer is disabled by default.)
Copy helm-charts/dataspoke/values.yaml and customize it: container images, ingress hosts/TLS, DataHub connection (config.datahub.gmsUrl), and secrets (PostgreSQL, Redis, JWT, LLM API key). For production secrets management, consider External Secrets Operator.
helm dependency build ./helm-charts/dataspoke
helm upgrade --install dataspoke ./helm-charts/dataspoke \
--namespace dataspoke --create-namespace \
--values ./your-values.yaml
Resource sizing: Production defaults total ~5 CPU and ~9.5 Gi of memory in requests, and ~10 CPU and ~22 Gi in limits, excluding the opt-in event-consumer. See spec/feature/HELM_CHART.md for the full chart reference.
uv
The dev profile installs infrastructure (DataHub, PostgreSQL with pgvector + Apache AGE, Redis, Airflow, self-hosted Langfuse for LLM observability, example data sources) into a Kubernetes cluster via the umbrella Helm chart plus dev peripherals. The API runs in-cluster alongside Airflow (for workflow callbacks); the frontend runs on the host.
cp helm-charts/.env.example helm-charts/.env # Set your Kubernetes context
./helm-charts/bin/install.sh --profile dev # ~5-10 min first run
Using Claude Code? Run /k8s-deploy install for guided setup.
After install, verify all services are reachable:
./helm-charts/bin/health-check.sh # Verify all services respond via nginx-ingress
Services are accessed via nginx-ingress endpoints — HTTP services use virtual-host routing (http://<service>.<INGRESS_IP>.nip.io/) and TCP services use dedicated ports on the ingress IP. See helm-charts/README.md for the full endpoint table, credentials, lock service, namespace architecture, resource budgets, and troubleshooting.
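The virtual-host pattern above can be sketched as a small helper. This is an illustrative sketch, not part of the repository: the function name `service_url` and the sample IP are assumptions, and the only service name taken from this README is `api`. See helm-charts/README.md for the real endpoint table.

```python
import ipaddress


def service_url(service: str, ingress_ip: str) -> str:
    """Build the nip.io virtual-host URL for an HTTP service.

    Follows the pattern http://<service>.<INGRESS_IP>.nip.io/ described above.
    The helper itself is hypothetical and exists only for illustration.
    """
    # Reject anything that is not a dotted IPv4 address before embedding it
    # in the hostname; raises ValueError on malformed input.
    ip = ipaddress.IPv4Address(ingress_ip)
    return f"http://{service}.{ip}.nip.io/"


# 203.0.113.10 is a documentation-range address used purely as a placeholder.
print(service_url("api", "203.0.113.10"))
```

TCP services do not follow this pattern; they use dedicated ports on the ingress IP.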
./helm-charts/bin/uninstall.sh --profile dev
uv sync # Install dependencies
./helm-charts/bin/install.sh --profile dev --components api # Rebuild + redeploy the API
kubectl scale deployment/dataspoke-api --replicas=0 \
-n "${DATASPOKE_KUBE_DATASPOKE_NAMESPACE}" # Scale down in-cluster API
The API is accessible via nginx-ingress at http://api.<INGRESS_IP>.nip.io/api/v1/. See spec/TESTING.md for testing modes.
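As a rough sketch of calling that base URL with a dsk_ API token, the snippet below prepares (but does not send) a request. Several details are assumptions rather than documented behaviour: the `Bearer` authorization scheme, the helper name `build_request`, the sample IP and token, and the path `spoke/example`, which is a placeholder and not a real route. Consult spec/API.md for the actual routes and auth scheme.

```python
from urllib.request import Request


def build_request(base_url: str, token: str, path: str) -> Request:
    """Prepare an authenticated GET request without sending it.

    ASSUMPTION: the token is passed as "Authorization: Bearer <token>".
    This README does not confirm the header format.
    """
    # API tokens are described as carrying a dsk_ prefix; fail early otherwise.
    if not token.startswith("dsk_"):
        raise ValueError("expected an API token with the dsk_ prefix")
    # Join base and path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values: documentation-range IP, dummy token, hypothetical path.
req = build_request(
    "http://api.203.0.113.10.nip.io/api/v1/", "dsk_example", "spoke/example"
)
print(req.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it. A 401 response would suggest the token is missing, expired, or wrong.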
npx claudepluginhub selhorys/dataspoke-baseline --plugin dataspoke