dataspoke

Name: dataspoke
Author: selhorys

Manage a DataSpoke deployment via its public API: configure and verify API access, inspect ingestion sources and trigger extractor runs, manage validation slots and author validation routines for data pipelines, and explore OpenAPI governance and ontology endpoints.

npx claudepluginhub selhorys/dataspoke-baseline --plugin dataspoke

What's Inside

Skills (6)

dataspoke-access

/dataspoke-access

Connect this plugin to a deployed DataSpoke and verify access. Use to point Claude at a DataSpoke deployment (API base URL + dsk_ API token), mint a token from email/password, or check who you are and what role you hold. Prerequisite for every other dataspoke-* skill — run this first, or when calls start returning 401.

dataspoke-governance

/dataspoke-governance

Answer questions about DataSpoke Governance metrics (UC5) on a deployed instance and make read calls against its public API. Stub — knows the route prefix and points at the deployment's live OpenAPI reference; not yet a full metric-authoring workflow. Use for "what does governance expose" or basic reads.

dataspoke-ingestion

/dataspoke-ingestion

Manage DataSpoke ingestion sources (UC1) on a deployed instance — list and inspect sources, create or edit ACTIVE_CUSTOM_MANAGED and PASSIVE sources, trigger dry-run and real extractor runs, and review run history, emitted datasets, and the unmanaged bucket. Use for any "register/check ingestion" or "is this dataset ingested" question. Answers questions and, on request, writes and fires the API calls.

dataspoke-metagen

/dataspoke-metagen

Answer questions about DataSpoke Metadata Generation (UC4) on a deployed instance and make read calls against its public API. Stub — knows the route prefix and points at the deployment's live OpenAPI reference; not yet a full authoring/review workflow. Use for "what does metagen expose" or basic reads.

dataspoke-ontogen

/dataspoke-ontogen

Answer questions about DataSpoke Ontology Generation (UC3) on a deployed instance and make read calls against its public API. Stub — knows the route prefix and points at the deployment's live OpenAPI reference; not yet a full authoring workflow. Use for "what does ontogen expose" or basic reads.
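Every skill above calls the deployment's public API with the base URL and dsk_ token that /dataspoke-access configures. As a rough sketch (not the plugin's actual code), the Python below builds such a request against the base URL format given under Running DataSpoke (http://api.<INGRESS_IP>.nip.io/api/v1/). The Bearer header scheme, the helper names build_request and fetch_json, and the example path are assumptions; confirm the real scheme and routes in the deployment's live OpenAPI reference.

```python
import json
import urllib.request
from urllib.parse import urljoin


def build_request(base_url: str, token: str, path: str) -> urllib.request.Request:
    """Build an authenticated GET request against a DataSpoke API base URL.

    Hypothetical helper: the Bearer scheme is an assumption, not taken from
    the DataSpoke docs.
    """
    # The access skill describes tokens with a dsk_ prefix; reject anything else early.
    if not token.startswith("dsk_"):
        raise ValueError("expected a DataSpoke API token starting with 'dsk_'")
    # urljoin replaces the last path segment unless the base ends with '/',
    # so normalise the base URL and strip any leading '/' from the path.
    base = base_url if base_url.endswith("/") else base_url + "/"
    url = urljoin(base, path.lstrip("/"))
    # ASSUMPTION: the token is sent as an HTTP Bearer credential.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body.

    A 401 raises urllib.error.HTTPError; per the skill description, that is
    the cue to re-run /dataspoke-access.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical values: substitute your ingress IP, your token, and a real route.
    req = build_request("http://api.203.0.113.10.nip.io/api/v1", "dsk_example", "/spoke/")
    print(req.full_url)
```

Keeping request construction separate from fetch_json lets you check the URL and header logic without a live deployment.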

Stats

Version: 0.1.0

Language: Python

Stars: 15

Maintenance: Good

License: Apache-2.0

Last Commit: Jun 16, 2026

Added: Jun 17, 2026

README

DataSpoke

Note: This project is currently under active development and has not been officially released. APIs, features, and documentation are subject to change without notice.

AI-powered sidecar extension for DataHub, built API-first.

DataSpoke is a loosely coupled sidecar to DataHub. DataHub stores metadata (the Hub); DataSpoke extends it with five baseline features (the Spokes): Ingestion Control, Validation, Ontology Generation, Metadata Generation, and Governance. Both UI and API are organised by feature — one function namespace each under /spoke/.

This repository delivers two artifacts:

Baseline Product — A foundational data catalog implementation of the five MANIFESTO features. The API contract in spec/API.md is the canonical surface; the frontend is a thin reference UI that consumes those routes verbatim.
Productized Scaffold — An AI Scaffold (Claude Code conventions, generator/evaluator subagents, PRauto) plus a Development Scaffold (scripted Kubernetes dev environment) that together let teams fork this repo and build custom Spokes with AI coding agents.

Fork or copy this repository to create a data catalog for your organization.

Usage Guide

Prerequisites

kubectl + Helm v3 installed and configured
A Kubernetes cluster with appropriate capacity
A separate DataHub instance — DataSpoke connects to DataHub as an external dependency

Deploy to Production

DataSpoke ships as an umbrella Helm chart at helm-charts/dataspoke/. The production profile (values.yaml) enables the application components (frontend, API) and infrastructure (PostgreSQL with pgvector + Apache AGE, Redis, Airflow). The optional event-consumer subchart is shipped disabled — baseline UC1–UC5 are schedule-driven via Airflow rather than event-driven.

Build and push images (the frontend image is TBD; event-consumer is disabled by default):

docker build -t <registry>/dataspoke/api:latest -f docker-images/api/Dockerfile .
docker push <registry>/dataspoke/api:latest
Configure: Copy helm-charts/dataspoke/values.yaml and customize — container images, ingress hosts/TLS, DataHub connection (config.datahub.gmsUrl), and secrets (PostgreSQL, Redis, JWT, LLM API key). For production secrets management, consider External Secrets Operator.

Install:

helm dependency build ./helm-charts/dataspoke
helm upgrade --install dataspoke ./helm-charts/dataspoke \
  --namespace dataspoke --create-namespace \
  --values ./your-values.yaml
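The Configure and Install steps can also be scripted. The sketch below is an illustration under stated assumptions, not part of the repository: it expands dotted keys such as config.datahub.gmsUrl into a nested override file (written as JSON, which is valid YAML, so Helm's --values should accept it) and runs the same two helm invocations shown above. Only config.datahub.gmsUrl comes from this README; the GMS URL value, the file name, and the helper names nest, helm_commands, and deploy are hypothetical. Because Helm merges a --values file over the chart's own values.yaml, the override only needs the keys you change.

```python
import json
import subprocess
from pathlib import Path


def nest(dotted: dict[str, object]) -> dict:
    """Expand {"a.b.c": v} into {"a": {"b": {"c": v}}}, like helm --set paths."""
    tree: dict = {}
    for key, value in dotted.items():
        node = tree
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


def helm_commands(values_file: str, chart: str = "./helm-charts/dataspoke",
                  release: str = "dataspoke", namespace: str = "dataspoke") -> list[list[str]]:
    """Return argv lists for the dependency build and install steps from the README."""
    return [
        ["helm", "dependency", "build", chart],
        ["helm", "upgrade", "--install", release, chart,
         "--namespace", namespace, "--create-namespace",
         "--values", values_file],
    ]


def deploy(overrides: dict[str, object], values_file: str = "your-values.yaml") -> None:
    """Write the override file (JSON content), then run each helm command in order."""
    Path(values_file).write_text(json.dumps(nest(overrides), indent=2))
    for argv in helm_commands(values_file):
        # check=True raises CalledProcessError, so install never runs after a failed build.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical GMS address; point this at your own DataHub instance.
    print(json.dumps(nest({"config.datahub.gmsUrl": "http://datahub-gms.example:8080"})))
```

Passing argv lists instead of a shell string sidesteps quoting problems with paths and values.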

Resource sizing: Production defaults total about 5 CPU and 9.5 Gi of memory in requests, and about 10 CPU and 22 Gi in limits, excluding the opt-in event-consumer. See spec/feature/HELM_CHART.md for the full chart reference.
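To check whether a cluster can hold the production defaults, you can compare the summed requests against the cluster's allocatable capacity. A minimal sketch, assuming you read the allocatable totals yourself (for example from kubectl describe nodes); it handles only the common Kubernetes quantity suffixes (m for CPU; Ki/Mi/Gi/Ti and k/M/G/T for memory) and ignores per-node bin-packing. The 5 CPU / 9.5 Gi request totals come from the README; the 8-CPU / 24 Gi cluster and the function names are hypothetical.

```python
from decimal import Decimal

# Binary suffixes are checked first so that "Mi" is never mistaken for "M".
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "2" into cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> Decimal:
    """Convert a memory quantity such as "9.5Gi" or "512M" into bytes."""
    for table in (_BINARY, _DECIMAL):
        for suffix, factor in table.items():
            if quantity.endswith(suffix):
                return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)  # no suffix: plain bytes


def fits(requests: dict[str, str], allocatable: dict[str, str]) -> bool:
    """True if the summed requests fit within total allocatable CPU and memory."""
    return (parse_cpu(requests["cpu"]) <= parse_cpu(allocatable["cpu"])
            and parse_memory(requests["memory"]) <= parse_memory(allocatable["memory"]))


if __name__ == "__main__":
    # Production request totals from the README vs. a hypothetical 8-CPU / 24 Gi cluster.
    print(fits({"cpu": "5", "memory": "9.5Gi"}, {"cpu": "8", "memory": "24Gi"}))
```

Only requests are compared because the scheduler places pods by requests; limits may exceed capacity (overcommit), at the risk of CPU throttling or out-of-memory kills under load.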

Development Guide

Prerequisites

kubectl + Helm v3 installed and configured
A Kubernetes cluster (GKE Autopilot recommended; Docker Desktop, minikube, or kind also work) with 8+ CPUs / 24 GB RAM / 150 GB storage
Python 3.13 and uv
Node.js 18+ (TBD — frontend not yet implemented)

Dev Environment Setup

The dev profile installs infrastructure (DataHub, PostgreSQL with pgvector + Apache AGE, Redis, Airflow, self-hosted Langfuse for LLM observability, and example data sources) into a Kubernetes cluster via the umbrella Helm chart plus dev peripherals. The API runs in-cluster alongside Airflow (for workflow callbacks); the frontend runs on the host.

cp helm-charts/.env.example helm-charts/.env       # Set your Kubernetes context
./helm-charts/bin/install.sh --profile dev          # ~5-10 min first run

Using Claude Code? Run /k8s-deploy install for guided setup.

After install, verify all services are reachable:

./helm-charts/bin/health-check.sh                   # Verify all services respond via nginx-ingress

Services are accessed via nginx-ingress endpoints — HTTP services use virtual-host routing (http://<service>.<INGRESS_IP>.nip.io/) and TCP services use dedicated ports on the ingress IP. See helm-charts/README.md for the full endpoint table, credentials, lock service, namespace architecture, resource budgets, and troubleshooting.
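The virtual-host pattern above is easy to compute. A sketch, assuming you already know the ingress IP: service_url builds http://<service>.<INGRESS_IP>.nip.io/, and appending api/v1/ to the api service URL gives the base used under Running DataSpoke. probe is a hypothetical, much smaller cousin of health-check.sh that only reports HTTP status codes for the HTTP services; the service names you pass must come from the endpoint table in helm-charts/README.md, and the IP below is a placeholder.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def service_url(service: str, ingress_ip: str) -> str:
    """Virtual-host URL for an HTTP service behind nginx-ingress, via nip.io wildcard DNS."""
    return f"http://{service}.{ingress_ip}.nip.io/"


def probe(services: list[str], ingress_ip: str, timeout: float = 5.0) -> dict[str, int | str]:
    """Return each service's HTTP status code, or an error string if it was unreachable."""
    results: dict[str, int | str] = {}
    for name in services:
        try:
            with urllib.request.urlopen(service_url(name, ingress_ip), timeout=timeout) as resp:
                results[name] = resp.status
        except urllib.error.HTTPError as err:
            # The service answered, just with an error status; record the code.
            results[name] = err.code
        except OSError as err:
            # URLError (DNS, connection refused) and timeouts are both OSError subclasses.
            results[name] = f"unreachable: {err}"
    return results


if __name__ == "__main__":
    ip = "203.0.113.10"  # hypothetical ingress IP
    print(service_url("api", ip) + "api/v1/")
```

TCP services on dedicated ingress ports are out of scope for this sketch, since they do not use host-based routing.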

Uninstall

./helm-charts/bin/uninstall.sh --profile dev

Running DataSpoke

uv sync                                                                # Install dependencies
./helm-charts/bin/install.sh --profile dev --components api            # Rebuild + redeploy the API
kubectl scale deployment/dataspoke-api --replicas=0 \
  -n "${DATASPOKE_KUBE_DATASPOKE_NAMESPACE}"                           # Scale down in-cluster API
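The kubectl line depends on DATASPOKE_KUBE_DATASPOKE_NAMESPACE being set in your shell. A small Python equivalent that fails fast when it is missing might look like the sketch below; scale_api_argv and scale_api are hypothetical helpers, and scaling back to a positive replica count (1 is an assumed value, not taken from the chart) would bring the in-cluster API back.

```python
from __future__ import annotations

import os
import subprocess
from collections.abc import Mapping


def scale_api_argv(replicas: int, env: Mapping[str, str] | None = None) -> list[str]:
    """Build the kubectl scale command for the in-cluster API deployment."""
    env = os.environ if env is None else env
    namespace = env.get("DATASPOKE_KUBE_DATASPOKE_NAMESPACE")
    if not namespace:
        raise RuntimeError("DATASPOKE_KUBE_DATASPOKE_NAMESPACE is not set in the environment")
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return ["kubectl", "scale", "deployment/dataspoke-api",
            f"--replicas={replicas}", "-n", namespace]


def scale_api(replicas: int) -> None:
    """Run the command; raises CalledProcessError if kubectl exits non-zero."""
    subprocess.run(scale_api_argv(replicas), check=True)


if __name__ == "__main__":
    # Print instead of running, so the command can be reviewed first.
    print(scale_api_argv(0, {"DATASPOKE_KUBE_DATASPOKE_NAMESPACE": "dataspoke-dev"}))
```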

The API is accessible via nginx-ingress at http://api.<INGRESS_IP>.nip.io/api/v1/. See spec/TESTING.md for testing modes.

Implementation Status

Similar Plugins

datahub-skills

datasphere

data-exploration

bauplan

agentspec

astronomer-data
