From substreams
Use when the user wants to run, deploy, or operate a Substreams sink — take a built .spkg and pipe its data into a destination (Postgres, ClickHouse, files, PubSub, webhook). Covers sink CLIs, schema setup, cursor management, reorg handling, and production operations. Distinct from substreams-sql (building the db_out Rust module) and substreams-sink (SDK-level app integration).
How this skill is triggered — by the user, by Claude, or both
Slash command
/substreams:substreams-sink-deployThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
End-to-end guide for taking a working Substreams package and getting its data into a destination — local, hosted, or managed.
End-to-end guide for taking a working Substreams package and getting its data into a destination — local, hosted, or managed.
Use this skill when the user says any of:
Do NOT use this skill when the user is:
substreams-devdb_out / DatabaseChanges → use substreams-sqlsubstreams-sinkWhere does the data need to land?
├── SQL database queries (analytics, joins, BI)
│ └── Postgres / ClickHouse → substreams-sink-sql (proto: DatabaseChanges)
│
├── Object storage / data lake (S3, GCS, local FS)
│ └── CSV / Parquet bulk files → substreams-sink-files (proto: any custom)
│
├── Real-time event bus
│ ├── Google Cloud PubSub → substreams-sink-pubsub (proto: sf.substreams.sink.pubsub.v1.Publish)
│ └── HTTP webhook → substreams sink webhook (proto: any)
│
├── Subgraph (The Graph network)
│ └── graph-node → Substreams-powered Subgraph (proto: sf.substreams.sink.entity.v1.EntityChanges)
│
├── Custom application code
│ └── Go / JS / Rust SDK → see substreams-sink skill (no binary, app-level)
│
└── Just JSONL output (typically files)
└── substreams sink protojson (built-in, zero install — for testing)
Picking SQL vs Files vs Stream SDK — most common decision:
Every sink has two halves. Skill tasks usually mix them up.
┌────────────────────────────────────┐ ┌──────────────────────────────────┐
│ BUILD SIDE (Rust / .spkg) │ │ RUN SIDE (sink binary) │
│ │ │ │
│ - Substreams package (.spkg) │───▶│ - substreams-sink-<x> binary │
│ - Output module emits the │ │ - Reads from Substreams endpoint│
│ proto type the sink expects │ │ - Writes to destination │
│ - One module per sink type │ │ - Manages cursor + reorgs │
└────────────────────────────────────┘ └──────────────────────────────────┘
↑ ↑
│ owned by substreams-dev / │ owned by THIS skill
│ substreams-sql skills │
Key rule: the sink binary will refuse to run if your output module emits the wrong proto type. Each sink expects a specific message.
| Sink | Output module proto type |
|---|---|
substreams-sink-sql | proto:sf.substreams.sink.database.v1.DatabaseChanges |
substreams-sink-pubsub | proto:sf.substreams.sink.pubsub.v1.Publish |
substreams-sink-files | any user-defined proto (you choose; the sink streams it raw) |
substreams sink webhook | any user-defined proto (delivered as JSON to the URL) |
| Subgraph (graph-node) | proto:sf.substreams.sink.entity.v1.EntityChanges |
substreams sink protojson | any (writes proto-as-JSON lines) |
If the user has the wrong output type, redirect them to the substreams-sql skill (for db_out) or the substreams-dev skill (for EntityChanges / custom protos) BEFORE running the sink.
# Binary release (preferred):
brew install streamingfast/tap/substreams-sink-sql # macOS
# or download from: https://github.com/streamingfast/substreams-sink-sql/releases
# From source:
go install github.com/streamingfast/substreams-sink-sql/cmd/substreams-sink-sql@latest
my-substreams/
├── substreams.yaml # has a db_out module (DatabaseChanges output)
├── my-substreams-v0.1.0.spkg
└── schema.sql # your CREATE TABLE statements
Your schema.sql contains your application tables. The setup command creates the sink's internal bookkeeping tables (cursors, substreams_history) automatically — you don't need to define them. Use generate to get a starter template with your domain tables pre-filled:
substreams-sink-sql generate <DSN> ./my-substreams.spkg <module_name>
The sink accepts these schemes only — postgresql:// is rejected by v4.x:
# Postgres
psql://user:pass@host:5432/dbname?sslmode=disable
postgres://user:pass@host:5432/dbname?sslmode=disable # alias
# ClickHouse
clickhouse://default:pass@host:9000/dbname
Note:
parquet://is NOT asubstreams-sink-sqlscheme — Parquet output usessubstreams-sink-fileswith--encoder=parquet. See the Files sink section below.
# 1. Create DB schema (one-time)
substreams-sink-sql setup "$DSN" ./my-substreams.spkg
# 2. Run the sink (long-running process)
# CLI signature: substreams-sink-sql run <DSN> <package.spkg> [<module_name>] [<block_range>] [-e <endpoint>]
# -e/--endpoint is optional if the .spkg embeds a network/endpoint; required otherwise.
# Module name is auto-inferred from the .spkg if a single sink module exists.
substreams-sink-sql run "$DSN" \
./my-substreams.spkg \
"12000000:+1000000" \
-e "https://mainnet.eth.streamingfast.io:443"
# Minimal (endpoint auto-derived from .spkg network config):
# substreams-sink-sql run "$DSN" ./my-substreams.spkg "12000000:+1000000"
Auth required: set SUBSTREAMS_API_KEY (new accounts) or SUBSTREAMS_API_TOKEN (JWT/legacy accounts). See pitfall #5 for details.
CLI arg order matters and is non-obvious. Endpoint is a
-e/--endpointflag, NOT positional. The module name is positional but optional — only required if your.spkghas multiple sink-compatible output modules (rare). Block range syntax:START:STOPorSTART:+Nfor N blocks forward, or omit STOP for open-ended live tailing.
The sink supports two ways to map proto → SQL:
db_out module (recommended). Your Substreams emits DatabaseChanges directly. Full control over INSERT / UPDATE / UPSERT. See substreams-sql skill for proto setup.If you use db_out, the proto type MUST be:
output:
type: proto:sf.substreams.sink.database.v1.DatabaseChanges
The sink writes the latest processed cursor to the cursors table on every batch. On restart, it picks up from there. Do not manage cursors manually. If you delete the row, the sink will re-process from the original start_block.
To intentionally restart from scratch:
substreams-sink-sql undo "$DSN" ./my-substreams.spkg --all
substreams_history to roll back forks.--undo-buffer-size.For tasks that must NEVER see uncommitted data (e.g. accounting, balances), use Postgres OR run with --final-blocks-only.
Hosted SQL sink is not yet released. Check https://docs.substreams.dev/how-to-guides/sinks for availability updates.
You need bulk data on object storage for downstream ETL (Snowflake, BigQuery, dbt, Athena). NOT for real-time application use — files are batched on a configurable interval.
go install github.com/streamingfast/substreams-sink-files/cmd/substreams-sink-files@latest
Unlike SQL, the files sink doesn't mandate a proto shape. You design your output proto, the sink writes one row per repeated entry.
message Transfers {
repeated Transfer transfers = 1;
}
message Transfer {
string from = 1;
string to = 2;
string amount = 3;
// ...
}
The sink walks repeated fields at the top level and emits one CSV/Parquet row per entry.
substreams-sink-files run \
"$ENDPOINT" \
./my-substreams.spkg \
map_transfers \
s3://my-bucket/transfers/ \
"12000000:+1000000" \
--encoder=parquet \
--buffer-max-size=200000 # rows per file
Encoders: parquet (recommended for warehouses), csv, jsonl.
Storage URLs: s3://, gs://, file:///abs/path.
Stored in the destination as a hidden _cursor.json next to the output files. Survives restarts.
go install github.com/streamingfast/substreams-sink-pubsub/cmd/substreams-sink-pubsub@latest
output:
type: proto:sf.substreams.sink.pubsub.v1.Publish
The Publish proto holds the bytes to publish + topic attributes. Your map module wraps your domain proto into Publish.
use std::collections::HashMap;
use prost::Message as ProstMessage;
use substreams_sink_pubsub::pb::sf::substreams::sink::pubsub::v1::{Message, Publish};
#[substreams::handlers::map]
fn map_publish(block: Block) -> Result<Publish, Error> {
let my_event = MyEvent { /* ... */ };
let bytes = my_event.encode_to_vec();
Ok(Publish {
messages: vec![Message { data: bytes, attributes: HashMap::new() }],
})
}
substreams-sink-pubsub sink \
-e "$ENDPOINT" \
--project my-gcp-project \
./my-substreams.spkg \
map_publish \
my-topic-name
GCP credentials via standard GOOGLE_APPLICATION_CREDENTIALS env var.
No separate binary needed. Use the built-in:
substreams sink webhook \
-e "$ENDPOINT" \
./my-substreams.spkg \
map_events \
https://my-app.example.com/webhook
Each block's output is POSTed as JSON. Your endpoint must return 2xx; otherwise the sink retries with exponential backoff. Cursor is written to ./state.json in the current working directory — persist or mount that path in Docker to resume across restarts.
Don't use webhooks for high-volume data. PubSub or SQL beats it past ~100 events/sec.
Built-in, zero install. Just emits proto-as-JSONL files. Use this for local validation before plugging in a real sink:
substreams sink protojson \
-e "$ENDPOINT" \
./my-substreams.spkg \
map_my_module \
./output_dir/ \
"12000000:+100"
Substreams-powered subgraphs deploy via graph deploy, not via a substreams sink binary. Required output module:
- name: graph_out
kind: map
inputs:
- source: sf.substreams.type.v2.Block
output:
type: proto:sf.substreams.sink.entity.v1.EntityChanges
Then in subgraph.yaml:
specVersion: 1.2.0
features:
- nonFatalErrors
dataSources:
- kind: substreams
name: my-substreams
network: mainnet
source:
package:
moduleName: graph_out
file: ./my-substreams-v0.1.0.spkg
mapping:
kind: substreams/graph-entities
apiVersion: 0.0.7
schema:
file: ./schema.graphql
Deploy: graph deploy <subgraph-name> (Studio or hosted).
For EntityChanges proto setup and module patterns → use the substreams-dev skill.
| You want | Pick |
|---|---|
| Zero ops, managed Postgres (not yet released) | StreamingFast hosted ([email protected]) / Pinax |
| Free, you manage the box | Self-host substreams-sink-sql |
| Subgraph on The Graph network | Substreams-powered Subgraph (decentralized) |
| Files into your data lake | Self-host substreams-sink-files |
| Push events to your existing app | Stream SDK (substreams-sink skill) |
Hosted sink is not yet released. Track https://docs.substreams.dev/how-to-guides/sinks for availability updates.
Error: module 'db_out' output type 'proto:my.types.v1.Events'
does not match expected 'proto:sf.substreams.sink.database.v1.DatabaseChanges'
Fix: the substreams module's output proto must EXACTLY match what the sink expects. See table above. For SQL specifically, you must use db_out returning DatabaseChanges — not your domain proto. The substreams-sql skill covers building this module.
schema.sql without running generate → missing domain tablessetup auto-creates the sink's internal tables (cursors, substreams_history), but it won't know your application tables. If you skipped generate, your app tables won't exist. Run:
substreams-sink-sql generate "$DSN" ./my-substreams.spkg <module>
then add your custom table definitions and re-run setup.
Cursor not persisted. For SQL sink: cursor lives in DB — not affected. For files sink: cursor lives in destination bucket — not affected if you re-point at the same URL. For webhook/protojson: ./state.json must persist between runs (mount as volume in Docker).
block.transactions() works but ClickHouse sees no rows for ~50 blocksReorg buffer. Set --undo-buffer-size=12 or smaller if you accept some rollback risk. Default holds writes until finality (~32 blocks on Eth, ~12 on Polygon, etc.).
substreams-sink-sql says "auth required" even with API key setThe sink reads SUBSTREAMS_API_KEY from env (this is the correct var name — NOT SUBSTREAMS_API_TOKEN). If you're using --api-key-envvar=OTHER_VAR make sure the env var name matches. JWT mode (older accounts) requires SUBSTREAMS_API_TOKEN instead.
--batch-block-flush-interval too large for short rangesDefault is 1000. If you process fewer than 1000 blocks (e.g. an eval task with +100), the sink may keep the batch in memory until either the interval is reached or the process exits cleanly and performs a final partial flush. Net result: you may see no rows during the run, and short test ranges can be misleading unless you lower the flush interval.
# Wrong (default 1000) — 100 blocks won't commit
substreams-sink-sql run "$DSN" ./pkg.spkg "18000000:+100" -e "$EP"
# Right — flush every block for short ranges
substreams-sink-sql run "$DSN" ./pkg.spkg "18000000:+100" -e "$EP" \
--batch-block-flush-interval=1
Production: leave default 1000 (or set higher for very high-throughput chains like Solana). Tune down only for testing or very low-volume modules.
Sink v4+ enforces strict matching between schema.sql and the wire-format PK from your db_out Rust module:
Your schema.sql declares | Your Rust code MUST send |
|---|---|
PRIMARY KEY (id) (single column) | tables.create_row("t", &id) — single string |
PRIMARY KEY (tx_hash, log_index) (composite, 2+ columns) | tables.create_row("t", [("tx_hash", &h), ("log_index", li)]) — slice of (col, val) tuples |
If you see this error: either rewrite the Rust to send a tuple slice, OR change the schema to a single-column synthetic PK (id VARCHAR PRIMARY KEY) and concatenate fields in Rust (format!("{}-{}", tx_hash, log_index)).
For build-side patterns → substreams-sql skill.
You changed your .spkg and now the sink errors:
module hash mismatch: cursor was for hash X, current module is hash Y
The cursor pins a specific module-hash to detect drift. Two options to override:
# Log a warning but continue from the existing cursor (data may be inconsistent)
substreams-sink-sql run ... --on-module-hash-mismatch=warn
# Same as warn but silently — still continues from existing cursor, no reset
substreams-sink-sql run ... --on-module-hash-mismatch=ignore
Neither flag resets the cursor. Both continue from where the old module left off, mixing old and new data — a debugging nightmare if your schema or logic changed. For a clean restart, wipe the destination manually and re-run from the original start block.
GCP service account needs pubsub.publisher on the topic. Check gcloud auth application-default login is set, OR GOOGLE_APPLICATION_CREDENTIALS points at a key file with the right role.
For initial load of millions of blocks then continuous tailing:
# 1. Backfill (bounded range, parallel-friendly with --workers=N)
substreams-sink-sql run "$DSN" ./pkg.spkg "12000000:18000000" -e "$EP" --workers=8
# 2. Live (start = stop block of backfill, omit stop for open-ended)
substreams-sink-sql run "$DSN" ./pkg.spkg "18000000:" -e "$EP" --workers=1
For files sink: same positional range pattern — "12000000:18000000" for backfill, "18000000:" for live.
All sinks expose Prometheus metrics on --metrics-listen-addr=:9100:
substreams_sink_block_height — last processed blocksubstreams_sink_blocks_per_second — throughputsubstreams_sink_undo_count — reorgs handledAlert on:
If your sink writes balances/totals, you MUST use Postgres + db_out with UPSERT. ClickHouse + relational-mappings can produce double-counted rows during reorgs. When in doubt, run with --final-blocks-only (sacrifices ~1 minute of latency for zero rollback risk).
Kill -9 the sink mid-batch. Restart. Sink reads cursor → resumes at last committed block → no duplicate rows (atomic per-batch commit). This is by design; do not add your own dedup logic.
# Prereqs
export SUBSTREAMS_API_KEY=server_xxx
export DSN="psql://user:pass@localhost:5432/mydb?sslmode=disable" # sink DSN (psql:// or postgres://)
export PSQL_DSN="postgresql://user:pass@localhost:5432/mydb?sslmode=disable" # for psql client
# 0. Get a working substreams package with a db_out module
# (see substreams-sql skill for the build side)
# 1. Generate schema.sql template
substreams-sink-sql generate "$DSN" ./erc20.spkg db_out > schema.sql
# Edit schema.sql to add your domain tables (setup auto-creates the internal cursors/substreams_history tables)
# 2. Create DB tables
substreams-sink-sql setup "$DSN" ./erc20.spkg
# 3. Run sink (foreground; use systemd/docker for production)
substreams-sink-sql run "$DSN" \
./erc20.spkg \
"12000000:+10000" \
-e "https://mainnet.eth.streamingfast.io:443" \
--metrics-listen-addr=:9100
# 4. Query
psql "$PSQL_DSN" -c "SELECT count(*) FROM erc20_transfers"
npx claudepluginhub streamingfast/substreams-skills --plugin substreams-devProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.