Expert knowledge for developing, building, and debugging Substreams projects on any blockchain. Use when working with substreams.yaml manifests, Rust modules, protobuf schemas, or blockchain data processing.
Expert assistant for building Substreams projects - high-performance blockchain data indexing and transformation.
Substreams is a powerful blockchain indexing technology that enables:
- Manifest (substreams.yaml): Defines modules, networks, dependencies

my-substreams/
├── substreams.yaml # Manifest (manual)
├── README.md # Package documentation for substreams.dev registry (manual)
├── schema.sql # SQL schema for sinks (manual)
├── Cargo.toml # Rust dependencies (manual)
├── build.rs # ABI code generation (manual, optional)
├── abi/ # Contract ABI JSON files (manual)
│ └── my_contract.json # ABI for code generation
├── proto/
│ └── events.proto # Schema definitions (manual)
├── src/
│ ├── lib.rs # Rust module code (manual)
│ ├── abi/ # Generated ABI bindings (from build.rs)
│ │ ├── mod.rs # Exports generated modules (manual)
│ │ └── my_contract.rs # Generated by build.rs (auto)
│ └── pb/ # Generated protobuf code (auto - DO NOT CREATE)
└── target/ # Build output (gitignored)
Important: The src/pb/ directory is entirely auto-generated by substreams build. Never create it manually.
abi/ - Contains JSON ABI files for smart contracts. These are used by build.rs to generate Rust bindings.
src/abi/ - Generated Rust code from ABIs. Create src/abi/mod.rs to export the generated modules:
// src/abi/mod.rs
pub mod my_contract; // Matches the name in build.rs
src/pb/ - Generated protobuf code. This directory and its contents are auto-generated - do NOT create manually. Generate with:
substreams protogen # Generate proto bindings only (fast, useful for iterative development)
substreams build # Full build (includes protogen + cargo build)
Use substreams protogen for iterative development - it generates the Rust bindings quickly so you get type hints and autocomplete while writing module code, without waiting for the full WASM compilation.
// build.rs
fn main() {
substreams_ethereum::Abigen::new("MyContract", "abi/my_contract.json")
.expect("Failed to load ABI")
.generate()
.expect("Failed to generate bindings")
.write_to_file("src/abi/my_contract.rs")
.expect("Failed to write bindings");
}
substreams build for protobuf code generation

Running substreams run against hosted endpoints requires authentication. Get your API key from The Graph Market - sign up at thegraph.market/auth/signup.
CLI Authentication (Recommended):
substreams auth # Interactive authentication, stores token locally
Quick Token Generation: Visit thegraph.market/auth/substreams-devenv to generate a JWT token from your API key directly in the browser.
Environment Variables (Alternative):
export SUBSTREAMS_API_KEY="your-api-key"
# Or set bearer token directly
export SUBSTREAMS_API_TOKEN="your-jwt-token"
The substreams auth command handles token exchange and local storage automatically, making it the easiest way to get started.
Before writing any code, check whether the request provides all of the following. If one or more items are missing or ambiguous, ask the user ONCE with a consolidated list — do not make silent assumptions and do not write code until you have the answers.
| Required input | Why it matters |
|---|---|
| Target chain | Block type, RPC endpoints, and ABI tooling differ per chain |
| Contract address(es) or protocol | Determines which events/calls to decode |
| Data you want to capture | Events only? Calls? State changes? Aggregations? |
| Output / sink type | substreams run, SQL sink, graph-out, custom sink? |
| Block range or time window | initialBlock and test range; performance implications |
| Thresholds or filters | Min value, token allowlist, address filter, etc. |
| Block sparsity | Does the target appear in only some blocks? If so, a block index filter is a near-mandatory cost optimization — see "Block & Transaction Filtering" |
If any item is unknown, respond with something like:
Before I build this, I need a few details:
- Which chain? (Ethereum mainnet, Polygon, Arbitrum, ...)
- Which contract(s) or protocol?
- What specific events or data fields do you need?
- Where should the output go? (Postgres, The Graph, just substreams run?)
- What start block or date range?
- Any filters — minimum transfer size, specific token list, etc.?
Only ask once. If you receive partial answers, proceed with what you have and state your remaining assumptions explicitly in your response.
If the prompt is concrete and complete, skip the checklist and build immediately.
Migrating an existing project? Load the substreams-convert skill if you are porting a subgraph or Solana program/contract to Substreams instead of starting from scratch.
- substreams init or create manifest manually
- .proto files for your data structures
- src/lib.rs
- substreams build to compile to .spkg
- substreams run with small block range (recommended: 1000 blocks)
- README.md for the substreams.dev registry (see "README for substreams.dev Registry" below)

Every Substreams package published to the registry must include a README.md. This file is the primary documentation shown on substreams.dev and is the first thing consumers see.
Required sections:
# <Package Title>
<One-sentence description of what this package indexes and outputs.>
## Overview
<2-3 sentences: what data it captures, what protocol/chain, and intended use case.>
## Modules
| Module | Kind | Output Type | Description |
|--------|------|-------------|-------------|
| `map_events` | map | `proto:my.types.v1.Events` | Extracts transfer events from each block |
| `store_totals` | store | `int64` | Accumulates running totals per token |
## Prerequisites
- [`substreams` CLI](https://substreams.streamingfast.io/documentation/consume/installing-the-cli) installed
- Authenticated: `substreams auth`
## Quick Start
```bash
substreams run -e mainnet.eth.streamingfast.io \
substreams.yaml map_events \
  -s 18000000 -t +1000
```
**Rules:**
- Title matches `package.name` in `substreams.yaml`
- Module table lists every `name:` entry from the manifest — consumers need this to know what to `substreams run`
- Quick Start uses a real block range, not a placeholder
- Do NOT include a "Contributing" or "License" section — the registry pulls license from the manifest
### Module Types
**Map Module** - Transforms input to output
```yaml
- name: map_events
  kind: map
  inputs:
    - source: sf.ethereum.type.v2.Block
  output:
    type: proto:my.types.Events
```
Store Module - Aggregates data across blocks
- name: store_totals
  kind: store
  updatePolicy: add
  valueType: int64
  inputs:
    - map: map_events
Index Module (kind: blockIndex) - Emits per-block Keys so downstream
modules can skip blocks. The handler is #[substreams::handlers::map]
returning Keys.
- name: index_transfers
  kind: blockIndex
  inputs:
    - map: map_events
  output:
    type: proto:sf.substreams.index.v1.Keys
# A consuming module skips non-matching blocks via `blockFilter`:
- name: filtered_transfers
  kind: map
  blockFilter:
    module: index_transfers # references the blockIndex module above
    query:
      string: "token:0xdac17f958d2ee523a2206206994597c13d831ec7"
  inputs:
    - map: map_events
  output:
    type: proto:my.types.Transfers
The index alone does nothing: only a module with an explicit blockFilter gets blocks skipped. See "Block & Transaction Filtering (Cost-Critical)" below.
initialBlock guidance:
modules:
  - name: map_events
    kind: map
    initialBlock: 18000000 # ✅ start of your test/data range
    # NOT: 12369621 # ❌ protocol genesis, forces full backfill on every run
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:my.types.Events
Pin initialBlock to the first block your downstream consumer actually needs. The runtime starts processing from max(--start-block, initialBlock), then walks forward. Stores must catch up from initialBlock on each cold start, so a deep genesis pin turns a 100-block test into a multi-hour backfill.
For reference: the T3.2 golden uses initialBlock: 17999900 to cover a -s 18000000 -t +100 acceptance window.
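The catch-up cost described above is plain arithmetic. The following is a minimal sketch in standard Rust, assuming the max(--start-block, initialBlock) rule as stated; the function names are hypothetical and are not part of any Substreams crate.

```rust
/// Block at which processing begins: the later of the requested
/// start block and the module's initialBlock (rule as stated in this guide).
fn effective_start(start_block: u64, initial_block: u64) -> u64 {
    start_block.max(initial_block)
}

/// Number of blocks a store has to replay on a cold start before the
/// requested range can produce output. Zero when initialBlock is at or
/// after the requested start.
fn store_backfill_blocks(start_block: u64, initial_block: u64) -> u64 {
    start_block.saturating_sub(initial_block)
}
```

With the numbers used in this guide, a genesis pin of 12369621 tested at 18000000 gives a backfill of 5,630,379 blocks, while initialBlock: 17999900 gives 100.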
Whenever a Substreams targets only a subset of blocks — a specific contract, program, event signature, account, or transaction type — add a block index filter. Be aggressive about this. It is the biggest win available for your own development experience and your bill.
What you get:
blockFilter skips the blocks that can't match, so a contract active in 0.5% of blocks costs roughly 0.5% as much.

The pattern (three pieces):
# 1. Index module: kind `blockIndex`, output `Keys`
- name: index_events
  kind: blockIndex
  inputs:
    - map: all_events
  output:
    type: proto:sf.substreams.index.v1.Keys
# 2. Consuming module: declares blockFilter to skip non-matching blocks
- name: filtered_events
  kind: map
  blockFilter:
    module: index_events
    query:
      params: true # SQE query from `params` (or: string: "<expr>")
  inputs:
    - params: string
    - map: all_events
  output:
    type: proto:my.types.Events
The index handler is a map handler returning Keys (a repeated string
of labels you choose, e.g. evt_addr:0x..., evt_sig:0x..., program:<id>):
#[substreams::handlers::map]
fn index_events(events: Events) -> Result<Keys, Error> {
let mut keys = Keys::default();
for e in events.events {
if let Some(log) = e.log {
if let Some(t0) = log.topics.get(0) {
keys.keys.push(format!("evt_sig:0x{}", Hex::encode(t0)));
}
keys.keys.push(format!("evt_addr:0x{}", Hex::encode(&log.address)));
}
}
Ok(keys)
}
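Because the keys emitted by an index handler are later compared as plain strings, it can help to build the query side with the same normalization. This is a minimal sketch in standard Rust: to_lower_hex, evt_addr_key and normalize_query_address are hypothetical helpers (the handler above uses Hex::encode instead), and the evt_addr: label is only the convention used in this section.

```rust
/// Lowercase hex encoding of raw bytes, without a prefix.
fn to_lower_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Builds an index key of the form `evt_addr:0x<lowercase hex>` from raw address bytes.
fn evt_addr_key(address: &[u8]) -> String {
    format!("evt_addr:0x{}", to_lower_hex(address))
}

/// Normalizes a user-supplied address string (possibly mixed-case, with or
/// without a 0x prefix) into the same key form, so a literal comparison
/// against the indexed key can succeed.
fn normalize_query_address(input: &str) -> String {
    let hex = input
        .strip_prefix("0x")
        .or_else(|| input.strip_prefix("0X"))
        .unwrap_or(input);
    format!("evt_addr:0x{}", hex.to_ascii_lowercase())
}
```

A mixed-case address passed through normalize_query_address yields the same string that evt_addr_key produces from the raw bytes, which is the property a literal key match needs.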
Query (SQE): boolean expression over keys — && (and), || (or), -
(not), ( ) grouping. A bare term must match a key exactly:
"evt_addr:0xA && -evt_addr:0xspam".
Critical rules:
- kind: blockIndex with an #[substreams::handlers::map] handler returning Keys.
- Only a module with an explicit blockFilter gets blocks skipped; listing the index as a dependency is not enough.
- evt_addr: vs address:), and values are matched by literal equality: use 0x-prefixed lowercase hex (EVM checksum/mixed-case addresses never match).
- filtered_* modules already apply the blockFilter and return only matching records, so depend on those directly (e.g. imports: { eth_common: [email protected] } → map: eth_common:filtered_events). You must override the default params query (eth_common:filtered_events: "…"), or you silently emit the default's data. In-handler filtering is only needed when you roll your own blockFilter, or for Solana instruction-level filtering (transactions are pre-filtered, but instructions within them are not).

Full guide (SQE syntax, params vs string, use inheritance, foundational indexes, Solana/EVM examples, decision flowchart): see references/block-filtering.md.
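The query semantics described above (boolean operators over exactly matched keys) can be modeled in a few lines of standard Rust. This is only an illustrative model: the Query enum, query_matches and example_query are hypothetical names, and the real SQE parser and evaluator live in the Substreams runtime.

```rust
use std::collections::HashSet;

/// Hypothetical AST for a key query: terms combined with and / or / not.
enum Query {
    Term(String),
    And(Box<Query>, Box<Query>),
    Or(Box<Query>, Box<Query>),
    Not(Box<Query>),
}

/// Returns true when the set of keys emitted for a block satisfies the query.
fn query_matches(query: &Query, keys: &HashSet<String>) -> bool {
    match query {
        // A bare term matches only a key that is exactly equal (case-sensitive).
        Query::Term(term) => keys.contains(term),
        Query::And(a, b) => query_matches(a, keys) && query_matches(b, keys),
        Query::Or(a, b) => query_matches(a, keys) || query_matches(b, keys),
        Query::Not(inner) => !query_matches(inner, keys),
    }
}

/// Builds the example from the text: `evt_addr:0xA && -evt_addr:0xspam`.
fn example_query() -> Query {
    Query::And(
        Box::new(Query::Term("evt_addr:0xA".to_string())),
        Box::new(Query::Not(Box::new(Query::Term(
            "evt_addr:0xspam".to_string(),
        )))),
    )
}
```

In this model a block whose keys contain evt_addr:0xa (lowercase) does not satisfy the example query, which mirrors the literal-equality rule.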
When modules produce unexpected results:
- substreams graph to visualize dependencies
- initialBlock is set appropriately.
- initialBlock: 18000000.
- initialBlock. If you pin to protocol genesis (e.g. 12369621 for Uniswap V3) but test at block 18000000, the sink must backfill 5.6M blocks before producing output, which is impractical for local runs.
- blockFilter to skip irrelevant blocks entirely (biggest cost lever): see "Block & Transaction Filtering (Cost-Critical)" above and references/block-filtering.md
- --production-mode

See references/manifest-spec.md for complete specification.
Package metadata:
specVersion: v0.1.0
package:
  name: my-substreams
  version: v1.0.3 # MUST have 'v' prefix; bare semver like "1.0.3" is rejected
  description: Description of what this substreams does
version requires a v prefix. Use v0.1.0, not 0.1.0. The error message (version "0.1.0" should match Semver) is misleading: the bare form is valid semver, but Substreams mandates the v-prefixed form. This applies to the top-level package.version only; specVersion already shows the correct prefix.
Protobuf imports:
protobuf:
  files:
    - events.proto
  importPaths:
    - ./proto
Binary reference (WASM code):
binaries:
  default:
    type: wasm/rust-v1
    file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm
WASM Bindgen Shims (for solana_program, alloy, chrono, etc.):
Some Rust libraries create WebAssembly bindgen imports when compiled to wasm32-unknown-unknown. To use these libraries, enable the shims feature:
binaries:
  default:
    type: wasm/rust-v1+wasm-bindgen-shims
    file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm
This allows compilation but the shims don't implement underlying functionality - avoid calling their special import functions at runtime. See WASM compatibility docs for details.
Network configuration:
network: mainnet
Supported networks: See references/networks.md
A complete Cargo.toml template for Ethereum Substreams with SQL sink support:
[package]
name = "my_substreams"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
# Core Substreams dependencies - VERSIONS MUST BE COMPATIBLE
# Check https://crates.io for latest versions
substreams = "0.7" # Latest: 0.7.3
substreams-ethereum = "0.11" # Latest: 0.11.1
# For SQL sink output (DatabaseChanges)
substreams-database-change = "4" # Latest: 4.0.0
# Protobuf serialization
prost = "0.13"
prost-types = "0.13" # Required for google.protobuf.Timestamp/Any in generated src/pb/ code
# Utility crates
hex = "0.4"
hex-literal = "0.4" # NOTE: hyphen, not underscore
num-bigint = "0.4"
# Required by generated ABI code (from build.rs)
ethabi = "18"
[build-dependencies]
substreams-ethereum = "0.11" # Latest: 0.11.1
[profile.release]
lto = true
opt-level = 's'
strip = "debuginfo"
Version Compatibility Matrix:
| substreams | substreams-ethereum | substreams-database-change | Notes |
|---|---|---|---|
| 0.7 | 0.11 | 4 | Current recommended versions |
| 0.6 | 0.10 | 3 | Legacy (incompatible with v4 database-change) |
Common Pitfalls:
- hex_literal vs hex-literal: the crate is declared as hex-literal (hyphen) in Cargo.toml; Rust code refers to it as hex_literal (underscore)
- ethabi: Required by ABI-generated code but not always obvious
- prost-types: If substreams build generates src/pb/ code that references google.protobuf.Timestamp or google.protobuf.Any, you will get error[E0433]: failed to resolve: use of undeclared crate or module prost_types. Ensure prost-types = "0.13" is in [dependencies]; the version must match your prost version (0.13 for the current substreams toolchain). The template above includes it; do not remove it.
- rm -rf target && substreams build

WASM-Incompatible Crates:
Some crates enable wasm-bindgen features by default on wasm32 targets, causing runtime errors like:
unknown import: `__wbindgen_placeholder__::__wbindgen_describe` has not been defined
Solutions:
- wasm/rust-v1+wasm-bindgen-shims in your manifest's binary type (see above)
- chrono = { version = "0.4", default-features = false }
Common crates requiring attention: chrono, solana_program, alloy, ethers-rs
use substreams::errors::Error;
use substreams::prelude::*;
use substreams_ethereum::pb::eth::v2::Block;
#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
let mut events = Events::default();
for trx in block.transactions() {
for (log, _call) in trx.logs_with_calls() {
// Process logs, extract events
if is_transfer_event(log) {
events.transfers.push(extract_transfer(log));
}
}
}
Ok(events)
}
#[substreams::handlers::store]
pub fn store_totals(events: Events, store: StoreAddInt64) {
for transfer in events.transfers {
store.add(0, &transfer.token, transfer.amount as i64);
}
}
Anti-pattern — DO NOT do this:
// ❌ WRONG: per-block HashMap cache. Re-fetched every block. Wastes RPC budget.
#[substreams::handlers::map]
pub fn map_swaps(block: Block) -> Result<Swaps, Error> {
let mut token_cache: HashMap<String, TokenMeta> = HashMap::new(); // ❌ scope = single block
for log in block.logs() {
let addr = log.address().to_string();
let meta = token_cache.entry(addr.clone())
.or_insert_with(|| fetch_token_metadata(&addr)); // ❌ fetched fresh next block
// ...
}
}
A HashMap declared inside a map handler is rebuilt every block. With ~50 unique pools per block × 2 tokens × 2 RPC calls (symbol + decimals), that's 200 RPCs per block — 20,000 over a 100-block run. The hosted Substreams runtime enforces RPC budgets; this pattern will fail at scale and waste quota at any scale.
Correct pattern - use a set_if_not_exists store:
Token metadata (symbol, decimals, name) is immutable per contract address. Cache once, read forever. The idiomatic chain:
map_token_addresses → store_token_metadata → map_swaps (reads from store)
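The store_token_metadata handler is where the RPC fires: set_if_not_exists keeps the first value written for a key and ignores later writes, so metadata is stored once per address for the entire run. The write policy by itself only suppresses the write, not the call, so to keep the RPC to the first sighting, have map_token_addresses emit an address only when it is new (for example, when its pool is created). After that, map_swaps reads metadata from the store with zero RPC per block.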
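The difference between a per-block cache and run-wide first-seen storage can be seen in a small simulation. This is a plain-Rust model under stated assumptions, not Substreams API: blocks are represented as lists of address strings, a "fetch" is just a counter increment, and both function names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Counts metadata fetches when the cache is rebuilt inside every block
/// (the anti-pattern: each block starts with an empty cache).
fn fetches_with_per_block_cache(blocks: &[Vec<&str>]) -> usize {
    let mut fetches = 0;
    for block in blocks {
        let mut cache: HashSet<&str> = HashSet::new(); // scope = one block
        for &addr in block {
            if cache.insert(addr) {
                fetches += 1; // first sighting within this block only
            }
        }
    }
    fetches
}

/// Counts fetches when the first value written for an address persists
/// for the whole run (modeling set-if-not-exists storage).
fn fetches_with_run_wide_store(blocks: &[Vec<&str>]) -> usize {
    let mut store: HashMap<&str, usize> = HashMap::new(); // lives across blocks
    let mut fetches = 0;
    for block in blocks {
        for &addr in block {
            if !store.contains_key(addr) {
                fetches += 1;
                store.insert(addr, addr.len()); // stand-in for fetched metadata
            }
        }
    }
    fetches
}
```

For three blocks touching addresses [a, b], [a, b] and [a, c], the per-block version performs six fetches and the run-wide version three, one per distinct address.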
Yes, you can — and often should — call contracts from a map module.
substreams_ethereum::rpc::RpcBatch works in map handlers. The host runtime executes the batch synchronously before returning to your handler. There is no architectural restriction preventing RPCs in maps; this is a common misconception.
Copy-paste example — batch ERC20 metadata lookup:
use substreams_ethereum::rpc::RpcBatch;
// generated from build.rs / ABI codegen (or write by hand):
use crate::abi::erc20;
// Returns None on transient RPC failure or undecodable response — caller must skip + log.
// Never panic from a map/store handler: an unhandled panic aborts the whole substream.
fn fetch_token_metadata(token_addr: &[u8]) -> Option<(String, u32)> {
let batch = RpcBatch::new();
let responses = batch
.add(erc20::functions::Symbol {}, token_addr.to_vec())
.add(erc20::functions::Decimals {}, token_addr.to_vec())
.execute()
.ok()?; // transient RPC error → None, do not panic
let symbol = RpcBatch::decode::<_, erc20::functions::Symbol>(&responses.responses[0])?;
let decimals = RpcBatch::decode::<_, erc20::functions::Decimals>(&responses.responses[1])?
.to_u64() as u32; // never default to 18 — wrong for USDC/USDT (6), WBTC (8)
Some((symbol, decimals))
}
Never panic from a Substreams handler. Calling .expect()/.unwrap() on RPC results aborts the entire substream on any transient endpoint hiccup. Return Option/Result, then have the caller log + skip the record. Same for decimals: never silently default to 18; emit nothing rather than wrong data.
substreams-ethereum-abigen (the build.rs codegen) emits substreams::scalar::BigInt for any uint* or int* field, including uint256, int256, and the narrower uint8/uint32/etc. It does NOT emit ethabi::Uint or ethabi::Int.
// ❌ WRONG — older API; abigen does not emit ethabi types
fn format_amount(raw: &ethabi::Uint, decimals: u32) -> String { /* ... */ }
// ✅ CORRECT — abigen emits BigInt
use substreams::scalar::BigInt;
fn format_amount(raw: &BigInt, decimals: u32) -> String {
raw.to_decimal(decimals as u64).to_string()
}
If you need a primitive integer (e.g. converting decimals() BigInt to u32):
// BigInt → primitive (assumes value fits — overflow is silent)
let decimals_u32: u32 = big_int_value.to_u64() as u32;
let amount_i64: i64 = big_int_value.to_i64();
For signed int256 values from event fields, use BigInt::to_decimal(decimals) for human-readable string, or the signum() + abs() methods to inspect sign.
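To see why the decimals value matters so much, here is a minimal sketch of the scaling in standard Rust. It assumes the raw amount fits in a u128 (real token amounts arrive as BigInt), and format_units is a hypothetical helper, not a function of the substreams crate; it makes no claim about the exact string BigInt::to_decimal produces.

```rust
/// Formats a raw integer token amount using `decimals` fractional digits,
/// trimming trailing zeros. Returns None if 10^decimals does not fit in a
/// u128, instead of panicking.
fn format_units(raw: u128, decimals: u32) -> Option<String> {
    let scale = 10u128.checked_pow(decimals)?;
    let whole = raw / scale;
    let frac = raw % scale;
    if frac == 0 {
        return Some(whole.to_string());
    }
    // Left-pad the fractional part with zeros to exactly `decimals` digits.
    let frac_str = format!("{:0width$}", frac, width = decimals as usize);
    Some(format!("{}.{}", whole, frac_str.trim_end_matches('0')))
}
```

The same raw value 1500000 reads as 1.5 with 6 decimals and as a vastly smaller number with 18, which is why a silent default of 18 corrupts amounts for 6- and 8-decimal tokens.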
.call() method takes one argument (the contract address)

substreams-ethereum-abigen emits a .call(address) method on each function struct that performs the eth_call via the substreams host. It takes exactly ONE argument: the contract address. There is no second &block argument.
use crate::abi::erc20::functions;
// ❌ WRONG — older two-arg form (predates current substreams-ethereum)
let decimals = functions::Decimals::call(token_addr, &block);
// ✅ CORRECT — single-arg form, returns Option<T>; propagate None, never default
fn fetch_decimals(token_addr: &[u8]) -> Option<u32> {
functions::Decimals {}
.call(token_addr.to_vec())
.map(|d| d.to_u64() as u32)
}
// Caller: log + skip on None — do NOT default to 18.
let decimals = match fetch_decimals(&addr_bytes) {
Some(d) => d,
None => { substreams::log::warn!("decimals fetch failed for {:x?}", addr_bytes); continue; }
};
.call() returns Option<T> — None on RPC failure or decode failure. Always handle the None arm; never .unwrap() and never default decimals to 18 (silently wrong for USDC/USDT/WBTC). Skip the record or return None/Err to the caller.
For batched calls covering multiple eth_calls in one round-trip, prefer RpcBatch::new().add(...) (shown above).
substreams.yaml:
modules:
  - name: map_token_addresses
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:my.types.v1.TokenAddresses
  - name: store_token_metadata
    kind: store
    updatePolicy: set_if_not_exists # ← write once, never overwrite
    valueType: proto:my.types.v1.TokenMeta
    inputs:
      - map: map_token_addresses
  - name: map_swaps
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
      - store: store_token_metadata
        mode: get
    output:
      type: proto:my.types.v1.Swaps
store_token_metadata handler (RPC fires here, ONCE per address):
#[substreams::handlers::store]
pub fn store_token_metadata(
addrs: TokenAddresses,
store: StoreSetIfNotExistsProto<TokenMeta>,
) {
for addr_hex in &addrs.addresses {
let addr_bytes = match hex::decode(addr_hex.trim_start_matches("0x")) {
Ok(bytes) => bytes,
Err(_) => { substreams::log::warn!("invalid token address: {}", addr_hex); continue; }
};
let (symbol, decimals) = match fetch_token_metadata(&addr_bytes) {
Some(meta) => meta,
None => { substreams::log::warn!("token metadata fetch failed for {}", addr_hex); continue; }
};
store.set_if_not_exists(0, addr_hex, &TokenMeta { symbol, decimals });
}
}
map_swaps handler (zero RPC after first seen):
#[substreams::handlers::map]
pub fn map_swaps(
block: Block,
meta_store: StoreGetProto<TokenMeta>,
) -> Result<Swaps, substreams::errors::Error> {
let mut swaps = Swaps::default();
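    for pool_log in extract_swap_logs(&block) {
        // Skip swaps whose token metadata is not in the store yet; never default decimals to 18.
        let meta = match meta_store.get_last(&pool_log.token_address) {
            Some(meta) => meta,
            None => { substreams::log::info!("no token metadata for {}", pool_log.token_address); continue; }
        };
        swaps.items.push(build_swap(pool_log, meta));
    }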
Ok(swaps)
}
Rule of thumb: if a value is immutable per contract address (symbol, decimals, factory deployment, pair tokens), use a set_if_not_exists store. If you wrote let mut cache: HashMap<...> = HashMap::new(); inside a map handler, you have a bug.
Common failure mode: agents use call traces or hardcode known tokens instead of batching token0()/token1() eth_calls. Call traces are incomplete: they only appear when the pool is the callee, not for every swap. This silently produces UNKNOWN tokens for most pools.
V3 pools store token0 and token1 as immutable state. Resolve them via raw eth_call and cache in a store — same pattern as ERC20 metadata.
RpcBatch::add requires a generated ABI struct. For pool selectors, use eth_call with raw RpcCalls directly — no ABI codegen needed:
// Uniswap V3 pool: token0() → address, token1() → address
// selector = keccak256("token0()")[0..4] = 0x0dfe1681
// selector = keccak256("token1()")[0..4] = 0xd21220a7
use substreams_ethereum::pb::eth::rpc::{RpcCall, RpcCalls};
use substreams_ethereum::rpc::eth_call;
fn decode_address_return(raw: &[u8]) -> Option<Vec<u8>> {
// ABI: address is padded to 32 bytes, actual address is last 20
if raw.len() < 32 { return None; }
Some(raw[12..32].to_vec()) // skip 12 bytes of zero-padding
}
fn fetch_pool_tokens(pool_addr: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
let calls = RpcCalls {
calls: vec![
RpcCall { to_addr: pool_addr.to_vec(), data: vec![0x0d, 0xfe, 0x16, 0x81] }, // token0()
RpcCall { to_addr: pool_addr.to_vec(), data: vec![0xd2, 0x12, 0x20, 0xa7] }, // token1()
],
};
let responses = eth_call(&calls);
if responses.responses.len() < 2 { return None; }
let token0 = decode_address_return(&responses.responses[0].raw)?;
let token1 = decode_address_return(&responses.responses[1].raw)?;
Some((token0, token1))
}
Note: RpcBatch::add_call() does NOT exist in substreams-ethereum v0.11. Use eth_call(&RpcCalls { calls: [...] }) for raw calldata, or RpcBatch::add(AbiStruct {}, addr) when you have generated ABI structs.
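The address decoding step can be exercised in isolation, without any RPC. Below is a stricter variant of the decoder as a sketch in standard Rust: decode_address_word is a hypothetical name, and rejecting non-zero padding and any length other than 32 bytes is a design choice of this sketch, not something the original helper does.

```rust
/// Decodes an ABI-encoded `address` return value: a single 32-byte word
/// whose first 12 bytes are zero padding and whose last 20 bytes are the
/// address. Returns None for empty or malformed data (for example, the
/// empty payload of a failed call).
fn decode_address_word(raw: &[u8]) -> Option<[u8; 20]> {
    if raw.len() != 32 {
        return None;
    }
    // A well-formed address word has all-zero padding.
    if raw[..12].iter().any(|b| *b != 0) {
        return None;
    }
    let mut addr = [0u8; 20];
    addr.copy_from_slice(&raw[12..]);
    Some(addr)
}
```

Returning a fixed-size array makes the 20-byte invariant part of the type, so later code cannot receive a truncated address by accident.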
Wire it into a store (exactly like ERC20 metadata):
map_v3_pools (emits pool addresses) → store_pool_tokens (set_if_not_exists) → map_v3_swaps (reads token0/token1 from store)
// In map_v3_pools: emit any new pool addresses seen this block
// In store handler:
#[substreams::handlers::store]
pub fn store_pool_tokens(pools: PoolAddresses, store: StoreSetIfNotExistsProto<TokenPair>) {
for pool_hex in &pools.addresses {
let pool_bytes = match hex::decode(pool_hex.trim_start_matches("0x")) {
Ok(bytes) => bytes,
Err(_) => { substreams::log::warn!("invalid pool address: {}", pool_hex); continue; }
};
if let Some((t0, t1)) = fetch_pool_tokens(&pool_bytes) {
store.set_if_not_exists(0, pool_hex, &TokenPair {
token0: format!("0x{}", hex::encode(&t0)),
token1: format!("0x{}", hex::encode(&t1)),
});
}
}
}
// In map_v3_swaps: read from store (zero RPC)
let pair = pool_store.get_last(&pool_hex)
.unwrap_or_default(); // default = empty strings if pool not yet seen
Never use call traces for token resolution. block.calls() only has entries when a call to the pool was the top-level transaction call or an explicit internal call — it misses pools that emitted Swap via pure EVM event emission without a visible call trace.
- Result<T, Error> returns
- substreams-ethereum crate
- substreams run before deploying

CRITICAL: One of the greatest performance impacts in Substreams is excessive cloning of data structures.
Cloning large data structures is expensive:
Use Rust's ownership system to transfer or borrow data instead of cloning.
#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
let mut events = Events::default();
for trx in block.transactions() {
// ❌ BAD: Cloning entire transaction
let transaction = trx.clone();
for (log, _call) in transaction.logs_with_calls() {
// ❌ BAD: Cloning log
let log_copy = log.clone();
if is_transfer_event(&log_copy) {
events.transfers.push(extract_transfer(&log_copy));
}
}
}
Ok(events)
}
#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
let mut events = Events::default();
// ✅ GOOD: Iterate by reference
for trx in block.transactions() {
// ✅ GOOD: Borrow, don't clone
for (log, _call) in trx.logs_with_calls() {
if is_transfer_event(log) {
// ✅ GOOD: Only extract what you need
events.transfers.push(extract_transfer(log));
}
}
}
Ok(events)
}
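fn is_transfer_event(log: &Log) -> bool {
    // Use reference, no cloning. Require all three topics (signature, from, to)
    // so extract_transfer cannot index out of bounds and panic.
    log.topics.len() >= 3 &&
    log.topics[0] == TRANSFER_EVENT_SIGNATURE
}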
fn extract_transfer(log: &Log) -> Transfer {
// Extract only the fields you need
Transfer {
from: Hex::encode(&log.topics[1]),
to: Hex::encode(&log.topics[2]),
amount: Hex::encode(&log.data),
// Don't copy the entire log
}
}
Clone only small, necessary data:
// ✅ OK: Cloning a small byte string you need to own
let token_address = log.address.clone();
// ✅ OK: Primitive types are Copy, so no clone() call is needed
let block_number = block.number;
// ❌ BAD: Cloning entire structures
let block_copy = block.clone(); // Never do this!
let trx_copy = transaction.clone(); // Avoid this!
Use logs_with_calls(): Iterate logs without cloning
for (log, _call) in trx.logs_with_calls() { } // Good
for log in trx.receipt.as_ref().unwrap().logs.clone() { } // Bad
Use references when appropriate: Pass references to avoid unnecessary cloning
fn process_log(log: &Log) { } // Good for read-only access
fn process_log(log: Log) { } // Good when consuming/transforming data
Extract minimal data: Only copy what you actually need
// Good: Extract only needed fields
let amount = parse_amount(&log.data);
// Bad: Copy entire log just to get one field
let log_copy = log.clone();
let amount = parse_amount(&log_copy.data);
Use into() for consumption: When you need to consume data
// When you truly need to take ownership
events.transfers.push(Transfer {
from: topics[1].into(), // Consumes the data
to: topics[2].into(),
});
Pitfall #1: Cloning in filters
// ❌ BAD
block.transactions() // already an iterator, no .iter() needed
.filter(|trx| trx.clone().to == target) // Clone every transaction!
// ✅ GOOD
block.transactions()
.filter(|trx| trx.to == target) // Just compare
Pitfall #2: Unnecessary defensive copies
// ❌ BAD
let block_copy = block.clone();
for trx in block_copy.transactions() { } // Why clone the whole block?
// ✅ GOOD
for trx in block.transactions() { } // Use the block directly
Pitfall #3: Cloning for mutation
// ❌ BAD
let mut trx_copy = trx.clone();
trx_copy.value = process(trx_copy.value); // Clone just to mutate
// ✅ GOOD
let new_value = process(&trx.value); // Process reference, create new value
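The borrow-instead-of-clone idea behind these pitfalls can be tried outside Substreams. This is a minimal sketch in standard Rust with a stand-in Log struct (a hypothetical simplification of the real protobuf type) and a hypothetical matching_data_len helper; it reads every log through a shared reference and never copies a payload.

```rust
/// Stand-in for a decoded log; the real type is generated from protobuf.
struct Log {
    address: Vec<u8>,
    data: Vec<u8>,
}

/// Sums the payload sizes of logs emitted by `target`, borrowing each log
/// rather than cloning it.
fn matching_data_len(logs: &[Log], target: &[u8]) -> usize {
    logs.iter()
        .filter(|log| log.address.as_slice() == target) // compare by reference
        .map(|log| log.data.len()) // read only the field that is needed
        .sum()
}
```

Nothing in the function allocates: the filter compares slices in place and the map step reads a length, so the cost is independent of payload size.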
Use substreams run with timing to measure performance:
# Before refactor (with clones)
time substreams run -s 17000000 -t +1000 map_events
# After refactor (clones removed) — re-run same command
time substreams run -s 17000000 -t +1000 map_events
# You should see significant speedup (2-10x) by avoiding clones
- substreams run to identify bottlenecks

See references/patterns.md for detailed examples:
To get the current head block of a chain (useful for determining the latest block number):
Using Substreams:
# Quick head block lookup for a network
substreams run common@latest -s -1 --network mainnet
# Or with explicit endpoint
substreams run common@latest -e=<network-id-alias-or-host> -s -1 -o jsonl
Read the first line of output to get the head block information. The -s -1 flag starts from the latest block.
Using firecore:
# JSON output (use jq for further processing if available)
firecore tools firehose-client <network-id-alias-or-host> -o json -- -1
# Text output (less detail), first line looks like:
# Block #24327807 (14b58bd3fa091c05a46d084bba1e78090d52556d29f4312da77b7aa3220423f4)
firecore tools firehose-client <network-id-alias-or-host> -o text -- -1
Read the first line of output to get the head block information.
- substreams gui for visual debugging (when available)
- .spkg files for reproducibility
- doc: fields in manifest for clarity

Build fails:
- version "x.y.z" should match Semver: Add a v prefix to package.version in substreams.yaml (use v0.1.0, not 0.1.0).
- rustup target add wasm32-unknown-unknown
- buf CLI is installed (required for proto generation)
- protobuf.excludePaths with sf/substreams and google when importing spkgs

Linking errors ("symbol multiply defined" or "failed to load bitcode"):
This typically indicates version mismatches between Substreams crates. Solutions:
- rm -rf target && substreams build
- substreams 0.6 + substreams-database-change 4 (requires substreams 0.7)

Missing method errors on ABI-generated types:
If you see errors like "no method named decode found":
- use substreams_ethereum::Event; import
- ethabi = "18" is in your Cargo.toml dependencies

spkg import 404 errors:
- substreams-ethereum spkg, NOT sf-ethereum (doesn't exist)

Empty output:
- initialBlock is before first relevant block

Performance issues:
- --production-mode for large ranges

Also load substreams-sink skill when building a graph_out module. It contains the full working example and the EntityChanges proto definition.
Key facts to avoid the most common mistake:
| You want to write to | Output proto | Package |
|---|---|---|
| The Graph / subgraph | EntityChanges | sf.substreams.sink.entity.v1 |
| Postgres / SQL | DatabaseChanges | sf.substreams.sink.database.v1 |
These are NOT interchangeable. Using DatabaseChanges in a graph_out module (or vice versa) compiles but produces a pipeline that fails or emits garbage.
Do NOT add substreams-entity-change = "1" to Cargo.toml: v1 has a prost version conflict with the current toolchain (prost 0.13). Check crates.io first to see whether a newer version resolves this; if not, inline the proto:
proto/entity.proto (exact package name required):
syntax = "proto3";
package sf.substreams.sink.entity.v1;
message EntityChanges {
repeated EntityChange entity_changes = 1;
}
message EntityChange {
enum Operation { UNSET=0; CREATE=1; UPDATE=2; DELETE=3; FINAL=4; }
string entity = 1;
string id = 2;
uint64 ordinal = 3;
Operation operation = 4;
repeated Field fields = 5;
}
message Value {
oneof typed {
int32 int32 = 1;
string bigdecimal = 2;
string bigint = 3;
string string = 4;
bytes bytes = 5;
bool bool = 6;
Array array = 10;
}
}
message Array {
repeated Value value = 1;
}
message Field {
string name = 1;
Value old_value = 2;
Value new_value = 3;
}
Wire compatibility: Copy this proto verbatim from the canonical source. The package name, message names, field numbers, and field types must all match exactly; simplifying any type (e.g. string for new_value) will produce empty/garbage values in Graph Node.
substreams.yaml output type:
output:
type: proto:sf.substreams.sink.entity.v1.EntityChanges
Rust import (after proto is compiled via build.rs):
use crate::pb::sf::substreams::sink::entity::v1::{EntityChange, EntityChanges, Field};
use crate::pb::sf::substreams::sink::entity::v1::entity_change::Operation;
Solana uses a different block model, instruction paradigm, and account system than EVM chains. Do not apply Ethereum patterns here.
For all Solana development — block iteration, walk_instructions() vs message.instructions, SPL Token parsing, Anchor discriminators, b58!, Cargo.toml + manifest setup — see references/solana.md.