Skill

substreams-dev

Expert knowledge for developing, building, and debugging Substreams projects on any blockchain. Use when working with substreams.yaml manifests, Rust modules, protobuf schemas, or blockchain data processing.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/substreams:substreams-dev

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Expert assistant for building Substreams projects - high-performance blockchain data indexing and transformation.

Supporting Files

references/block-filtering.md
references/manifest-spec.md
references/module-types.md
references/networks.md
references/patterns.md
references/solana.md

SKILL.md

1198 lines · ~11.1k tokens (exceeds 5k compaction limit)

Stats

Language: JavaScript
Stars: 3
Forks: 4
Maintenance: Excellent
Last Commit: Jun 15, 2026

Substreams Development Expert

Expert assistant for building Substreams projects - high-performance blockchain data indexing and transformation.

Core Concepts

What is Substreams?

Substreams is a powerful blockchain indexing technology that enables:

Parallel processing of blockchain data with high performance
Composable modules written in Rust (map, store, index types)
Protobuf schemas for typed data structures
Streaming-first architecture with cursor-based reorg handling

Key Components

Manifest (substreams.yaml): Defines modules, networks, dependencies
Modules: Map (transform), Store (aggregate), Index (filter)
Protobuf: Type-safe schemas for inputs and outputs
WASM: Rust code compiled to WebAssembly for execution

Project Structure

my-substreams/
├── substreams.yaml              # Manifest (manual)
├── README.md                    # Package documentation for substreams.dev registry (manual)
├── schema.sql                   # SQL schema for sinks (manual)
├── Cargo.toml                   # Rust dependencies (manual)
├── build.rs                     # ABI code generation (manual, optional)
├── abi/                         # Contract ABI JSON files (manual)
│   └── my_contract.json         # ABI for code generation
├── proto/
│   └── events.proto             # Schema definitions (manual)
├── src/
│   ├── lib.rs                   # Rust module code (manual)
│   ├── abi/                     # Generated ABI bindings (from build.rs)
│   │   ├── mod.rs               # Exports generated modules (manual)
│   │   └── my_contract.rs       # Generated by build.rs (auto)
│   └── pb/                      # Generated protobuf code (auto - DO NOT CREATE)
└── target/                      # Build output (gitignored)

Important: The src/pb/ directory is entirely auto-generated by substreams build. Never create it manually.

Key Directories

abi/ - Contains JSON ABI files for smart contracts. These are used by build.rs to generate Rust bindings.

src/abi/ - Generated Rust code from ABIs. Create src/abi/mod.rs to export the generated modules:

// src/abi/mod.rs
pub mod my_contract;  // Matches the name in build.rs

src/pb/ - Generated protobuf code. This directory and its contents are auto-generated - do NOT create manually. Generate with:

substreams protogen  # Generate proto bindings only (fast, useful for iterative development)
substreams build     # Full build (includes protogen + cargo build)

Use substreams protogen for iterative development - it generates the Rust bindings quickly so you get type hints and autocomplete while writing module code, without waiting for the full WASM compilation.

build.rs for ABI Generation

// build.rs
fn main() {
    substreams_ethereum::Abigen::new("MyContract", "abi/my_contract.json")
        .expect("Failed to load ABI")
        .generate()
        .expect("Failed to generate bindings")
        .write_to_file("src/abi/my_contract.rs")
        .expect("Failed to write bindings");
}

Prerequisites

Required CLI Tools

substreams: Core CLI for building, running, and deploying
buf: Required by substreams build for protobuf code generation

Authentication

Running substreams run against hosted endpoints requires authentication. Get your API key from The Graph Market - sign up at thegraph.market/auth/signup.

CLI Authentication (Recommended):

substreams auth  # Interactive authentication, stores token locally

Quick Token Generation: Visit thegraph.market/auth/substreams-devenv to generate a JWT token from your API key directly in the browser.

Environment Variables (Alternative):

export SUBSTREAMS_API_KEY="your-api-key"
# Or set bearer token directly
export SUBSTREAMS_API_TOKEN="your-jwt-token"

The substreams auth command handles token exchange and local storage automatically, making it the easiest way to get started.

Common Workflows

Pre-flight: Clarifying Under-Specified Requests

Before writing any code, check whether the request provides all of the following. If one or more items are missing or ambiguous, ask the user ONCE with a consolidated list — do not make silent assumptions and do not write code until you have the answers.

| Required input | Why it matters |
|----------------|----------------|
| Target chain | Block type, RPC endpoints, and ABI tooling differ per chain |
| Contract address(es) or protocol | Determines which events/calls to decode |
| Data you want to capture | Events only? Calls? State changes? Aggregations? |
| Output / sink type | `substreams run`, SQL sink, graph-out, custom sink? |
| Block range or time window | `initialBlock` and test range; performance implications |
| Thresholds or filters | Min value, token allowlist, address filter, etc. |
| Block sparsity | Does the target appear in only some blocks? If so, a block index filter is a near-mandatory cost optimization; see "Block & Transaction Filtering" |

If any item is unknown, respond with something like:

Before I build this, I need a few details:

Which chain? (Ethereum mainnet, Polygon, Arbitrum, ...)

Which contract(s) or protocol?

What specific events or data fields do you need?

Where should the output go? (Postgres, The Graph, just substreams run?)

What start block or date range?

Any filters — minimum transfer size, specific token list, etc.?

Only ask once. If you receive partial answers, proceed with what you have and state your remaining assumptions explicitly in your response.

If the prompt is concrete and complete, skip the checklist and build immediately.

Creating a New Project

Migrating an existing project? Load the substreams-convert skill if you are porting a subgraph or Solana program/contract to Substreams instead of starting from scratch.

Initialize: Use substreams init or create manifest manually
Define schema: Create .proto files for your data structures
Implement modules: Write Rust handlers in src/lib.rs
Build: Run substreams build to compile to .spkg
Test: Run substreams run with small block range (recommended: 1000 blocks)
Document: Create README.md for the substreams.dev registry (see "README for substreams.dev Registry" below)
Deploy: Publish to registry or deploy as service

README for substreams.dev Registry

Every Substreams package published to the registry must include a README.md. This file is the primary documentation shown on substreams.dev and is the first thing consumers see.

Required sections:

# <Package Title>

<One-sentence description of what this package indexes and outputs.>

## Overview

<2-3 sentences: what data it captures, what protocol/chain, and intended use case.>

## Modules

| Module | Kind | Output Type | Description |
|--------|------|-------------|-------------|
| `map_events` | map | `proto:my.types.v1.Events` | Extracts transfer events from each block |
| `store_totals` | store | `int64` | Accumulates running totals per token |

## Prerequisites

- [`substreams` CLI](https://substreams.streamingfast.io/documentation/consume/installing-the-cli) installed
- Authenticated: `substreams auth`

## Quick Start

```bash
substreams run -e mainnet.eth.streamingfast.io \
  substreams.yaml map_events \
  -s 18000000 -t +1000
```

## References

Rules:

Title matches package.name in substreams.yaml
Module table lists every name: entry from the manifest; consumers need this to know what to substreams run
Quick Start uses a real block range, not a placeholder
Do NOT include a "Contributing" or "License" section; the registry pulls license from the manifest

Module Types

Map Module - Transforms input to output

- name: map_events
  kind: map
  inputs:
    - source: sf.ethereum.type.v2.Block
  output:
    type: proto:my.types.Events

Store Module - Aggregates data across blocks

- name: store_totals
  kind: store
  updatePolicy: add
  valueType: int64
  inputs:
    - map: map_events

Index Module (kind: blockIndex) - Emits per-block Keys so downstream modules can skip blocks. The handler is #[substreams::handlers::map] returning Keys.

- name: index_transfers
  kind: blockIndex
  inputs:
    - map: map_events
  output:
    type: proto:sf.substreams.index.v1.Keys

# A consuming module skips non-matching blocks via `blockFilter`:
- name: filtered_transfers
  kind: map
  blockFilter:
    module: index_transfers   # references the blockIndex module above
    query:
      string: "token:0xdac17f958d2ee523a2206206994597c13d831ec7"
  inputs:
    - map: map_events
  output:
    type: proto:my.types.Transfers

The index alone does nothing — only a module with an explicit blockFilter gets blocks skipped. See "Block & Transaction Filtering (Cost-Critical)" below.

initialBlock guidance:
modules:
  - name: map_events
    kind: map
    initialBlock: 18000000   # ✅ start of your test/data range
    # NOT: 12369621          # ❌ protocol genesis — forces full backfill on every run
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:my.types.Events
Pin initialBlock to the first block your downstream consumer actually needs. The runtime starts processing from max(--start-block, initialBlock), then walks forward. Stores must catch up from initialBlock on each cold start — a deep genesis pin turns a 100-block test into a multi-hour backfill.

For reference: the T3.2 golden uses initialBlock: 17999900 to cover a -s 18000000 -t +100 acceptance window.
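The start-block rule above comes down to simple arithmetic. The sketch below is plain Rust rather than anything from the Substreams API; `effective_start` and `store_backfill_blocks` are hypothetical helper names, and the backfill figure assumes a cold start with no cached store state.

```rust
/// Block the run actually begins at: max(--start-block, initialBlock).
/// Hypothetical helper for illustration; not a Substreams API.
pub fn effective_start(start_block: u64, initial_block: u64) -> u64 {
    start_block.max(initial_block)
}

/// Blocks a store has to replay before the requested range can produce
/// output on a cold start. Zero when the request starts at or before
/// initialBlock.
pub fn store_backfill_blocks(start_block: u64, initial_block: u64) -> u64 {
    start_block.saturating_sub(initial_block)
}
```

With a genesis pin of 12369621 and `-s 18000000` the replay is 5,630,379 blocks; with `initialBlock: 17999900` it is 100.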

Block & Transaction Filtering (Cost-Critical)

Whenever a Substreams targets only a subset of blocks — a specific contract, program, event signature, account, or transaction type — add a block index filter. Be aggressive about this. It is the biggest win available for your own development experience and your bill.

What you get:

Lower cost — you're billed for the blocks the engine actually processes. A blockFilter skips the blocks that can't match, so a contract active in 0.5% of blocks costs roughly 0.5% as much.
Much faster runs — backfills and historical syncs that would take hours finish in minutes, because the engine jumps straight past irrelevant blocks.
Tighter iteration — quicker rebuild/run cycles while developing, and a GUI progress bar that races across skipped ranges instead of crawling.
Less noise downstream — your sink ingests only relevant blocks, so databases stay smaller and reorg handling has less to reconcile.

The pattern (three pieces):

# 1. Index module — kind `blockIndex`, output `Keys`
- name: index_events
  kind: blockIndex
  inputs:
    - map: all_events
  output:
    type: proto:sf.substreams.index.v1.Keys

# 2. Consuming module — declares blockFilter to skip non-matching blocks
- name: filtered_events
  kind: map
  blockFilter:
    module: index_events
    query:
      params: true          # SQE query from `params` (or: string: "<expr>")
  inputs:
    - params: string
    - map: all_events
  output:
    type: proto:my.types.Events

The index handler is a map handler returning Keys (a repeated string of labels you choose, e.g. evt_addr:0x..., evt_sig:0x..., program:<id>):

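use substreams::errors::Error;
use substreams::pb::sf::substreams::index::v1::Keys;
use substreams::Hex;
// `Events` is the generated output type of the upstream all_events module.

#[substreams::handlers::map]
fn index_events(events: Events) -> Result<Keys, Error> {
    let mut keys = Keys::default();
    for e in events.events {
        if let Some(log) = e.log {
            // topic0 is the event signature hash
            if let Some(t0) = log.topics.first() {
                keys.keys.push(format!("evt_sig:0x{}", Hex::encode(t0)));
            }
            keys.keys.push(format!("evt_addr:0x{}", Hex::encode(&log.address)));
        }
    }
    Ok(keys)
}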

Query (SQE): boolean expression over keys — && (and), || (or), - (not), ( ) grouping. A bare term must match a key exactly: "evt_addr:0xA && -evt_addr:0xspam".

Critical rules:

The index module is kind: blockIndex with an #[substreams::handlers::map] handler returning Keys.
The index does nothing automatically. Only a module with an explicit blockFilter gets blocks skipped — listing the index as a dependency is not enough.
Query namespace must match the emitted key prefix exactly (evt_addr: vs address:), and values are matched by literal equality — use 0x-prefixed lowercase hex (EVM checksum/mixed-case addresses never match).
Don't reinvent. Most chains ship a foundational package whose filtered_* modules already apply the blockFilter and return only matching records, so depend on those directly (e.g. imports: { eth_common: <package>@<version> } → map: eth_common:filtered_events). You must override the default params query (eth_common:filtered_events: "…"), or you silently emit the default's data. In-handler filtering is only needed when you roll your own blockFilter, or for Solana instruction-level filtering (transactions are pre-filtered, but instructions within them are not).
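Because query terms are compared to emitted keys by literal equality, it helps to build both through the same normalization. The following is a minimal std-only sketch; `index_key` and `query_term` are hypothetical helper names, and the `evt_addr` prefix is only an example of a namespace your own index handler might emit.

```rust
/// Hex-encode bytes in lowercase without any external crate.
fn to_lower_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Build an index key such as `evt_addr:0x...` from raw address bytes.
pub fn index_key(prefix: &str, raw: &[u8]) -> String {
    format!("{}:0x{}", prefix, to_lower_hex(raw))
}

/// Normalize a user-supplied address (checksum or mixed case, with or
/// without `0x`) into the same form, so the query term equals the key.
pub fn query_term(prefix: &str, address: &str) -> String {
    let trimmed = address.trim();
    let hex = trimmed
        .strip_prefix("0x")
        .or_else(|| trimmed.strip_prefix("0X"))
        .unwrap_or(trimmed);
    format!("{}:0x{}", prefix, hex.to_ascii_lowercase())
}
```

A checksummed address passed through `query_term` produces the same string the index emitted; the same address under a different prefix (`address:` instead of `evt_addr:`) does not.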

Full guide (SQE syntax, params vs string, use inheritance, foundational indexes, Solana/EVM examples, decision flowchart): see references/block-filtering.md.

Debugging Checklist

When modules produce unexpected results:

Validate manifest: substreams graph to visualize dependencies
Test small range: Run 100-1000 blocks, inspect outputs carefully
Check logs: Look for WASM panics, protobuf decode errors
Verify schema: Ensure proto types match expected data
Review inputs: Confirm input modules produce correct data
Initial block: Check initialBlock is set appropriately.
- Set it to the earliest block your test range starts from, not the protocol's first block. If you only need data from block 18000000, use initialBlock: 18000000.
- The runtime seeds stores forward from initialBlock. If you pin to protocol genesis (e.g. 12369621 for Uniswap V3) but test at block 18000000, the sink must backfill 5.6M blocks before producing output — impractical for local runs.
- For production: pin to the earliest block your downstream consumer cares about.
- For one-off testing: pin to a narrow window covering your test range.

Performance Optimization

Add a blockFilter to skip irrelevant blocks entirely (biggest cost lever) — see "Block & Transaction Filtering (Cost-Critical)" above and references/block-filtering.md
Minimize store size by storing only necessary data
Production mode enables parallel execution: --production-mode
Module granularity: Smaller, focused modules perform better
Avoid deep nesting: Flatten module dependencies when possible

Manifest Reference

See references/manifest-spec.md for complete specification.

Key Sections

Package metadata:

specVersion: v0.1.0
package:
  name: my-substreams
  version: v1.0.3   # MUST have 'v' prefix — bare semver like "1.0.3" is rejected
  description: Description of what this substreams does

version requires a v prefix. Use v0.1.0, not 0.1.0. The error message (version "0.1.0" should match Semver) is misleading: bare 0.1.0 is valid under the semver.org specification, which treats a leading v as a tagging convention rather than part of the version, but Substreams accepts only the v-prefixed form. This applies to the top-level package.version only; specVersion already shows the correct prefix.
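A pre-build check can catch a missing prefix before the CLI does. This is a simplified sketch, not the validation the substreams CLI actually performs: `is_v_prefixed_version` and `ensure_v_prefix` are hypothetical names, and pre-release or build suffixes are deliberately out of scope.

```rust
/// Simplified check for a manifest `package.version`: a `v` prefix followed
/// by MAJOR.MINOR.PATCH, each a non-empty run of ASCII digits.
pub fn is_v_prefixed_version(version: &str) -> bool {
    let rest = match version.strip_prefix('v') {
        Some(rest) => rest,
        None => return false,
    };
    let parts: Vec<&str> = rest.split('.').collect();
    parts.len() == 3
        && parts
            .iter()
            .all(|p| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit()))
}

/// Add the missing prefix to a bare version such as "0.1.0".
pub fn ensure_v_prefix(version: &str) -> String {
    if version.starts_with('v') {
        version.to_string()
    } else {
        format!("v{}", version)
    }
}
```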

Protobuf imports:

protobuf:
  files:
    - events.proto
  importPaths:
    - ./proto

Binary reference (WASM code):

binaries:
  default:
    type: wasm/rust-v1
    file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm

WASM Bindgen Shims (for solana_program, alloy, chrono, etc.):

Some Rust libraries create WebAssembly bindgen imports when compiled to wasm32-unknown-unknown. To use these libraries, enable the shims feature:

binaries:
  default:
    type: wasm/rust-v1+wasm-bindgen-shims
    file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm

This allows compilation but the shims don't implement underlying functionality - avoid calling their special import functions at runtime. See WASM compatibility docs for details.

Network configuration:

network: mainnet

Supported networks: See references/networks.md

Rust Module Development

Cargo.toml Setup

A complete Cargo.toml template for Ethereum Substreams with SQL sink support:

[package]
name = "my_substreams"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
# Core Substreams dependencies - VERSIONS MUST BE COMPATIBLE
# Check https://crates.io for latest versions
substreams = "0.7"              # Latest: 0.7.3
substreams-ethereum = "0.11"    # Latest: 0.11.1

# For SQL sink output (DatabaseChanges)
substreams-database-change = "4"  # Latest: 4.0.0

# Protobuf serialization
prost = "0.13"
prost-types = "0.13"  # Required for google.protobuf.Timestamp/Any in generated src/pb/ code

# Utility crates
hex = "0.4"
hex-literal = "0.4"      # hyphen here; referenced as hex_literal in Rust code
num-bigint = "0.4"

# Required by generated ABI code (from build.rs)
ethabi = "18"

[build-dependencies]
substreams-ethereum = "0.11"    # Latest: 0.11.1

[profile.release]
lto = true
opt-level = 's'
strip = "debuginfo"

Version Compatibility Matrix:

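| substreams | substreams-ethereum | substreams-database-change | Notes |
|------------|---------------------|----------------------------|-------|
| 0.7 | 0.11 | 4 | Current recommended versions |
| 0.6 | 0.10 | 3 | Legacy (incompatible with v4 database-change) |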

Common Pitfalls:

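hex_literal vs hex-literal: Declare the dependency as hex-literal (hyphen) in Cargo.toml; Rust code refers to the same crate as hex_literal (underscore), because Cargo maps hyphens to underscores in crate identifiers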
Missing ethabi: Required by ABI-generated code but not always obvious
Missing prost-types: If substreams build generates src/pb/ code that references google.protobuf.Timestamp or google.protobuf.Any, you will get error[E0433]: failed to resolve: use of undeclared crate or module prost_types. Ensure prost-types = "0.13" is in [dependencies] — the version must match your prost major (currently 0.13 for the current substreams toolchain). The template above includes it; do not remove it.
Version mismatch: Mixing 0.6/0.7 substreams versions causes linking errors
If you get "symbol multiply defined" errors, run rm -rf target && substreams build

WASM-Incompatible Crates:

Some crates enable wasm-bindgen features by default on wasm32 targets, causing runtime errors like:

unknown import: `__wbindgen_placeholder__::__wbindgen_describe` has not been defined

Solutions:

Use wasm/rust-v1+wasm-bindgen-shims in your manifest's binary type (see above)

Disable default features for problematic crates:

chrono = { version = "0.4", default-features = false }

Common crates requiring attention: chrono, solana_program, alloy, ethers-rs

Map Handler Example

use substreams::errors::Error;
use substreams::prelude::*;
use substreams_ethereum::pb::eth::v2::Block;

#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();

    for trx in block.transactions() {
        for (log, _call) in trx.logs_with_calls() {
            // Process logs, extract events
            if is_transfer_event(log) {
                events.transfers.push(extract_transfer(log));
            }
        }
    }

    Ok(events)
}

Store Handler Example

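use substreams::store::{StoreAdd, StoreAddInt64, StoreNew};

#[substreams::handlers::store]
pub fn store_totals(events: Events, store: StoreAddInt64) {
    for transfer in events.transfers {
        // Key is the token address; the amount is added to its running total.
        store.add(0, &transfer.token, transfer.amount as i64);
    }
}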

Token Metadata: ALWAYS Use a Store, NEVER a HashMap

Anti-pattern — DO NOT do this:

// ❌ WRONG: per-block HashMap cache. Re-fetched every block. Wastes RPC budget.
#[substreams::handlers::map]
pub fn map_swaps(block: Block) -> Result<Swaps, Error> {
    let mut token_cache: HashMap<String, TokenMeta> = HashMap::new();  // ❌ scope = single block
    for log in block.logs() {
        let addr = log.address().to_string();
        let meta = token_cache.entry(addr.clone())
            .or_insert_with(|| fetch_token_metadata(&addr));  // ❌ fetched fresh next block
        // ...
    }
}

A HashMap declared inside a map handler is rebuilt every block. With ~50 unique pools per block × 2 tokens × 2 RPC calls (symbol + decimals), that's 200 RPCs per block — 20,000 over a 100-block run. The hosted Substreams runtime enforces RPC budgets; this pattern will fail at scale and waste quota at any scale.

Correct pattern — use a set_if_not_exists store:

Token metadata (symbol, decimals, name) is immutable per contract address. Cache once, read forever. The idiomatic chain:

map_token_addresses  →  store_token_metadata  →  map_swaps (reads from store)

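set_if_not_exists skips writes for keys already present, so the first value stored for an address is the one every later block reads. The policy only deduplicates the write: the store handler cannot read its own store, so it still runs fetch_token_metadata for every address in its input. To keep the RPC to one call per address across the run, have map_token_addresses emit an address only when it first appears (for example from a pool-creation event). After that, map_swaps reads metadata from the store with zero RPC per block.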
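The first-write-wins behaviour and the cost of the uncached pattern can both be modelled locally. This is an in-memory illustration only: `WriteOnceStore` and `uncached_rpc_calls` are hypothetical names, and the map inside the struct stands in for the runtime-managed store (it is not something to declare inside a handler).

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// In-memory model of a `set_if_not_exists` store, for illustration only.
/// Real handlers receive the store from the runtime.
#[derive(Default)]
pub struct WriteOnceStore {
    entries: HashMap<String, u32>,
}

impl WriteOnceStore {
    /// Stores `decimals` under `key` unless the key is already present.
    /// Returns true when the value was written.
    pub fn set_if_not_exists(&mut self, key: &str, decimals: u32) -> bool {
        match self.entries.entry(key.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(decimals);
                true
            }
            Entry::Occupied(_) => false,
        }
    }

    /// Reads the stored value, if any.
    pub fn get(&self, key: &str) -> Option<u32> {
        self.entries.get(key).copied()
    }
}

/// RPC calls issued by the per-block cache anti-pattern over a whole run.
pub fn uncached_rpc_calls(
    pools_per_block: u64,
    tokens_per_pool: u64,
    calls_per_token: u64,
    blocks: u64,
) -> u64 {
    pools_per_block * tokens_per_pool * calls_per_token * blocks
}
```

A second write for the same key is ignored, so a later (possibly wrong) value can never replace the first one, while `uncached_rpc_calls(50, 2, 2, 100)` reproduces the 20,000-call figure quoted for the HashMap approach.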

Calling Contracts from a Map Handler (eth_call)

Yes, you can — and often should — call contracts from a map module.

substreams_ethereum::rpc::RpcBatch works in map handlers. The host runtime executes the batch synchronously before returning to your handler. There is no architectural restriction preventing RPCs in maps; this is a common misconception.

Copy-paste example — batch ERC20 metadata lookup:

use substreams_ethereum::rpc::RpcBatch;
// generated from build.rs / ABI codegen (or write by hand):
use crate::abi::erc20;

// Returns None on transient RPC failure or undecodable response — caller must skip + log.
// Never panic from a map/store handler: an unhandled panic aborts the whole substream.
fn fetch_token_metadata(token_addr: &[u8]) -> Option<(String, u32)> {
    let batch = RpcBatch::new();
    let responses = batch
        .add(erc20::functions::Symbol {}, token_addr.to_vec())
        .add(erc20::functions::Decimals {}, token_addr.to_vec())
        .execute()
        .ok()?;  // transient RPC error → None, do not panic

    let symbol   = RpcBatch::decode::<_, erc20::functions::Symbol>(&responses.responses[0])?;
    let decimals = RpcBatch::decode::<_, erc20::functions::Decimals>(&responses.responses[1])?
        .to_u64() as u32;  // never default to 18 — wrong for USDC/USDT (6), WBTC (8)
    Some((symbol, decimals))
}

Never panic from a Substreams handler. .expect() / .unwrap() on RPC results aborts the entire substream on any transient endpoint hiccup. Return Option/Result, then have the caller log + skip the record. Same for decimals: never silently default to 18 — emit nothing rather than wrong data.

ABI codegen output types: BigInt, not ethabi::Uint

substreams-ethereum-abigen (the build.rs codegen) emits substreams::scalar::BigInt for any uint* or int* field, including uint256, int256, and narrower widths such as uint8 and uint32. It does NOT emit ethabi::Uint or ethabi::Int.

// ❌ WRONG — older API; abigen does not emit ethabi types
fn format_amount(raw: &ethabi::Uint, decimals: u32) -> String { /* ... */ }

// ✅ CORRECT — abigen emits BigInt
use substreams::scalar::BigInt;
fn format_amount(raw: &BigInt, decimals: u32) -> String {
    raw.to_decimal(decimals as u64).to_string()
}

If you need a primitive integer (e.g. converting decimals() BigInt to u32):

// BigInt → primitive (assumes value fits — overflow is silent)
let decimals_u32: u32 = big_int_value.to_u64() as u32;
let amount_i64:  i64 = big_int_value.to_i64();

For signed int256 values from event fields, use BigInt::to_decimal(decimals) for human-readable string, or the signum() + abs() methods to inspect sign.
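Since an `as` cast wraps silently, a checked conversion is the safer default when narrowing. The sketch below is std-only and makes two assumptions: `raw` stands in for the u64 already extracted from the big-integer value, and u128 stands in for amounts that in practice need a big-integer type. Both function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Narrow a decoded `decimals()` value to u32, returning None instead of
/// wrapping when it does not fit.
pub fn checked_decimals(raw: u64) -> Option<u32> {
    u32::try_from(raw).ok()
}

/// Scale a raw token amount into a decimal string without floating point.
/// Returns None once 10^decimals no longer fits in u128.
pub fn format_units(raw: u128, decimals: u32) -> Option<String> {
    let scale = 10u128.checked_pow(decimals)?;
    let whole = raw / scale;
    let frac = raw % scale;
    let width = usize::try_from(decimals).ok()?;
    // Left-pad the fractional part to `decimals` digits, then drop
    // trailing zeros so 1_500_000 at 6 decimals prints as 1.5.
    let padded = format!("{:0width$}", frac, width = width);
    let trimmed = padded.trim_end_matches('0');
    if trimmed.is_empty() {
        Some(whole.to_string())
    } else {
        Some(format!("{}.{}", whole, trimmed))
    }
}
```

The value 4_294_967_302 shows why this matters: cast with `as u32` it becomes 6, a perfectly plausible decimals value, whereas `checked_decimals` returns None.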

Generated `.call()` method takes one argument (the contract address)

substreams-ethereum-abigen emits a .call(address) method on each function struct that performs the eth_call via the substreams host. It takes exactly ONE argument — the contract address. There is no second &block argument.

use crate::abi::erc20::functions;

// ❌ WRONG — older two-arg form (predates current substreams-ethereum)
let decimals = functions::Decimals::call(token_addr, &block);

// ✅ CORRECT — single-arg form, returns Option<T>; propagate None, never default
fn fetch_decimals(token_addr: &[u8]) -> Option<u32> {
    functions::Decimals {}
        .call(token_addr.to_vec())
        .map(|d| d.to_u64() as u32)
}

// Caller: log + skip on None — do NOT default to 18.
let decimals = match fetch_decimals(&addr_bytes) {
    Some(d) => d,
    None => { substreams::log::info!("decimals fetch failed for {:x?}", addr_bytes); continue; }
};

.call() returns Option<T> — None on RPC failure or decode failure. Always handle the None arm; never .unwrap() and never default decimals to 18 (silently wrong for USDC/USDT/WBTC). Skip the record or return None/Err to the caller.

For batched calls covering multiple eth_calls in one round-trip, prefer RpcBatch::new().add(...) (shown above).
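The "skip on None, never default" rule generalizes to any list of lookups. A minimal sketch, assuming a caller-supplied `lookup` closure in place of the RPC-backed call so it runs without a node; `resolve_decimals` is a hypothetical name.

```rust
/// Resolve decimals for each token, skipping any whose lookup fails.
/// `lookup` stands in for an RPC-backed call such as `fetch_decimals`.
pub fn resolve_decimals<F>(tokens: &[&str], lookup: F) -> (Vec<(String, u32)>, Vec<String>)
where
    F: Fn(&str) -> Option<u32>,
{
    let mut resolved = Vec::new();
    let mut skipped = Vec::new();
    for &token in tokens {
        match lookup(token) {
            Some(decimals) => resolved.push((token.to_string(), decimals)),
            // No fallback to 18: a missing value is reported, not guessed.
            None => skipped.push(token.to_string()),
        }
    }
    (resolved, skipped)
}
```

Returning the skipped tokens alongside the resolved ones lets the caller log them once per block instead of once per record.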

Full module graph: cache once, read forever

substreams.yaml:

modules:
  - name: map_token_addresses
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:my.types.v1.TokenAddresses

  - name: store_token_metadata
    kind: store
    updatePolicy: set_if_not_exists   # ← write once, never overwrite
    valueType: proto:my.types.v1.TokenMeta
    inputs:
      - map: map_token_addresses

  - name: map_swaps
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
      - store: store_token_metadata
        mode: get
    output:
      type: proto:my.types.v1.Swaps

store_token_metadata handler (RPC fires here, once per address emitted by map_token_addresses):

#[substreams::handlers::store]
pub fn store_token_metadata(
    addrs: TokenAddresses,
    store: StoreSetIfNotExistsProto<TokenMeta>,
) {
    for addr_hex in &addrs.addresses {
        let addr_bytes = match hex::decode(addr_hex.trim_start_matches("0x")) {
            Ok(bytes) => bytes,
            Err(_) => { substreams::log::info!("invalid token address: {}", addr_hex); continue; }
        };
        let (symbol, decimals) = match fetch_token_metadata(&addr_bytes) {
            Some(meta) => meta,
            None => { substreams::log::info!("token metadata fetch failed for {}", addr_hex); continue; }
        };
        store.set_if_not_exists(0, addr_hex, &TokenMeta { symbol, decimals });
    }
}

map_swaps handler (zero RPC after first seen):

use substreams::store::{StoreGet, StoreGetProto};

#[substreams::handlers::map]
pub fn map_swaps(
    block: Block,
    meta_store: StoreGetProto<TokenMeta>,
) -> Result<Swaps, substreams::errors::Error> {
    let mut swaps = Swaps::default();
    for pool_log in extract_swap_logs(&block) {
        // Metadata not cached yet: skip the record instead of guessing decimals.
        let Some(meta) = meta_store.get_last(&pool_log.token_address) else {
            substreams::log::info!("no metadata for token {}", pool_log.token_address);
            continue;
        };
        swaps.items.push(build_swap(pool_log, meta));
    }
    Ok(swaps)
}

Rule of thumb: if a value is immutable per contract address (symbol, decimals, factory deployment, pair tokens), use a set_if_not_exists store. If you wrote let mut cache: HashMap<...> = HashMap::new(); inside a map handler, you have a bug.

Uniswap V3 Pool Token Resolution (F35)

Common failure mode: agents use call traces or hardcode known tokens instead of batching token0()/token1() eth_calls. Call traces are incomplete — they only appear when the pool is the callee, not for every swap. This silently produces UNKNOWN tokens for most pools.

V3 pools store token0 and token1 as immutable state. Resolve them via raw eth_call and cache in a store — same pattern as ERC20 metadata.

RpcBatch::add requires a generated ABI struct. For pool selectors, use eth_call with raw RpcCalls directly — no ABI codegen needed:

// Uniswap V3 pool: token0() → address, token1() → address
// selector = keccak256("token0()")[0..4] = 0x0dfe1681
// selector = keccak256("token1()")[0..4] = 0xd21220a7

use substreams_ethereum::pb::eth::rpc::{RpcCall, RpcCalls};
use substreams_ethereum::rpc::eth_call;

fn decode_address_return(raw: &[u8]) -> Option<Vec<u8>> {
    // ABI: address is padded to 32 bytes, actual address is last 20
    if raw.len() < 32 { return None; }
    Some(raw[12..32].to_vec())   // skip 12 bytes of zero-padding
}

fn fetch_pool_tokens(pool_addr: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
    let calls = RpcCalls {
        calls: vec![
            RpcCall { to_addr: pool_addr.to_vec(), data: vec![0x0d, 0xfe, 0x16, 0x81] }, // token0()
            RpcCall { to_addr: pool_addr.to_vec(), data: vec![0xd2, 0x12, 0x20, 0xa7] }, // token1()
        ],
    };
    let responses = eth_call(&calls).responses;
    // A reverted or failed call sets `failed`; treat it like a missing response.
    if responses.len() < 2 || responses.iter().take(2).any(|r| r.failed) { return None; }
    let token0 = decode_address_return(&responses[0].raw)?;
    let token1 = decode_address_return(&responses[1].raw)?;
    Some((token0, token1))
}

Note: RpcBatch::add_call() does NOT exist in substreams-ethereum v0.11. Use eth_call(&RpcCalls { calls: [...] }) for raw calldata, or RpcBatch::add(AbiStruct {}, addr) when you have generated ABI structs.
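When decoding raw return data by hand, it can be worth rejecting words that are not a cleanly padded address. The sketch below is std-only and is a stricter variant of the slicing approach, not part of substreams-ethereum; `decode_address_word` is a hypothetical name, and the selector constants simply restate the values quoted above.

```rust
/// 4-byte selectors for the zero-argument getters, as quoted above.
pub const TOKEN0_SELECTOR: [u8; 4] = [0x0d, 0xfe, 0x16, 0x81];
pub const TOKEN1_SELECTOR: [u8; 4] = [0xd2, 0x12, 0x20, 0xa7];

/// Decode an ABI-encoded `address` return value (one 32-byte word).
/// Rejects input shorter than 32 bytes and words whose 12 padding bytes
/// are non-zero, which usually means the call returned something else.
pub fn decode_address_word(raw: &[u8]) -> Option<[u8; 20]> {
    let word = raw.get(..32)?;
    if word[..12].iter().any(|b| *b != 0) {
        return None;
    }
    let mut addr = [0u8; 20];
    addr.copy_from_slice(&word[12..32]);
    Some(addr)
}
```

Returning a fixed-size `[u8; 20]` instead of a `Vec<u8>` also makes the address length part of the type, so a caller cannot pass a truncated value downstream.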

Wire it into a store (exactly like ERC20 metadata):

map_v3_pools (emits pool addresses)  →  store_pool_tokens (set_if_not_exists)  →  map_v3_swaps (reads token0/token1 from store)

// In map_v3_pools: emit any new pool addresses seen this block
// In store handler:
#[substreams::handlers::store]
pub fn store_pool_tokens(pools: PoolAddresses, store: StoreSetIfNotExistsProto<TokenPair>) {
    for pool_hex in &pools.addresses {
        let pool_bytes = match hex::decode(pool_hex.trim_start_matches("0x")) {
            Ok(bytes) => bytes,
            Err(_) => { substreams::log::info!("invalid pool address: {}", pool_hex); continue; }
        };
        if let Some((t0, t1)) = fetch_pool_tokens(&pool_bytes) {
            store.set_if_not_exists(0, pool_hex, &TokenPair {
                token0: format!("0x{}", hex::encode(&t0)),
                token1: format!("0x{}", hex::encode(&t1)),
            });
        }
    }
}

// In map_v3_swaps: read from store (zero RPC)
// (inside the per-log loop)
let Some(pair) = pool_store.get_last(&pool_hex) else {
    continue; // pool not resolved yet: skip instead of emitting empty token addresses
};

Never use call traces for token resolution. block.calls() only has entries when a call to the pool was the top-level transaction call or an explicit internal call — it misses pools that emitted Swap via pure EVM event emission without a visible call trace.

Best Practices

Handle errors gracefully: Use Result<T, Error> returns
Log sparingly: Excessive logging impacts performance
Validate inputs: Check for null/empty data before processing
Use substreams helpers: Leverage substreams-ethereum crate
Test locally first: Always test with substreams run before deploying
Avoid excessive cloning: Use ownership transfer (see Performance section below)

Performance: Avoiding Excessive Cloning

CRITICAL: Excessive cloning of data structures is one of the largest avoidable performance costs in a Substreams module.

The Problem

Cloning large data structures is expensive:

❌ Cloning a Transaction: Copies all fields, logs, traces
❌ Cloning a Block: Copies the entire block including all transactions (EXTREMELY expensive)
❌ Cloning in loops: Multiplies the cost by number of iterations

The Solution: Ownership Transfer

Use Rust's ownership system to transfer or borrow data instead of cloning.

Bad Example (Excessive Cloning)

#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();

    for trx in block.transactions() {
        // ❌ BAD: Cloning entire transaction
        let transaction = trx.clone();

        for (log, _call) in transaction.logs_with_calls() {
            // ❌ BAD: Cloning log
            let log_copy = log.clone();
            if is_transfer_event(&log_copy) {
                events.transfers.push(extract_transfer(&log_copy));
            }
        }
    }

    Ok(events)
}

Good Example (Ownership Transfer)

#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();

    // ✅ GOOD: Iterate by reference
    for trx in block.transactions() {
        // ✅ GOOD: Borrow, don't clone
        for (log, _call) in trx.logs_with_calls() {
            if is_transfer_event(log) {
                // ✅ GOOD: Only extract what you need
                events.transfers.push(extract_transfer(log));
            }
        }
    }

    Ok(events)
}

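fn is_transfer_event(log: &Log) -> bool {
    // Use reference, no cloning.
    // Require the signature plus the two indexed topics that
    // extract_transfer reads, so indexing topics[1] and topics[2] cannot panic.
    log.topics.len() >= 3 &&
    log.topics[0] == TRANSFER_EVENT_SIGNATURE
}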

fn extract_transfer(log: &Log) -> Transfer {
    // Extract only the fields you need
    Transfer {
        from: Hex::encode(&log.topics[1]),
        to: Hex::encode(&log.topics[2]),
        amount: Hex::encode(&log.data),
        // Don't copy the entire log
    }
}

When Cloning is Acceptable

Clone only small, necessary data:

// ✅ OK: Cloning small values such as a 20-byte address
let token_address = log.address.clone();

// ✅ OK: Primitive types are Copy, so no clone() call is needed
let block_number = block.number;

// ❌ BAD: Cloning entire structures
let block_copy = block.clone(); // Never do this!
let trx_copy = transaction.clone(); // Avoid this!

Performance Tips

Use logs_with_calls(): Iterate logs without cloning

for (log, _call) in trx.logs_with_calls() { } // Good
for log in trx.receipt.as_ref().unwrap().logs.clone() { } // Bad

Use references when appropriate: Pass references to avoid unnecessary cloning

fn process_log(log: &Log) { } // Good for read-only access
fn process_log(log: Log) { } // Good when consuming/transforming data

Extract minimal data: Only copy what you actually need

// Good: Extract only needed fields
let amount = parse_amount(&log.data);

// Bad: Copy entire log just to get one field
let log_copy = log.clone();
let amount = parse_amount(&log_copy.data);

Use into() for consumption: When you need to consume data

// When you truly need to take ownership
events.transfers.push(Transfer {
    from: topics[1].into(), // Consumes the data
    to: topics[2].into(),
});
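
Rust does not allow moving an element out of a Vec by index, so consuming topic bytes means taking ownership of the whole log first. The following is a minimal sketch under that assumption: Log and RawTransfer are stand-in types, and into_raw_transfer is a hypothetical helper that moves the buffers out of an owned log without copying them.

```rust
// Stand-in for an owned Ethereum log (assumption: only these fields matter).
#[derive(Debug, Default, Clone)]
pub struct Log {
    pub topics: Vec<Vec<u8>>,
    pub data: Vec<u8>,
}

// Stand-in output type that keeps raw bytes (assumption).
#[derive(Debug, Default, PartialEq)]
pub struct RawTransfer {
    pub from: Vec<u8>,
    pub to: Vec<u8>,
    pub amount: Vec<u8>,
}

// Takes the log by value and moves its buffers into the result.
// No byte buffer is copied: each Vec<u8> keeps its original heap allocation.
pub fn into_raw_transfer(log: Log) -> Option<RawTransfer> {
    if log.topics.len() != 3 {
        return None;
    }
    // Destructure the owned log so each field can be moved independently.
    let Log { topics, data } = log;
    let mut topics = topics.into_iter();
    let _signature = topics.next()?; // topics[0]: event signature, dropped
    let from = topics.next()?; // topics[1], moved
    let to = topics.next()?; // topics[2], moved
    Some(RawTransfer { from, to, amount: data })
}
```

This only pays off when the caller already owns the log and no longer needs it; when iterating a borrowed block, extracting the few fields you need by reference remains the simpler choice.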

Common Pitfalls

Pitfall #1: Cloning in filters

// ❌ BAD
block.transactions() // already an iterator; no .iter() needed
    .filter(|trx| trx.clone().to == target) // Clone every transaction!

// ✅ GOOD
block.transactions()
    .filter(|trx| trx.to == target) // Just compare

Pitfall #2: Unnecessary defensive copies

// ❌ BAD
let block_copy = block.clone();
for trx in block_copy.transactions() { } // Why clone the whole block?

// ✅ GOOD
for trx in block.transactions() { } // Use the block directly

Pitfall #3: Cloning for mutation

// ❌ BAD
let mut trx_copy = trx.clone();
trx_copy.value = process(trx_copy.value); // Clone just to mutate

// ✅ GOOD
let new_value = process(&trx.value); // Process reference, create new value

Measuring Impact

Use substreams run with timing to measure performance:

# Before refactor (with clones)
time substreams run -s 17000000 -t +1000 map_events

# After refactor (clones removed) — re-run same command
time substreams run -s 17000000 -t +1000 map_events

# You should see significant speedup (2-10x) by avoiding clones
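
Wall-clock timing of substreams run also includes startup and network time, so it can help to isolate the cost of cloning in a local micro-benchmark. The sketch below assumes nothing about the Substreams runtime: Log is a stand-in type, make_logs builds synthetic data, and all function names are hypothetical. It only illustrates how to compare a cloning loop against a borrowing loop with std::time::Instant.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Stand-in log type with heap-allocated fields (assumption).
#[derive(Clone)]
pub struct Log {
    pub topics: Vec<Vec<u8>>,
    pub data: Vec<u8>,
}

// Builds n synthetic logs: 3 topics of 32 bytes and 256 bytes of data each.
pub fn make_logs(n: usize) -> Vec<Log> {
    (0..n)
        .map(|i| Log {
            topics: vec![vec![(i % 251) as u8; 32]; 3],
            data: vec![0u8; 256],
        })
        .collect()
}

// Clones every log before reading it (the pattern to avoid).
pub fn sum_with_clone(logs: &[Log]) -> u64 {
    logs.iter()
        .map(|log| {
            // black_box keeps the optimizer from removing the clone.
            let copy = black_box(log.clone());
            (copy.data.len() + copy.topics.len()) as u64
        })
        .sum()
}

// Reads the same fields through a reference.
pub fn sum_with_borrow(logs: &[Log]) -> u64 {
    logs.iter()
        .map(|log| (log.data.len() + log.topics.len()) as u64)
        .sum()
}

// Runs a closure once and returns its result with the elapsed time.
pub fn time_it<F: FnOnce() -> u64>(f: F) -> (u64, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Calling time_it on each variant over the same make_logs output and printing the two durations gives a rough comparison; build with --release and repeat the run several times, since a single measurement is noisy.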

Remember

Measure performance impact: Use timing with substreams run to identify bottlenecks
Clone only when necessary: Most of the time, borrowing is sufficient
Block cloning is almost never needed: This is the #1 performance killer
Transaction cloning should be rare: Extract only the data you need

Common Patterns

See references/patterns.md for detailed examples:

Event extraction from logs
Store aggregation patterns
Multi-module composition
Parameterized modules
Dynamic data sources
Database sink patterns (delta updates, composite keys, sink SQL workflow)
Token metadata caching — always store, never HashMap; see "Token Metadata: ALWAYS Use a Store" above
Contract calls from maps — RpcBatch works in map handlers; see "Calling Contracts from a Map Handler" above

Querying Chain Head Block

To get the current head block of a chain (useful for determining the latest block number):

Using Substreams:

# Quick head block lookup for a network
substreams run common@latest -s -1 --network mainnet

# Or with explicit endpoint
substreams run common@latest -e=<network-id-alias-or-host> -s -1 -o jsonl

Read the first line of output to get the head block information. The -s -1 flag starts from the latest block.

Using firecore:

# JSON output (use jq for further processing if available)
firecore tools firehose-client <network-id-alias-or-host> -o json -- -1

# Text output (less detail), first line looks like:
# Block #24327807 (14b58bd3fa091c05a46d084bba1e78090d52556d29f4312da77b7aa3220423f4)
firecore tools firehose-client <network-id-alias-or-host> -o text -- -1

Read the first line of output to get the head block information.
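
If you want the head block number in a script or tool, the text line can be parsed directly. This is a minimal sketch that assumes the first line has exactly the shape shown above, "Block #<number> (<hash>)"; HeadBlock and parse_head_line are hypothetical names, and the real output may differ between firecore versions.

```rust
// Parsed form of a head block line (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct HeadBlock {
    pub number: u64,
    pub hash: String,
}

// Parses a line shaped like "Block #24327807 (14b5...)".
// Returns None when the line does not match that assumed shape.
pub fn parse_head_line(line: &str) -> Option<HeadBlock> {
    let rest = line.trim().strip_prefix("Block #")?;
    let (number, rest) = rest.split_once(' ')?;
    let hash = rest.trim().strip_prefix('(')?.strip_suffix(')')?;
    if hash.is_empty() {
        return None;
    }
    Some(HeadBlock {
        number: number.parse().ok()?,
        hash: hash.to_string(),
    })
}
```

The hash is kept as an opaque string rather than validated as hex, since block identifiers are not hex on every chain.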

Development Tips

Start small: Begin with a 1,000-block range for testing
Use GUI: substreams gui for visual debugging (when available)
Version control: Commit .spkg files for reproducibility
Document modules: Add doc: fields in manifest for clarity

Troubleshooting

Build fails:

version "x.y.z" should match Semver: Add a v prefix to package.version in substreams.yaml — use v0.1.0, not 0.1.0.
Check Rust toolchain: rustup target add wasm32-unknown-unknown
Ensure buf CLI is installed (required for proto generation)
Verify proto imports are correct
Add protobuf.excludePaths with sf/substreams and google when importing spkgs
Ensure binary path in manifest matches build output

Linking errors ("symbol multiply defined" or "failed to load bitcode"):

This typically indicates version mismatches between Substreams crates. Solutions:

Clean build: rm -rf target && substreams build
Verify all crate versions are compatible (see Cargo.toml Setup section)
Common incompatibility: substreams 0.6 + substreams-database-change 4 (requires substreams 0.7)

Missing method errors on ABI-generated types:

If you see errors like "no method named decode found":

Add use substreams_ethereum::Event; import
Ensure ethabi = "18" is in your Cargo.toml dependencies

spkg import 404 errors:

Use substreams-ethereum spkg, NOT sf-ethereum (doesn't exist)
Verify the release version exists on GitHub
Check for typos in the URL

Empty output:

Confirm initialBlock is before first relevant block
Check module isn't filtered out by upstream index
Verify input data exists in block range

Performance issues:

Add indexes to skip irrelevant blocks
Use --production-mode for large ranges

graph_out Modules (The Graph / Subgraph Output)

Also load the substreams-sink skill when building a graph_out module. It contains the full working example and the EntityChanges proto definition.

Key facts to avoid the most common mistake:

| You want to write to | Output proto | Package |
|---|---|---|
| The Graph / subgraph | `EntityChanges` | `sf.substreams.sink.entity.v1` |
| Postgres / SQL | `DatabaseChanges` | `sf.substreams.sink.database.v1` |

These are NOT interchangeable. Using DatabaseChanges in a graph_out module (or vice versa) compiles but produces a pipeline that fails or emits garbage.

Quick pattern (full example in substreams-sink skill)

Do NOT add substreams-entity-change = "1" to Cargo.toml: v1 has a prost version conflict with the current toolchain (prost 0.13). Check crates.io first to see whether a newer version resolves the conflict. If none does, inline the proto:

proto/entity.proto (exact package name required):

syntax = "proto3";
package sf.substreams.sink.entity.v1;

message EntityChanges {
  repeated EntityChange entity_changes = 1;
}
message EntityChange {
  enum Operation { UNSET=0; CREATE=1; UPDATE=2; DELETE=3; FINAL=4; }
  string entity = 1;
  string id = 2;
  uint64 ordinal = 3;
  Operation operation = 4;
  repeated Field fields = 5;
}
message Value {
  oneof typed {
    int32  int32      = 1;
    string bigdecimal = 2;
    string bigint     = 3;
    string string     = 4;
    bytes  bytes      = 5;
    bool   bool       = 6;
    Array  array      = 10;
  }
}
message Array {
  repeated Value value = 1;
}
message Field {
  string name      = 1;
  Value  old_value = 2;
  Value  new_value = 3;
}

Wire compatibility: Copy this proto verbatim from the canonical source. The package name, message names, field numbers, and field types must all match exactly — simplifying any type (e.g. string for new_value) will produce empty/garbage values in Graph Node.

substreams.yaml output type:

output:
  type: proto:sf.substreams.sink.entity.v1.EntityChanges

Rust import (after proto is compiled via build.rs):

use crate::pb::sf::substreams::sink::entity::v1::{EntityChange, EntityChanges, Field};
use crate::pb::sf::substreams::sink::entity::v1::entity_change::Operation;
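
To illustrate how these types fit together, here is a minimal sketch of building one CREATE change. The structs below are hand-written stand-ins that approximate the shape prost typically generates for the proto above (message-typed fields as Option, the enum stored as i32). In a real project they come from crate::pb, Typed lives in a nested value module, and the remaining oneof variants exist too. The helper names string_field, bigint_field, and create_entity are hypothetical.

```rust
// Stand-in for the generated oneof (assumption: only two variants shown).
#[derive(Debug, Clone, PartialEq)]
pub enum Typed {
    Bigint(std::string::String),
    String(std::string::String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Value {
    pub typed: Option<Typed>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub old_value: Option<Value>,
    pub new_value: Option<Value>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(i32)]
pub enum Operation {
    Unset = 0,
    Create = 1,
    Update = 2,
    Delete = 3,
    Final = 4,
}

#[derive(Debug, Clone, PartialEq)]
pub struct EntityChange {
    pub entity: String,
    pub id: String,
    pub ordinal: u64,
    pub operation: i32, // enums are carried as i32 on the message
    pub fields: Vec<Field>,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct EntityChanges {
    pub entity_changes: Vec<EntityChange>,
}

// Hypothetical helper: a field whose new value is a string.
pub fn string_field(name: &str, value: &str) -> Field {
    let typed = Typed::String(value.to_string());
    Field { name: name.to_string(), old_value: None, new_value: Some(Value { typed: Some(typed) }) }
}

// Hypothetical helper: a field whose new value is a bigint, carried as a decimal string.
pub fn bigint_field(name: &str, value: &str) -> Field {
    let typed = Typed::Bigint(value.to_string());
    Field { name: name.to_string(), old_value: None, new_value: Some(Value { typed: Some(typed) }) }
}

// Hypothetical helper: appends a CREATE change for one entity row.
pub fn create_entity(changes: &mut EntityChanges, entity: &str, id: &str, ordinal: u64, fields: Vec<Field>) {
    changes.entity_changes.push(EntityChange {
        entity: entity.to_string(),
        id: id.to_string(),
        ordinal,
        operation: Operation::Create as i32,
        fields,
    });
}
```

A graph_out handler would build one EntityChanges per block and return it; the entity name and field names must match the subgraph schema.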

Solana Substreams

Solana uses a different block model, instruction paradigm, and account system than EVM chains. Do not apply Ethereum patterns here.

For all Solana development — block iteration, walk_instructions() vs message.instructions, SPL Token parsing, Anchor discriminators, b58!, Cargo.toml + manifest setup — see references/solana.md.
