From clickhouse-pack
Designs ClickHouse schemas with MergeTree engines, ORDER BY sort keys, partitioning, and codecs for analytical workloads and query optimization.
How this skill is triggered — by the user, by Claude, or both
Slash command
/clickhouse-pack:clickhouse-core-workflow-aThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Design ClickHouse tables with correct engine selection, ORDER BY keys,
Design ClickHouse tables with correct engine selection, ORDER BY keys, partitioning, and codec choices for analytical workloads.
@clickhouse/client connected (see clickhouse-install-auth)| Engine | Best For | Dedup? | Example |
|---|---|---|---|
MergeTree | General analytics, append-only logs | No | Clickstream, IoT |
ReplacingMergeTree | Mutable rows (upserts) | Yes (on merge) | User profiles, state |
SummingMergeTree | Pre-aggregated counters | Sums numerics | Page view counts |
AggregatingMergeTree | Materialized view targets | Merges states | Dashboards |
CollapsingMergeTree | Stateful row updates | Collapses +-1 | Shopping carts |
ClickHouse Cloud uses SharedMergeTree — it is a drop-in replacement for
MergeTree on Cloud. You do not need to change your DDL.
The ORDER BY clause is the single most important schema decision. It defines:
Rules of thumb:
event_type, status)user_id, tenant_id)created_at)-- Good: filter by tenant, then by time ranges
ORDER BY (tenant_id, event_type, created_at)
-- Bad: UUID first means every query scans the full index
ORDER BY (event_id, created_at) -- event_id is random UUID
CREATE TABLE analytics.events (
event_id UUID DEFAULT generateUUIDv4(),
tenant_id UInt32,
event_type LowCardinality(String),
user_id UInt64,
session_id String,
properties String CODEC(ZSTD(3)), -- JSON blob, compress well
url String CODEC(ZSTD(1)),
ip_address IPv4,
country LowCardinality(FixedString(2)),
created_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree()
ORDER BY (tenant_id, event_type, toDate(created_at), user_id)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 1 YEAR
SETTINGS index_granularity = 8192;
CREATE TABLE analytics.users (
user_id UInt64,
email String,
plan LowCardinality(String),
mrr_cents UInt32,
properties String CODEC(ZSTD(3)),
updated_at DateTime DEFAULT now()
)
ENGINE = ReplacingMergeTree(updated_at) -- keeps latest row per ORDER BY key
ORDER BY user_id;
-- Query with FINAL to get deduplicated results
SELECT * FROM analytics.users FINAL WHERE user_id = 42;
CREATE TABLE analytics.daily_stats (
date Date,
tenant_id UInt32,
event_type LowCardinality(String),
event_count UInt64,
unique_users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, event_type, date);
| Partition Expression | Typical Use | Parts Per Partition |
|---|---|---|
toYYYYMM(date) | Most common — monthly | Target 10-1000 |
toMonday(date) | Weekly rollups | More parts, finer drops |
toYYYYMMDD(date) | Daily TTL drops | Many parts — use carefully |
| None | Small tables (<1M rows) | Fine |
Warning: Each partition creates separate parts on disk. Over-partitioning
(e.g., by user_id) creates millions of tiny parts and kills performance.
-- Column-level compression codecs
column1 UInt64 CODEC(Delta, ZSTD(3)), -- Time series / sequential IDs
column2 Float64 CODEC(Gorilla, ZSTD(1)), -- Floating point (similar values)
column3 String CODEC(ZSTD(3)), -- General text / JSON
column4 DateTime CODEC(DoubleDelta, ZSTD), -- Timestamps (near-sequential)
import { createClient } from '@clickhouse/client';
const client = createClient({ url: process.env.CLICKHOUSE_HOST! });
async function applySchema() {
await client.command({ query: 'CREATE DATABASE IF NOT EXISTS analytics' });
await client.command({
query: `
CREATE TABLE IF NOT EXISTS analytics.events (
event_id UUID DEFAULT generateUUIDv4(),
tenant_id UInt32,
event_type LowCardinality(String),
user_id UInt64,
payload String CODEC(ZSTD(3)),
created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (tenant_id, event_type, created_at)
PARTITION BY toYYYYMM(created_at)
`,
});
console.log('Schema applied.');
}
| Error | Cause | Solution |
|---|---|---|
ORDER BY expression not in primary key | PRIMARY KEY != ORDER BY | Remove explicit PRIMARY KEY or align |
Too many parts (300+) | Over-partitioning | Use coarser partition expression |
Cannot convert String to UInt64 | Wrong data type | Match insert types to schema |
TTL expression type mismatch | TTL on non-date column | TTL must reference DateTime column |
For inserting and querying data, see clickhouse-core-workflow-b.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin clickhouse-packProvides 15+ schema design rules for ClickHouse tables including ORDER BY, PARTITION BY, data types, TTL, and engine selection for sub-second queries, 10x compression, and automated retention. Load when creating or modifying tables.
Creates ClickHouse MergeTree table for events, inserts JSONEachRow data via @clickhouse/client, and runs analytical GROUP BY queries. For new projects, MergeTree learning, or connection tests.
Reviews ClickHouse schemas, queries, and configurations using 31 prioritized rules across schema design, query optimization, data ingestion, and agent connectivity. Essential when working with ClickHouse databases.