From harness-claude
Designs append-heavy tables for metrics, events, and logs with time-based partitioning, retention policies, and efficient aggregation using PostgreSQL.
How this skill is triggered — by the user, by Claude, or both
Slash command
/harness-claude:db-time-seriesThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> Designing append-heavy tables for metrics, events, and logs with time-based partitioning, retention policies, and efficient aggregation.
Designing append-heavy tables for metrics, events, and logs with time-based partitioning, retention policies, and efficient aggregation.
Time-series workloads share four characteristics:
Schema design with partitioning:
CREATE TABLE metrics (
time timestamptz NOT NULL,
device_id int NOT NULL,
temperature float,
humidity float
) PARTITION BY RANGE (time);
-- Create monthly partitions
CREATE TABLE metrics_2024_01 PARTITION OF metrics
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE metrics_2024_02 PARTITION OF metrics
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Key design choices:
(device_id, time) for per-device queries.DROP TABLE metrics_2024_01; is instant. DELETE WHERE time < X creates dead tuples requiring vacuum -- avoid this.IoT sensor monitoring platform:
-- BRIN index on time (100x smaller than B-tree for sequential inserts)
CREATE INDEX idx_metrics_time_brin ON metrics USING brin (time);
-- Composite B-tree for per-device queries
CREATE INDEX idx_metrics_device_time ON metrics (device_id, time);
Time-range query with partition pruning:
EXPLAIN ANALYZE
SELECT device_id, avg(temperature), max(humidity)
FROM metrics
WHERE time BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY device_id;
The query plan shows Partitions removed: 10 -- only the March partition is scanned. Without partitioning, the entire table would be read.
Retention by dropping old partitions:
-- Instant operation, no dead tuples, no vacuum needed
DROP TABLE metrics_2023_01;
Compare with DELETE-based retention: DELETE FROM metrics WHERE time < '2023-02-01' on a 500M-row table generates millions of dead tuples, triggers autovacuum storms, and takes hours.
Continuous aggregation for dashboards:
CREATE MATERIALIZED VIEW hourly_metrics AS
SELECT
date_trunc('hour', time) AS hour,
device_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
count(*) AS readings
FROM metrics
GROUP BY 1, 2;
CREATE UNIQUE INDEX ON hourly_metrics (hour, device_id);
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY hourly_metrics;
Declarative partitioning (PARTITION BY RANGE) is the foundation. PostgreSQL 11+ supports partition pruning at plan time and execution time.
BRIN indexes are ideal for time columns because data is physically ordered by insertion time:
CREATE INDEX ON metrics USING brin (time) WITH (pages_per_range = 32);
The pages_per_range parameter controls granularity -- smaller values give tighter range summaries at the cost of a larger index.
pg_partman extension automates partition creation and retention:
SELECT partman.create_parent(
'public.metrics', 'time', 'native', 'monthly'
);
-- Automatically creates future partitions and drops old ones
Materialized views with REFRESH CONCURRENTLY for zero-downtime dashboard aggregation. Requires a unique index on the materialized view.
TimescaleDB hypertables automatically chunk data by time, eliminating manual partition management:
SELECT create_hypertable('metrics', 'time');
TimescaleDB adds compression (columnar, 90%+ size reduction on old chunks), continuous aggregates (real-time aggregation combining materialized and recent data), and time_bucket() for flexible time grouping.
Two-step aggregation: Pre-aggregate in a materialized view (hourly), then re-aggregate at query time (daily, weekly). This reduces the data scanned for dashboard queries by 100-1000x.
Write-ahead log tuning for high-throughput ingestion:
synchronous_commit = off for metrics where losing a few seconds of data on crash is acceptablewal_level = minimal if replication is not neededCOPY instead of individual INSERTs (10-50x throughput improvement)MySQL 8.0 supports PARTITION BY RANGE but with key differences:
CREATE EVENT) or application-level refreshFor MySQL, the partition management pattern:
ALTER TABLE metrics DROP PARTITION p_2023_01; -- instant retention
ALTER TABLE metrics ADD PARTITION (
PARTITION p_2024_04 VALUES LESS THAN ('2024-05-01')
);
Fleet management platform ingesting 500K sensor readings/minute from 100K vehicles. Partitioned by day (30 partitions retained), BRIN index on time, composite B-tree on (vehicle_id, time). Write throughput: 500K rows/minute sustained using batched COPY. Query latency: "last 24h for vehicle X" in 4ms. Retention: daily cron drops partitions older than 30 days -- instant, no vacuum. Storage: 2TB raw data per month, reduced to 200GB with TimescaleDB compression on chunks older than 7 days.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeGuides partitioning large PostgreSQL tables by range, list, or hash to improve query performance, simplify maintenance, and enable efficient data lifecycle management.
Guides selection, operations, and best practices for 14 time-series databases like InfluxDB, Prometheus, TimescaleDB for metrics, IoT, financial, observability storage design.
Provides 15+ schema design rules for ClickHouse tables including ORDER BY, PARTITION BY, data types, TTL, and engine selection for sub-second queries, 10x compression, and automated retention. Load when creating or modifying tables.