Skill

querying-clickhouse

Query patterns, safety rules, and performance tips for ClickHouse investigation queries against osprey_execution_results. Use when writing or reviewing ClickHouse queries for investigations.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/skywatch-investigations:querying-clickhouse

Not user invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill provides essential knowledge for writing safe, efficient queries against Osprey rule execution data. Use this when you need to investigate account behavior, analyze rule performance, or detect patterns.

Supporting Files

references/common-queries.md

SKILL.md

388 lines · ~3.7k tokens

Stats

Language: JavaScript

Parent stars: 3

Maintenance: Excellent

Last Commit: May 4, 2026

Querying ClickHouse

Safety Rules

The ClickHouse MCP server enforces strict safety constraints on all queries:

SELECT Only

All queries must be SELECT statements. No INSERT, UPDATE, DELETE, or DDL operations are permitted.

LIMIT Required

Every query must include a LIMIT clause. This prevents accidental runaway queries and caps result set sizes.

-- Good
SELECT ... FROM default.osprey_execution_results WHERE ... LIMIT 100

-- Bad (will be rejected)
SELECT ... FROM default.osprey_execution_results WHERE ...

Read-Only Enforcement

Queries must start with SELECT or WITH (for CTEs). JOINs, UNIONs, and subqueries are allowed, and queries may read from any table. Semicolons and the INTO keyword are blocked to prevent multi-statement execution and data export.

Key investigation tables:

default.osprey_execution_results — Osprey rule execution history
default.pds_signup_anomalies — PDS signup rate anomalies
default.url_overdispersion_results — Coordinated domain/URL sharing anomalies
default.account_entropy_results — Bot-like posting pattern detection
default.url_cosharing_pairs — Daily account URL co-sharing pairs (TTL 7 days)
default.url_cosharing_clusters — URL co-sharing cluster metrics and evolution (no TTL)
default.url_cosharing_membership — Daily URL co-sharing cluster membership (TTL 7 days)
default.quote_cosharing_pairs — Daily account quote co-sharing pairs (TTL 7 days)
default.quote_cosharing_clusters — Quote co-sharing cluster metrics and evolution (no TTL)
default.quote_cosharing_membership — Daily quote co-sharing cluster membership (TTL 7 days)
default.quote_overdispersion_results — Coordinated quote-post anomalies

-- All valid
SELECT * FROM default.osprey_execution_results WHERE ... LIMIT 100
SELECT a.did, b.cluster_id FROM default.osprey_execution_results a
  JOIN default.url_cosharing_membership b ON a.did = b.did LIMIT 50
WITH flagged AS (SELECT did FROM default.account_entropy_results WHERE is_bot_like = 1)
  SELECT * FROM default.osprey_execution_results WHERE did IN (SELECT did FROM flagged) LIMIT 100

-- Rejected
INSERT INTO ...
SELECT * FROM ... INTO OUTFILE ...
SELECT * FROM ... ; DROP TABLE ...
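
UNIONs are listed as allowed but not shown above, and they interact with the LIMIT rule in a way worth checking. Per the ClickHouse documentation, a trailing LIMIT after UNION ALL applies to the last SELECT rather than to the combined result. A minimal sketch of one way to cap the whole result by wrapping the union in a subquery; it assumes did has a compatible type in both tables, and the source labels are invented for illustration:

```sql
-- Sketch: accounts that either tripped a rule in the last day or are flagged bot-like.
-- The outer LIMIT caps the combined result, not just the second branch.
SELECT did, source
FROM
(
    SELECT did, 'rule_hit' AS source          -- 'rule_hit' is an illustrative label
    FROM default.osprey_execution_results
    WHERE created_at > now() - interval 1 day

    UNION ALL

    SELECT did, 'entropy_flag' AS source      -- 'entropy_flag' is an illustrative label
    FROM default.account_entropy_results
    WHERE is_bot_like = 1
)
LIMIT 100
```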

60-Second Timeout

Queries running longer than 60 seconds are automatically cancelled. This encourages efficient query design and prevents resource exhaustion.
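
One possible way to stay under the timeout is to probe a narrow window with a cheap count() first and widen it only if the row count looks manageable. This is a sketch, not a documented procedure: the one-hour and one-day windows are arbitrary starting points. Because semicolons are blocked, the two queries are submitted separately.

```sql
-- Query 1: cheap probe, counts rows in the last hour only
SELECT count() AS row_count
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 hour
LIMIT 1
```

```sql
-- Query 2: the real question over a wider window, sent once the probe looks reasonable
SELECT rule_name, count() AS hits
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
GROUP BY rule_name
ORDER BY hits DESC
LIMIT 100
```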

Constraint Enforcement

These constraints are enforced at the MCP layer before queries reach ClickHouse, so policy violations are caught early.

Query Structure Best Practices

Follow this pattern for reliable, performant queries:

1. Filter by Time Range First

The investigation tables are partitioned by time. Always include a created_at filter so ClickHouse can skip partitions outside the range instead of reading the whole table.

SELECT rule_name, count() as hits
FROM default.osprey_execution_results
WHERE created_at > now() - interval 7 day
GROUP BY rule_name
ORDER BY hits DESC  -- without an ORDER BY, LIMIT keeps an arbitrary 100 groups
LIMIT 100

Without time filtering, queries may scan the entire table and time out.
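
When an investigation targets a known incident window rather than "the last N days", the same idea can be written with explicit bounds. A minimal sketch, assuming created_at is a DateTime-compatible column; the dates are placeholders, and the literals are interpreted in the server's time zone:

```sql
-- Half-open interval [start, end): a row exactly at midnight on the 16th is excluded,
-- so adjacent daily windows never count the same row twice.
SELECT rule_name, count() AS hits
FROM default.osprey_execution_results
WHERE created_at >= toDateTime('2026-01-15 00:00:00')   -- placeholder start
  AND created_at <  toDateTime('2026-01-16 00:00:00')   -- placeholder end
GROUP BY rule_name
ORDER BY hits DESC
LIMIT 100
```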

2. Select Specific Columns

Avoid SELECT *. ClickHouse is column-oriented and reads only the columns a query names, so selecting just the columns you need cuts the amount of data read from disk.

-- Good (fast)
SELECT did, handle, rule_name, created_at
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
LIMIT 100

-- Bad (slow)
SELECT *
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
LIMIT 100

3. Use Conservative LIMIT Values

Start with conservative LIMIT values (10-100) for exploratory queries, and increase only if needed.

-- Safe for exploration
SELECT ...
LIMIT 50

-- For comprehensive analysis, still cap results
SELECT ...
LIMIT 10000
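
A small LIMIT can be dominated by a few very active accounts. ClickHouse's LIMIT ... BY clause is one way to keep an exploratory sample diverse while still capping the total. The sketch below is illustrative only: the per-account cap of 3 and the overall cap of 60 are arbitrary values.

```sql
-- Keep at most 3 recent rows per account, and at most 60 rows overall.
SELECT did, rule_name, created_at
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
ORDER BY did, created_at DESC
LIMIT 3 BY did     -- per-account cap (applied after ORDER BY)
LIMIT 60           -- overall cap, satisfies the LIMIT requirement
```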

4. Filter Indexed Columns When Possible

The following columns are indexed and filter efficiently:

created_at — Timestamp (most important)
did — Account DID
handle — Account handle
rule_name — Rule name

Use these in WHERE clauses whenever possible.

SELECT did, handle, rule_name, score, created_at
FROM default.osprey_execution_results
WHERE rule_name = 'spam-bot-pattern'
  AND created_at > now() - interval 1 day
LIMIT 100
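
The same pattern applies when the starting point is an account rather than a rule. A minimal sketch of a per-account timeline that filters on two of the indexed columns listed above; the DID value is a made-up placeholder:

```sql
-- Rule hits for one account over the last week, newest first.
SELECT rule_name, score, created_at
FROM default.osprey_execution_results
WHERE did = 'did:plc:example'                 -- hypothetical DID, replace with the real one
  AND created_at > now() - interval 7 day
ORDER BY created_at DESC
LIMIT 100
```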

Performance Tips

Content Search Is Expensive

The ngramDistance() function scores text similarity by comparing n-grams. It is powerful but slow, because the distance is computed for every row that reaches it. It returns 0 for identical content and 1 for completely different content, so lower values mean closer matches.

Always pair ngramDistance() with other filters:

-- Good: narrow context with time + ngramDistance
SELECT did, handle, content
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
  AND ngramDistance(content, 'target phrase') < 0.5
ORDER BY ngramDistance(content, 'target phrase') ASC
LIMIT 50

-- Bad: ngramDistance alone scans all content
SELECT ...
WHERE ngramDistance(content, 'target phrase') < 0.5
LIMIT 100
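
If the suspicious content is already associated with a rule, adding a rule_name filter should shrink the set of rows that the distance function has to score. The sketch below also names the distance once with an alias, which ClickHouse allows in WHERE and ORDER BY, and returns it so results can be ranked by eye. The rule name is reused from the earlier example and is only illustrative.

```sql
SELECT
    did,
    handle,
    content,
    ngramDistance(content, 'target phrase') AS distance   -- 0 = identical, 1 = unrelated
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
  AND rule_name = 'spam-bot-pattern'    -- illustrative rule name
  AND distance < 0.5
ORDER BY distance ASC
LIMIT 50
```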

Aggregate Queries Are Fast

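GROUP BY queries are usually a better fit than raw row selection for summarizing a time range: ClickHouse aggregates inside the engine and returns a handful of rows instead of thousands. The scan still covers every row in the range, so keep the time filter.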

-- Fast: aggregates reduce data volume
SELECT rule_name, count() as hits, avg(score) as avg_score
FROM default.osprey_execution_results
WHERE created_at > now() - interval 7 day
GROUP BY rule_name
ORDER BY hits DESC  -- makes the 50 returned groups the busiest rules
LIMIT 50

-- Slower: full row enumeration
SELECT rule_name, score
FROM default.osprey_execution_results
WHERE created_at > now() - interval 7 day
LIMIT 1000
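
Aggregation also works for spotting bursts over time. A minimal sketch that buckets hits by hour; uniq() gives an approximate distinct count, and the LIMIT of 500 is an arbitrary cap sized for roughly a day of hourly buckets across a few dozen rules:

```sql
-- Hourly hit counts and approximate distinct accounts per rule over the last day.
SELECT
    toStartOfHour(created_at) AS hour,
    rule_name,
    count() AS hits,
    uniq(did) AS accounts          -- approximate; uniqExact(did) is exact but heavier
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
GROUP BY hour, rule_name
ORDER BY hour ASC, hits DESC
LIMIT 500
```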

Avoid Expensive Operations

String concatenation in WHERE clauses
Function calls on large datasets (e.g., LOWER(content) for every row)
Subqueries (use JOIN patterns instead where possible; as noted under Read-Only Enforcement, JOINs may span any table)
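
When a per-row function cannot be avoided, for example a case-insensitive phrase match, one option is to let the cheap filters run first so the function only touches a small slice of the table. A hedged sketch, assuming ASCII case folding is sufficient for the phrase; the rule name and phrase are placeholders:

```sql
-- Case-insensitive substring match, restricted to one day and one rule.
SELECT did, handle, content, created_at
FROM default.osprey_execution_results
WHERE created_at > now() - interval 1 day
  AND rule_name = 'spam-bot-pattern'                          -- placeholder rule name
  AND positionCaseInsensitive(content, 'target phrase') > 0   -- 0 means "not found"
LIMIT 100
```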