From research-factory
Data quality validation and completeness checks. Use when verifying processed datasets, checking merge quality, or validating sample construction results.
Slash command
/research-factory:data-validation Dataset to validate and expected properties
- After any data processing or merge step
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(df.dtypes)
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(2)
print(missing_pct[missing_pct > 0].sort_values(ascending=False))
id_cols = ["firm_id", "date"] # adjust per dataset
dupes = df.duplicated(subset=id_cols).sum()
print(f"Duplicates on {id_cols}: {dupes}")
assert dupes == 0, f"FAIL: {dupes} duplicate rows on {id_cols}"
# Check key variables are in plausible range
for col in ["returns", "market_cap", "score"]:
    print(f"{col}: min={df[col].min():.4f}, max={df[col].max():.4f}, mean={df[col].mean():.4f}")
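Printing the ranges leaves the plausibility judgment to the reader. A minimal sketch of turning it into an explicit check follows; the `BOUNDS` values and the helper name `out_of_range` are illustrative assumptions, not part of this skill.

```python
import pandas as pd

# Hypothetical plausibility bounds as (lower, upper); None means unbounded.
BOUNDS = {"returns": (-1.0, None), "market_cap": (0.0, None)}


def out_of_range(df, bounds):
    """Count non-missing values outside [lower, upper] for each column."""
    violations = {}
    for col, (lower, upper) in bounds.items():
        bad = pd.Series(False, index=df.index)
        if lower is not None:
            bad |= df[col] < lower  # comparisons with NaN are False
        if upper is not None:
            bad |= df[col] > upper
        violations[col] = int(bad.sum())
    return violations


# Toy data: one implausible return and one negative market cap.
toy = pd.DataFrame(
    {"returns": [0.05, -1.5, None], "market_cap": [1e6, 2e9, -3.0]}
)
range_violations = out_of_range(toy, BOUNDS)
print(range_violations)
```

Missing values are not counted as violations here, since they are already reported by the missing-value check.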
# One row per entity, so the entity count is the length of this Series
obs_per_entity = df.groupby("firm_id").size()
print(f"Entities: {len(obs_per_entity)}")
print(f"Obs/entity: min={obs_per_entity.min()}, max={obs_per_entity.max()}, median={obs_per_entity.median()}")
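# merge_indicator is assumed to be the "_merge" column that pandas adds
# when merging with indicator=True, e.g. merge_indicator = merged["_merge"]
print(f"Left only: {(merge_indicator == 'left_only').sum()}")
print(f"Right only: {(merge_indicator == 'right_only').sum()}")
print(f"Both: {(merge_indicator == 'both').sum()}")
match_rate = (merge_indicator == 'both').mean() * 100
print(f"Match rate: {match_rate:.1f}%")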
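The merge diagnostics rely on a `merge_indicator` series. One way to obtain it, sketched here under the assumption that the merge is done with pandas and `indicator=True`; the frames `left` and `right` and their columns are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy inputs; real column names will differ per dataset.
left = pd.DataFrame({"firm_id": [1, 2, 3, 4], "returns": [0.01, -0.02, 0.03, 0.00]})
right = pd.DataFrame({"firm_id": [2, 3, 4, 5], "score": [0.5, 0.7, 0.9, 0.1]})

# An outer merge keeps unmatched rows from both sides; indicator=True adds
# a categorical "_merge" column with values left_only / right_only / both.
merged = pd.merge(left, right, on="firm_id", how="outer", indicator=True)
merge_indicator = merged["_merge"]

match_rate = (merge_indicator == "both").mean() * 100
print(f"Match rate: {match_rate:.1f}%")
```

An outer merge is used so that unmatched rows on either side stay visible; with an inner merge every surviving row would be tagged `both` and the match rate would always read 100%.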
=== Data Validation: {dataset_name} ===
Shape: (N, K)
ID columns: [firm_id, date] — 0 duplicates
Missing: col1 (2.3%), col2 (0.1%)
Key ranges: returns [-0.45, 0.82], market_cap [1.2M, 890B]
Panel: 3,456 firms, 2010-2023
VERDICT: PASS / FAIL (reason)
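A report in the format above could be assembled from the individual checks roughly as follows. This is a sketch, not the skill's own implementation: the function name `validation_report`, its parameters, and the rule that only duplicate IDs trigger FAIL are assumptions (duplicates are the only check with an `assert` in the snippets).

```python
import pandas as pd


def validation_report(df, id_cols, range_cols, name="dataset"):
    """Return (report_text, passed) for a few basic checks."""
    lines = [f"=== Data Validation: {name} ===", f"Shape: {df.shape}"]
    failures = []

    # Duplicate check on the identifier columns.
    dupes = int(df.duplicated(subset=id_cols).sum())
    lines.append(f"ID columns: {id_cols}, {dupes} duplicates")
    if dupes > 0:
        failures.append(f"{dupes} duplicate rows on {id_cols}")

    # Share of missing values per column, largest first.
    missing_pct = (df.isnull().mean() * 100).round(2)
    missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
    missing_txt = ", ".join(f"{c} ({p:.2f}%)" for c, p in missing_pct.items())
    lines.append(f"Missing: {missing_txt or 'none'}")

    # Observed min/max for the key variables.
    ranges = ", ".join(
        f"{c} [{df[c].min():.2f}, {df[c].max():.2f}]" for c in range_cols
    )
    lines.append(f"Key ranges: {ranges}")

    passed = not failures
    verdict = "PASS" if passed else "FAIL (" + "; ".join(failures) + ")"
    lines.append(f"VERDICT: {verdict}")
    return "\n".join(lines), passed


# Toy panel with one missing return and no duplicate (firm_id, date) pairs.
toy_panel = pd.DataFrame(
    {
        "firm_id": [1, 1, 2],
        "date": ["2020", "2021", "2020"],
        "returns": [0.1, None, -0.2],
    }
)
report, ok = validation_report(toy_panel, ["firm_id", "date"], ["returns"], name="toy_panel")
print(report)
```

Returning the pass flag alongside the text lets a pipeline stop on failure while still logging the full report.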
npx claudepluginhub xuxiguo/research-factory-claude --plugin research-factory