This skill should be used when working with GeoParquet files, the cloud-native format for geospatial vector data. It covers best practices for creating, optimizing, and distributing GeoParquet using the gpio CLI and DuckDB.
Slash command: /geoparquet:geoparquet
Guide users through GeoParquet workflows: creating, optimizing, validating, and distributing GeoParquet files following official best practices.
GeoParquet is Apache Parquet with standardized geospatial metadata. It combines Parquet's columnar efficiency with proper geometry encoding (WKB), coordinate reference system metadata, and bounding box optimizations.
Key advantages over legacy formats:
Always prefer gpio for GeoParquet operations. It implements all best practices by default.
Installation:
pipx install --pre geoparquet-io # Isolated (recommended)
pip install --pre geoparquet-io # Or with pip
uv pip install --pre geoparquet-io # Or with uv
Note: Use --pre for latest beta releases (1.0 not yet released).
If gpio is not installed, guide the user through installation before proceeding.
Use DuckDB for complex SQL, joins, aggregations, or geometry operations. Requires DuckDB 1.5+ for projection/CRS.
pip install "duckdb>=1.5"
When using DuckDB, apply best practices manually:
- ORDER BY ST_Hilbert(geometry) for spatial sorting
- COMPRESSION ZSTD with COMPRESSION_LEVEL 15
- ROW_GROUP_SIZE 100000
- Run gpio check all on the output

For detailed tool comparison, see references/tool-comparison.md.
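The DuckDB settings listed above can be combined into a single COPY statement. The following is a minimal sketch that only assembles the SQL text. The helper name `hilbert_copy_sql`, the file names, and the `geometry` column name are hypothetical. It assumes the input is already a Parquet file with a geometry column, and that the spatial extension is loaded before the statement is executed.

```python
def hilbert_copy_sql(source: str, dest: str, geom_col: str = "geometry") -> str:
    """Build a DuckDB COPY statement using the settings recommended above.

    Hypothetical helper: it returns SQL text and does not run anything.
    """
    # Double any single quotes so the paths are valid SQL string literals.
    src = source.replace("'", "''")
    dst = dest.replace("'", "''")
    return (
        f"COPY (SELECT * FROM read_parquet('{src}') "
        f"ORDER BY ST_Hilbert({geom_col})) "  # spatial sort, as recommended above
        f"TO '{dst}' "
        "(FORMAT PARQUET, COMPRESSION ZSTD, COMPRESSION_LEVEL 15, "
        "ROW_GROUP_SIZE 100000)"
    )


# Example file names are placeholders.
sql = hilbert_copy_sql("input.parquet", "sorted.parquet")
print(sql)
```

The resulting string could be passed to a DuckDB connection after running `INSTALL spatial; LOAD spatial;`. Running gpio check all on the output afterwards remains the way to confirm the file meets the best practices.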
When a user provides spatial data:
gpio inspect <file>
gpio inspect stats <file>
Report: row count, geometry type, CRS, columns, file size.
# Standard (fast, for development)
gpio convert geoparquet <input> <output>
# For distribution (higher compression)
gpio convert geoparquet <input> <output> --compression-level 15
gpio check all <output>
# Fix issues if found
gpio check all <output> --fix --output <fixed>
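The inspect, convert, and validate steps above can be scripted. The sketch below only builds the command lines as argument lists, using the gpio subcommands and flags shown above. The function name `conversion_commands` and the file names are hypothetical, and nothing is executed, so it runs without gpio installed.

```python
import shlex


def conversion_commands(src, out, fixed, for_distribution=False):
    """Return the gpio commands for the workflow above as argument lists."""
    convert = ["gpio", "convert", "geoparquet", src, out]
    if for_distribution:
        # Higher compression for files that will be distributed.
        convert += ["--compression-level", "15"]
    return [
        ["gpio", "inspect", src],
        ["gpio", "inspect", "stats", src],
        convert,
        ["gpio", "check", "all", out],
        # Only needed when the check above reports issues.
        ["gpio", "check", "all", out, "--fix", "--output", fixed],
    ]


# Placeholder file names for illustration.
commands = conversion_commands(
    "roads.shp", "roads.parquet", "roads_fixed.parquet", for_distribution=True
)
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once gpio is installed. Keeping the commands as lists avoids shell quoting problems with paths that contain spaces.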
Small (<100MB, <100k rows):
Medium (100MB-2GB, 100k-10M rows):
Large (>2GB, >10M rows):
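The size thresholds above can be expressed as a small classifier. This is a sketch under two assumptions that the text does not state: MB and GB are treated as binary units, and when file size and row count point to different tiers, the larger tier wins. The function name `size_tier` is hypothetical.

```python
MB = 1024 ** 2  # assumption: binary megabytes
GB = 1024 ** 3  # assumption: binary gigabytes


def size_tier(size_bytes: int, row_count: int) -> str:
    """Classify a dataset as 'small', 'medium', or 'large'.

    Thresholds follow the tiers above; if size and row count disagree,
    the larger tier is returned (an assumption, not stated in the source).
    """
    if size_bytes > 2 * GB or row_count > 10_000_000:
        return "large"
    if size_bytes >= 100 * MB or row_count >= 100_000:
        return "medium"
    return "small"


print(size_tier(50 * MB, 10_000))
print(size_tier(500 * MB, 50_000))
print(size_tier(1 * GB, 20_000_000))
```

The row count and file size can be taken from the gpio inspect output reported earlier in the workflow.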
gpio publish stac <file> <file.stac.json>
gpio publish upload <file> s3://bucket/path/
# Inspect
gpio inspect <file>
gpio inspect stats <file>
# Convert
gpio convert geoparquet <input> <output>
gpio convert geoparquet <input> <output> --compression-level 15
# Validate
gpio check all <file>
gpio check all <file> --fix --output <fixed>
# Extract subset (works with remote files)
gpio extract <input> <output> --bbox "minx,miny,maxx,maxy"
gpio extract <input> <output> --where "column > value"
# Extract from services
gpio extract bigquery project.dataset.table output.parquet
gpio extract arcgis https://...FeatureServer/0 output.parquet
# Add spatial indices
gpio add bbox <input> <output>
gpio add h3 <input> <output> --resolution 9
gpio add admin-divisions <input> <output> --dataset gaul
# Partition large files
gpio partition kdtree <input> <output_dir> --max-rows-per-file 500000
# Publish
gpio publish stac <input> <output.json>
gpio publish upload <file> s3://bucket/path/
For complete command reference, see references/gpio-commands.md.
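The --bbox value in the extract command above is a single comma-separated string in minx,miny,maxx,maxy order. A minimal sketch for building it from numbers follows. The helper names `bbox_arg` and `extract_command` and the example coordinates are hypothetical, and the ordering check assumes the box does not cross the antimeridian.

```python
def bbox_arg(minx: float, miny: float, maxx: float, maxy: float) -> str:
    """Format a bounding box as the 'minx,miny,maxx,maxy' string."""
    # Assumption: boxes that cross the antimeridian are out of scope here.
    if minx >= maxx or miny >= maxy:
        raise ValueError("expected minx < maxx and miny < maxy")
    return f"{minx},{miny},{maxx},{maxy}"


def extract_command(src: str, out: str, bbox: tuple) -> list:
    """Build the gpio extract command as an argument list (not executed)."""
    return ["gpio", "extract", src, out, "--bbox", bbox_arg(*bbox)]


# Placeholder paths and coordinates for illustration.
cmd = extract_command("input.parquet", "subset.parquet", (-122.5, 37.7, -122.3, 37.9))
print(cmd)
```

Passing the list to `subprocess.run` keeps the bbox as one argument, so no extra quoting is needed.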
Before publishing GeoParquet files:
- Run gpio check all

For detailed best practices, see references/distribution-best-practices.md.
| Task | Command |
|---|---|
| Convert to GeoParquet | gpio convert geoparquet <input> <output> |
| Extract subset by bbox | gpio extract <input> <output> --bbox "..." |
| Extract from BigQuery | gpio extract bigquery <table> <output> |
| Extract from ArcGIS | gpio extract arcgis <url> <output> |
| Add spatial index | `gpio add h3 |
| Add admin boundaries | gpio add admin-divisions <input> <output> |
| Validate file | gpio check all <file> |
| Partition large file | gpio partition kdtree <input> <dir> |
| Generate STAC | gpio publish stac <input> <output> |
| Upload to S3 | gpio publish upload <file> s3://... |
| Convert to PMTiles | gpio pmtiles create <input> <output> |
- --verbose for detailed output
- --dry-run to preview operations
- --json for machine-readable output
- Run gpio check all before publishing
npx claudepluginhub geoparquet-io/geoparquet-skill --plugin geoparquet