Create a MySQL CDC capture using flowctl with binlog replication. Use when setting up streaming from MySQL, Amazon RDS MySQL, or Aurora MySQL. Use when user says "capture MySQL", "stream from MySQL", "MySQL CDC", "binlog replication", or "connect MySQL to Estuary".
Slash command: /estuary-materializations:capture-mysql-create
Create a MySQL capture using flowctl to stream data from MySQL tables into Estuary collections using Change Data Capture (CDC) via binary log (binlog) replication.
Applies to: source-mysql, source-amazon-rds-mysql, source-amazon-aurora-mysql, source-google-cloud-sql-mysql, source-azure-mysql
Before proceeding, fetch the official connector docs for prerequisites, config reference, and cloud-specific setup.
Always load the main page: https://docs.estuary.dev/reference/Connectors/capture-connectors/MySQL/
Then load the variant subpage based on the user's MySQL type:
| Variant | Docs URL |
|---|---|
| Self-hosted MySQL | Main page covers this |
| Amazon Aurora MySQL | Main page covers this |
| Amazon RDS MySQL | https://docs.estuary.dev/reference/Connectors/capture-connectors/MySQL/amazon-rds-mysql/ |
| Google Cloud SQL MySQL | https://docs.estuary.dev/reference/Connectors/capture-connectors/MySQL/google-cloud-sql-mysql/ |
Use WebFetch to load these pages. Together they cover the prerequisites, config reference, and cloud-specific setup.
This skill provides the flowctl workflow and decision logic that docs don't cover.
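Before writing any YAML, ask the user:
- Whether historyMode should be off (false) or capture full event history (true)
- The database timezone as an IANA name (for example America/New_York), if tables have DATETIME columns
Always use the latest numbered version tag. Query the connector registry to find it: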
flowctl raw get --table connector_tags \
--query 'documentation_url=ilike.*source-mysql*' \
--query 'select=image_tag,documentation_url' \
--output yaml
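The registry query returns one row per tag, so the latest numbered version still has to be picked out of the result. The following is a minimal sketch, assuming tags have the shape `:v<number>` and may sit alongside non-numeric tags such as `:dev`. The function name `latest_numbered_tag` is hypothetical, and the tag format should be checked against the actual query output.

```shell
# Hypothetical helper: reads one image tag per line on stdin and prints the
# highest numbered tag. Assumes tags look like ":v3" or "v3"; any other tag
# (for example ":dev") is ignored.
latest_numbered_tag() {
  grep -E '^:?v[0-9]+$' \
    | sed -E 's/^:?v//' \
    | sort -n \
    | tail -n 1 \
    | sed 's/^/v/'
}
```

For example, `printf ':v1\n:dev\n:v10\n:v2\n' | latest_numbered_tag` prints `v10`, because the numeric sort ranks 10 above 2.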
Choose the connector image based on the user's MySQL variant:
| Variant | Connector Image |
|---|---|
| Self-hosted / Vanilla | ghcr.io/estuary/source-mysql |
| Amazon RDS MySQL | ghcr.io/estuary/source-amazon-rds-mysql |
| Amazon Aurora MySQL | ghcr.io/estuary/source-amazon-aurora-mysql |
| Google Cloud SQL MySQL | ghcr.io/estuary/source-google-cloud-sql-mysql |
| Azure Database for MySQL | ghcr.io/estuary/source-azure-mysql |
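The table above can be turned into a small lookup so the image name is never typed by hand. This is a sketch only: the short variant names (`self-hosted`, `rds`, `aurora`, `cloud-sql`, `azure`) and the function name `connector_image_for` are assumptions made for illustration, not flowctl options.

```shell
# Hypothetical helper: maps a short variant name to the connector image from
# the table above. The short names are assumptions made for this sketch.
connector_image_for() {
  case "$1" in
    self-hosted|vanilla) echo "ghcr.io/estuary/source-mysql" ;;
    rds)                 echo "ghcr.io/estuary/source-amazon-rds-mysql" ;;
    aurora)              echo "ghcr.io/estuary/source-amazon-aurora-mysql" ;;
    cloud-sql)           echo "ghcr.io/estuary/source-google-cloud-sql-mysql" ;;
    azure)               echo "ghcr.io/estuary/source-azure-mysql" ;;
    *) echo "unknown MySQL variant: $1" >&2; return 1 ;;
  esac
}
```

An unknown variant returns a non-zero status, which lets a calling script stop before it writes a config with the wrong image.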
Walk the user through prerequisites from the docs loaded in Step 0:
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
For RDS: binlog retention is set via CALL mysql.rds_set_configuration('binlog retention hours', 72);
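The two SHOW VARIABLES checks above can be evaluated by a script instead of by eye. The sketch below assumes the values arrive as tab-separated `name<TAB>value` lines, which is the shape `mysql -N -B -e "SHOW VARIABLES LIKE 'binlog%'"` produces. Expecting `binlog_row_image` to be `FULL` is an assumption of this sketch and should be confirmed against the connector docs. `check_binlog_vars` is a hypothetical name.

```shell
# Hypothetical helper: reads "name<TAB>value" lines on stdin and reports
# binlog settings that would block CDC. The body runs in a subshell so its
# variables do not leak into the caller.
check_binlog_vars() (
  tab=$(printf '\t')
  rc=0
  while IFS="$tab" read -r name value; do
    case "$name" in
      binlog_format)
        [ "$value" = "ROW" ] || { echo "binlog_format is $value, expected ROW"; rc=1; } ;;
      binlog_row_image)
        # Assumption: FULL row images are required; verify in the docs.
        [ "$value" = "FULL" ] || { echo "binlog_row_image is $value, expected FULL"; rc=1; } ;;
    esac
  done
  exit "$rc"
)
```

The function exits non-zero when any checked setting is off, so it can gate the rest of a setup script.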
Build flow.yaml using the config reference from the docs. Minimal required config:
captures:
  <tenant>/<path>/source-mysql:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-mysql:<version>
        config:
          address: "<host>:<port>"
          user: "<username>"
          password: "<password>"
          historyMode: false
    bindings: []
Important fields not in minimal config but commonly needed:
- timezone: "America/New_York" (required if tables have DATETIME columns)
- advanced.dbname: "your_app_db" (required if the user can't access the mysql system database)
For an SSH tunnel, add a networkTunnel.sshForwarding block; see the docs for the full config.
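Because YAML nesting is easy to get wrong by hand, the minimal config above can be generated from a few arguments. This is a sketch under stated assumptions: `render_flow_yaml` and its argument order are hypothetical, the optional fifth argument adds the `timezone` field under `config`, and the password is left as a placeholder so that no secret lands in shell history.

```shell
# Hypothetical helper: prints a flow.yaml shaped like the minimal config above.
# Arguments: capture name, image (with tag), address, user, optional timezone.
# The body runs in a subshell so its variables do not leak into the caller.
render_flow_yaml() (
  capture=$1 image=$2 address=$3 user=$4 timezone=${5:-}
  cat <<EOF
captures:
  ${capture}:
    endpoint:
      connector:
        image: ${image}
        config:
          address: "${address}"
          user: "${user}"
          password: "<password>"
          historyMode: false
EOF
  # timezone is only emitted when a fifth argument was given.
  if [ -n "$timezone" ]; then
    printf '          timezone: "%s"\n' "$timezone"
  fi
  printf '    bindings: []\n'
)
```

A call such as `render_flow_yaml <tenant>/<path>/source-mysql ghcr.io/estuary/source-mysql:<version> <host>:<port> <username> America/New_York > flow.yaml` would write the file that the discover step reads. Quote the arguments when they contain placeholders or special characters.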
# Discover tables
flowctl discover --source flow.yaml
# Review the generated bindings
cat flow.yaml
# Publish the capture
flowctl catalog publish --source flow.yaml --auto-approve
# Check status (expect PENDING → BACKFILLING → OK: Streaming Binlog Events)
flowctl catalog status <tenant>/<path>/source-mysql
# View recent logs
flowctl logs --task <tenant>/<path>/source-mysql --since 5m | jq -c '{ts, message}'
# Read captured data
flowctl collections read --collection <tenant>/<path>/<schema>/<table> --uncommitted | head -10
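The status check above is expected to move through PENDING, BACKFILLING, and OK: Streaming Binlog Events. A script that polls the status needs to tell these apart. The sketch below assumes the status text begins with one of those words. The function name `capture_state` and the labels it prints are hypothetical.

```shell
# Hypothetical helper: classifies a status string by its leading word.
# Assumes the text starts with PENDING, BACKFILLING, or OK as described above.
capture_state() {
  case "$1" in
    PENDING*)     echo "waiting" ;;
    BACKFILLING*) echo "backfilling" ;;
    OK*)          echo "streaming" ;;
    *)            echo "unknown" ;;
  esac
}
```

Anything that does not match falls through to `unknown`, which is the cue to read the task logs rather than keep waiting.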
Status progression:
- PENDING (normal for ~30 seconds during shard assignment)
- BACKFILLING (initial table snapshots)
- OK: Streaming Binlog Events (CDC running normally)
Cause: Missing historyMode field in config
Fix: Add historyMode: false (or true for full event history).
Cause: Capture user can't access the mysql system database
Fix: Specify an alternative database:
config:
  advanced:
    dbname: "your_application_db"
Cause: binlog_format is not ROW
Fix: SET GLOBAL binlog_format = 'ROW'; For RDS, update the parameter group; for Cloud SQL, update the database flags.
Cause: Binlog files purged; connector can't find its last position. Must re-backfill.
Prevention: Increase retention — SET GLOBAL binlog_expire_logs_seconds = 259200; (72 hours). For RDS: CALL mysql.rds_set_configuration('binlog retention hours', 72);
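The two retention settings use different units: the self-hosted variable is in seconds (72 hours × 3600 = 259200), while the RDS procedure takes hours. The sketch below prints the matching statement for a given number of hours. `binlog_retention_sql` and the `rds` keyword are hypothetical names, and `binlog_expire_logs_seconds` is available in MySQL 8.0 and later.

```shell
# Hypothetical helper: prints the retention statement for a number of hours.
# "rds" selects the RDS stored procedure (hours); anything else uses the
# MySQL 8.0+ system variable, which is expressed in seconds (hours * 3600).
binlog_retention_sql() {
  case "${2:-self-hosted}" in
    rds) echo "CALL mysql.rds_set_configuration('binlog retention hours', $1);" ;;
    *)   echo "SET GLOBAL binlog_expire_logs_seconds = $(( $1 * 3600 ));" ;;
  esac
}
```

For example, `binlog_retention_sql 72` prints `SET GLOBAL binlog_expire_logs_seconds = 259200;`, and `binlog_retention_sql 72 rds` prints the `CALL mysql.rds_set_configuration(...)` form with 72 hours.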
Cause: A single row/transaction exceeds MySQL's max_allowed_packet
Fix: SET GLOBAL max_allowed_packet = 1073741824; (1GB). For RDS: update via parameter group.
Cause: Tables have DATETIME columns but timezone not configured
Fix: Add timezone: "America/New_York" (or appropriate IANA timezone) to config.
Cause: User lacks replication permissions
Fix:
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'flow_capture'@'%';
FLUSH PRIVILEGES;
Cause: Certain schema changes (beyond ADD/DROP COLUMN) stop the connector. DROP TABLE or TRUNCATE TABLE will also halt it.
Fix: Check the logs for the specific error. You may need to remove the binding or re-create the capture.
Cause: Processing a very large transaction — the capture must process all changes before checkpointing.
Fix: Wait for completion. Check logs for progress. For future large operations, batch into smaller transactions.
If the capture stays in PENDING, wait 30-60 seconds; this is normal during shard assignment. If it is still stuck, check the logs for errors and warnings:
flowctl logs --task <tenant>/<path>/source-mysql --since 5m | jq 'select(.level == "error" or .level == "warn")'
Related skills:
- connector-disable-enable: Pause/restart existing captures
- connector-delete-recreate: Nuclear option for stuck captures
- estuary-logs: Deep log analysis
- estuary-catalog-status: Status checking