Skip to main content

Overview

Wirekite supports MySQL 5.x and above (including MariaDB and SingleStore) as a source database for:
  • Schema Extraction - Extract table definitions to Wirekite’s intermediate format
  • Data Extraction - Bulk extract table data for initial load
  • Change Extraction (CDC) - Capture ongoing inserts, updates, and deletes via binlog
All references to MySQL in this guide also apply to MariaDB and SingleStore setups. SingleStore uses the MySQL wire protocol, so the configuration and parameters are identical. For SingleStore-specific target details (DSN and version differences), see the SingleStore Notes in the target guide.

Prerequisites

Before configuring MySQL as a Wirekite source, ensure the following requirements are met:

Database Configuration

  1. Version: MySQL 5.7 or later (MySQL 8.0+ recommended)
  2. Storage Engine: Tables must use InnoDB
  3. Binary Logging (CDC only): Must be enabled with ROW format in my.cnf:
    [mysqld]
    log_bin = mysql-bin
    binlog_format = ROW
    binlog_row_image = FULL
    server_id = 1
    
    Restart MySQL after changing these settings.
  4. Group Concat Length: The schema extractor uses GROUP_CONCAT() internally. Set this variable to avoid silently truncated results:
    SET GLOBAL group_concat_max_len = 50000;
    

File System Access

For server-side extraction (recommended for performance):
  • Configure secure_file_priv to allow MySQL to write to the output directory
  • Ensure the MySQL process has write permissions to the output directory
For cloud-managed MySQL instances (RDS, Cloud SQL, etc.) or remote databases where server-side file access isn’t available, use databaseRemote=true (the default) to extract data via the client connection.

Replication Considerations

When performing initial data extraction, stop replication on the source database to ensure the dump is consistent to a specific binlog position. It is highly recommended to run Wirekite against a replica rather than the production primary to avoid impacting production workloads.

Source Database Setup

Wirekite requires a dedicated user, a wirekite database for internal tracking tables, and specific privileges on the application database. Follow these steps to configure your MySQL source.

Step 1: Create the Wirekite Database

Wirekite uses a dedicated wirekite database for its internal tracking tables (wirekite_progress and wirekite_action). These tables track extraction progress and enable crash recovery.
CREATE DATABASE IF NOT EXISTS wirekite;

Step 2: Create the Wirekite User

CREATE USER 'wirekite'@'%' IDENTIFIED BY '<your_password>';
If wirekite will only connect from a specific host:
CREATE USER 'wirekite'@'<wirekite_ip>' IDENTIFIED BY '<your_password>';

Step 3: Grant Privileges on the Wirekite Database

Wirekite needs full control of its own internal database to create and manage tracking tables:
GRANT CREATE, DROP, INDEX, INSERT, SELECT, UPDATE ON wirekite.* TO 'wirekite'@'%';

Step 4: Grant Read Access to Application Tables

For each application database that contains tables you want to extract:
GRANT SELECT ON <app_database>.* TO 'wirekite'@'%';
To grant access to specific tables only:
GRANT SELECT ON <app_database>.table1 TO 'wirekite'@'%';
GRANT SELECT ON <app_database>.table2 TO 'wirekite'@'%';
-- repeat for each table

Step 5 (Optional): Server-Side File Export

If you plan to use server-side file export (the “local” mode where the database writes files directly using SELECT ... INTO OUTFILE), the wirekite user needs the global FILE privilege:
GRANT FILE ON *.* TO 'wirekite'@'%';
This is NOT needed if Wirekite and the database are on different machines (the default “remote” streaming mode with databaseRemote=true).

Step 6 (CDC Only): Grant Replication Privileges

For change data capture (CDC), Wirekite uses MySQL’s native binary log replication protocol. The wirekite user needs replication privileges:
GRANT REPLICATION SLAVE ON *.* TO 'wirekite'@'%';
GRANT REPLICATION CLIENT ON *.* TO 'wirekite'@'%';
REPLICATION SLAVE allows Wirekite to read the binary log stream. REPLICATION CLIENT allows Wirekite to run SHOW BINARY LOGS and SHOW MASTER STATUS.

Step 7 (CDC + Replica Handover Only): Additional Privileges

If you are extracting from a MySQL replica and need handover operations (stop/start replication on the replica): MySQL 8.0+:
GRANT RELOAD ON *.* TO 'wirekite'@'%';
GRANT REPLICATION_SLAVE_ADMIN ON *.* TO 'wirekite'@'%';
MySQL 5.7:
GRANT RELOAD ON *.* TO 'wirekite'@'%';
GRANT SUPER ON *.* TO 'wirekite'@'%';
RELOAD is needed for FLUSH BINARY LOGS. REPLICATION_SLAVE_ADMIN (or SUPER on 5.7) is needed for STOP REPLICA / START REPLICA.

Step 8: Apply Privileges

FLUSH PRIVILEGES;

Complete Setup Script

Data Extraction Only

-- Run as root or admin user

-- 1. Create wirekite database
CREATE DATABASE IF NOT EXISTS wirekite;

-- 2. Create user
CREATE USER 'wirekite'@'%' IDENTIFIED BY '<your_password>';

-- 3. Grant wirekite database privileges
GRANT CREATE, DROP, INDEX, INSERT, SELECT, UPDATE ON wirekite.* TO 'wirekite'@'%';

-- 4. Grant read access to application tables
GRANT SELECT ON <app_database>.* TO 'wirekite'@'%';

FLUSH PRIVILEGES;

Data Extraction + CDC (Change Data Capture)

-- Run as root or admin user

-- 1. Create wirekite database
CREATE DATABASE IF NOT EXISTS wirekite;

-- 2. Create user
CREATE USER 'wirekite'@'%' IDENTIFIED BY '<your_password>';

-- 3. Grant wirekite database privileges
GRANT CREATE, DROP, INDEX, INSERT, SELECT, UPDATE ON wirekite.* TO 'wirekite'@'%';

-- 4. Grant read access to application tables
GRANT SELECT ON <app_database>.* TO 'wirekite'@'%';

-- 5. Grant replication privileges for CDC
GRANT REPLICATION SLAVE ON *.* TO 'wirekite'@'%';
GRANT REPLICATION CLIENT ON *.* TO 'wirekite'@'%';

FLUSH PRIVILEGES;

-- 6. Verify binary logging
SHOW VARIABLES LIKE 'log_bin';           -- must be ON
SHOW VARIABLES LIKE 'binlog_format';     -- must be ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- should be FULL

Verify the Setup

Connect as the wirekite user and verify access:
-- Should succeed
SELECT 1 FROM <app_database>.table1 LIMIT 1;

-- Should succeed
SELECT * FROM information_schema.columns
WHERE table_schema = '<app_database>' AND table_name = 'table1'
LIMIT 1;

-- Should succeed (CDC only)
SHOW BINARY LOGS;

What Wirekite Creates

ObjectTypeDatabasePurpose
wirekite_progressTablewirekiteTracks extraction progress, crash recovery
wirekite_actionTablewirekiteRun/pause/stop control
wkp_finish_timeIndexwirekiteIndex on wirekite_progress
wkp_start_timeIndexwirekiteIndex on wirekite_progress
Wirekite does NOT create any triggers, functions, or stored procedures on the source database. CDC is purely binary-log-based.

Privilege Summary

PrivilegeScopeRequired For
CREATE, DROP, INDEX, INSERT, SELECT, UPDATEwirekite.*Internal tracking tables
SELECT<app_database>.*Reading application table data
FILE*.* (global)Optional: server-side file export only
REPLICATION SLAVE*.* (global)CDC only: binlog replication
REPLICATION CLIENT*.* (global)CDC only: binlog position queries
RELOAD*.* (global)CDC + replica handover only
REPLICATION_SLAVE_ADMIN*.* (global)CDC + replica handover only (MySQL 8.0+)

Schema Extractor

The Schema Extractor reads table definitions from MySQL and outputs them to Wirekite’s intermediate schema format (.skt file). This schema file is used by target loaders to create corresponding tables in the destination database.
The Schema Extractor can run standalone or as part of the Orchestrator workflow. The configuration parameters are the same in both cases.

Required Parameters

dsnFile
string
required
Path to a file containing the MySQL connection string. The file should contain exactly one line with the DSN.
Connection string format:
username:password@tcp(host:port)/database?multiStatements=true&loc=Local
Example DSN file contents:
wirekite:secretpass@tcp(mysql-replica.example.com:3306)/myapp?multiStatements=true&loc=Local
tablesFile
string
required
Path to a file listing the tables to extract, one per line in schema.table format.
Example tablesFile contents:
myapp.users
myapp.orders
myapp.products
inventory.items
outputDirectory
string
required
Absolute path to the directory where Wirekite will write the schema file (wirekite_schema.skt). The directory must exist and be writable.
logFile
string
required
Absolute path to the log file where the Schema Extractor will write operational logs.

Data Extractor

The Data Extractor performs bulk extraction of table data, writing records to Wirekite’s intermediate data format (.dkt files). Large tables are automatically split across multiple files based on the maxRowsPerDump setting.

Required Parameters

dsnFile
string
required
Path to a file containing the MySQL connection string.
Connection string format:
username:password@tcp(host:port)/database?multiStatements=true&loc=Local
tablesFile
string
required
Path to a file listing the tables to extract, one per line in schema.table format.
outputDirectory
string
required
Absolute path to the directory where Wirekite will write data files. Files are named schema.table.N.dkt where N is a sequence number.
logFile
string
required
Absolute path to the log file for Data Extractor operations.

Optional Parameters

maxThreads
integer
default:"5"
Number of parallel extraction threads. Each thread consumes one database connection and approximately one CPU core. Set this based on available CPU cores and acceptable database load.
maxRowsPerDump
integer
default:"200000"
Maximum number of rows written to each output file. Large tables are split into multiple files. Adjust based on available memory and disk space. Smaller values create more files but use less memory.
hexEncoding
boolean
default:"false"
When true, binary and string data is encoded as hexadecimal instead of base64. Hex encoding produces larger files but may be required for certain target databases. Must match the setting used by the target loader.
databaseRemote
boolean
default:"true"
When true, uses client-side data extraction instead of MySQL’s SELECT INTO OUTFILE. Required for cloud-managed MySQL instances (Amazon RDS, Google Cloud SQL, Azure Database for MySQL) where server-side file operations are not available. Set to false for local extraction using server-side file operations.
For best performance on dedicated extraction hosts, set maxThreads equal to the number of CPU cores available.

Change Extractor

The Change Extractor captures ongoing data changes (INSERT, UPDATE, DELETE) by reading MySQL binary logs. It outputs change events to Wirekite’s change format (.ckt files) for loading into target databases.
The Change Extractor requires binary logging to be enabled with binlog_format = ROW. It connects to MySQL as a replication client to stream binlog events.

Required Parameters

dsnFile
string
required
Path to a file containing the MySQL connection string.
Connection string format:
username:password@tcp(host:port)/database?multiStatements=true&loc=Local
tablesFile
string
required
Path to a file listing the tables to track for changes, one per line in schema.table format.
outputDirectory
string
required
Absolute path to the directory where Wirekite will write change files. Files are named with sequential numbers (e.g., 0.ckt, 1.ckt, etc.).
logFile
string
required
Absolute path to the log file for Change Extractor operations.

Position Parameters

binlogFile
string
The MySQL binary log filename to start reading from (e.g., mysql-bin.000042). When running via the Orchestrator in data+change mode, this is automatically captured after data extraction completes. Required for change-only mode.
binlogPosition
integer
default:"0"
The position within the binlog file to start reading from. Combined with binlogFile, this defines the exact starting point for change capture. When running via the Orchestrator in data+change mode, this is automatically captured.
In change-only mode (without data extraction), you must specify both binlogFile and binlogPosition to define where change capture should begin. Using incorrect values may result in missing changes or duplicate data.

Optional Parameters

verbose
boolean
default:"false"
When true, enables detailed logging including metadata files that can help troubleshoot issues.
processOneBinlog
boolean
default:"false"
When true, processes only one binary log file and then exits. Useful for testing or manual incremental processing.
processCommits
boolean
default:"false"
When true, includes BEGIN and COMMIT markers in the output. Wirekite guarantees that transactions never span multiple output files regardless of this setting.
eventsPerFlush
integer
default:"5000"
Number of change events to buffer before flushing to disk. Lower values reduce memory usage but increase I/O operations. The buffer is always flushed at transaction boundaries.
exitWhenIdle
boolean
default:"false"
When true, the extractor exits after processing all available binlog events. When false (default), it runs continuously, waiting for new changes indefinitely.
idleWaitSeconds
integer
default:"2"
Number of seconds to wait for new binlog events before exiting. Only applies when exitWhenIdle is true. Minimum value is 2.
flushIntervalSeconds
integer
default:"5"
Interval in seconds for periodic flushing of accumulated changes to disk. Ensures data is written even during low-activity periods. Only applies when exitWhenIdle is false.

Orchestrator Configuration

When using the Wirekite Orchestrator, prefix source parameters with source.schema., source.data., or source.change. depending on the operation mode. Example orchestrator configuration for MySQL source:
# Main configuration
source=mysql
target=snowflake

# Schema extraction
source.schema.dsnFile=/opt/wirekite/config/mysql.dsn
source.schema.tablesFile=/opt/wirekite/config/tables.txt
source.schema.outputDirectory=/opt/wirekite/output/schema
source.schema.logFile=/var/log/wirekite/schema-extractor.log

# Data extraction
source.data.dsnFile=/opt/wirekite/config/mysql.dsn
source.data.tablesFile=/opt/wirekite/config/tables.txt
source.data.outputDirectory=/opt/wirekite/output/data
source.data.logFile=/var/log/wirekite/data-extractor.log
source.data.maxThreads=8
source.data.maxRowsPerDump=100000
source.data.databaseRemote=false

# Change extraction (CDC)
source.change.dsnFile=/opt/wirekite/config/mysql.dsn
source.change.tablesFile=/opt/wirekite/config/tables.txt
source.change.outputDirectory=/opt/wirekite/output/changes
source.change.logFile=/var/log/wirekite/change-extractor.log
source.change.eventsPerFlush=10000
source.change.exitWhenIdle=false
When running in data+change mode, the Orchestrator automatically captures the binlog position after data extraction and passes it to the Change Extractor. You do not need to manually specify binlogFile and binlogPosition in this mode.
For complete Orchestrator documentation, see the Execution Guide.