Overview
Wirekite performance is controlled by three main levers: thread counts, batch sizes, and compression. This guide covers how each parameter affects performance and provides recommendations for common scenarios.
Thread Configuration
Threads control parallelism across all components. Each component runs its threads independently, so the total resource usage is the sum of all active components.
Extraction Threads
Number of parallel threads for extracting tables from the source database. Each thread extracts one table at a time.
Number of parallel threads for the change extractor. Applies to CDC extraction from binlogs, WAL, or transaction logs.
Loading Threads
Number of parallel threads for loading data files into the target database. Each thread loads one file at a time.
Number of parallel threads for applying change files to the target database.
Mover Threads
Number of parallel threads for transferring files between source and target staging areas (S3, GCS, or Snowflake internal stage). Movers default to a higher thread count since file transfers are I/O-bound rather than CPU-bound.
Thread Recommendations
| Scenario | Extraction | Loading | Mover |
|---|---|---|---|
| Small database (under 10GB) | 4 | 4 | 4 |
| Medium database (10-100GB) | 8 | 6 | 8 |
| Large database (over 100GB) | 8-12 | 8 | 10 |
| Resource-constrained host | 2-4 | 2-4 | 4 |
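As an illustration, the medium-database row above might map onto a configuration like the sketch below. The parameter paths `source.data.maxThreads`, `mover.maxThreads`, and `target.data.maxThreads` are taken from the Monitoring table at the end of this guide; the YAML layout wrapped around them is an assumption, not a verbatim Wirekite configuration.

```yaml
# Sketch only: thread counts for a medium (10-100GB) database.
# Parameter names come from the Monitoring table; the nesting of the
# keys is assumed for illustration.
source:
  data:
    maxThreads: 8   # parallel table extraction
mover:
  maxThreads: 8     # parallel file transfers (I/O-bound)
target:
  data:
    maxThreads: 6   # parallel file loads
```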
Batch Sizes
Batch sizes control how much data is processed per unit of work. Larger batches improve throughput but use more memory.
Data Extraction
`maxRowsPerDump`: Maximum rows per extracted data file. Tables larger than this threshold are split into multiple files. Applies to the MySQL, Oracle, and SQL Server extractors.
`maxPages`: A PostgreSQL-specific alternative to `maxRowsPerDump`, measured in 8KB database pages rather than rows. The default of 8,000 pages produces files of approximately 64MB.
| Row width | Recommended `maxRowsPerDump` | Approximate file size |
|---|---|---|
| Narrow (under 200 bytes) | 200,000-500,000 | 40-100MB |
| Medium (200-1000 bytes) | 100,000-200,000 | 50-200MB |
| Wide (over 1000 bytes) | 50,000-100,000 | 50-100MB |
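As an example, a medium-width table might use the middle of the recommended range, as sketched below. Only the parameter names `maxRowsPerDump` and `maxPages` appear in this guide; their placement under the source data settings and the YAML layout are assumptions for illustration.

```yaml
# Sketch only: batch sizing for medium-width rows (roughly 200-1000 bytes).
source:
  data:
    maxRowsPerDump: 150000   # MySQL, Oracle, SQL Server extractors
    # For PostgreSQL, tune maxPages instead; 8,000 pages of 8KB each
    # yields files of roughly 64MB.
    # maxPages: 8000
```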
Change Loading
`maxFilesPerBatch`: Number of change files to process in a single merge operation. Higher values increase throughput but create larger transactions.
CDC Event Buffering
Number of change events buffered in memory before flushing to disk. Lower values reduce memory usage but increase I/O operations.
SQL Server CDC
`lsnBatchSize`: Number of LSN (Log Sequence Number) records to process per batch. SQL Server specific.
Compression
Compress data files with gzip before transferring to cloud staging areas. Applies to AWS S3, GCS, and Snowflake internal stage movers.
| Condition | Recommendation |
|---|---|
| Network bandwidth is limited | Enable |
| Cloud egress costs are a concern | Enable |
| CPU is the bottleneck | Disable |
| Files are already small (under 10MB) | Disable |
| Large dataset with high compressibility (text-heavy) | Enable |
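A minimal sketch of turning compression on for a mover follows. This guide does not state the actual parameter name, so `compress` below is a hypothetical placeholder; only the behavior (gzip before transfer to S3, GCS, or the Snowflake internal stage) comes from the text above.

```yaml
# Sketch only: enable gzip compression on the mover when bandwidth or
# egress cost is the constraint. "compress" is a hypothetical placeholder;
# check the mover reference for the real parameter name.
mover:
  maxThreads: 8
  compress: true
```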
Database-Specific Tuning
MySQL
- The extractor uses `SQL_NO_CACHE` to bypass the query cache and avoid polluting it during bulk reads
- Use `loadLocal=true` if the MySQL server lacks the FILE privilege (uses `LOAD DATA LOCAL INFILE` instead)
- Binary data encoding (`hexEncoding` or `base64Encoding`) adds overhead; only enable it if required by the target (see the sketch after this list)
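A hedged sketch of the MySQL options above. Where these keys sit in the configuration tree, and whether the encoding options are plain booleans, are assumptions made for illustration.

```yaml
# Sketch only: MySQL options from the list above. Key placement and the
# boolean form of the encoding option are assumed.
target:
  data:
    loadLocal: true      # fall back to LOAD DATA LOCAL INFILE when the
                         # server lacks the FILE privilege
source:
  data:
    hexEncoding: false   # binary-column encoding adds overhead; enable
                         # only if the target requires it
```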
PostgreSQL
- Uses the COPY protocol for high-speed bulk data movement
- Tune `maxPages` instead of `maxRowsPerDump` for page-aligned extraction
- `sortFiles=true` sorts output by primary key after extraction, which can improve target load performance for ordered storage engines
Oracle
- CDC uses System Change Numbers (SCN) for precise position tracking
- `sortFiles=true` enables primary key ordering for extracted data
- LOB (Large Object) columns are handled inline and can significantly increase row width
SQL Server
- The change extractor maintains a connection pool of 50 open connections and 25 idle connections
- `useExternalDumper=true` uses the BCP utility for faster extraction when available
- `lsnBatchSize` controls the granularity of CDC processing; reduce it if memory is constrained (see the sketch after this list)
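A hedged sketch of the SQL Server options above. The grouping of `lsnBatchSize` under a changes section and the numeric value are assumptions; only the parameter names come from this guide.

```yaml
# Sketch only: SQL Server tuning knobs from the list above. The changes
# grouping and the example value are assumptions.
source:
  data:
    useExternalDumper: true   # use BCP for extraction when available
  changes:
    lsnBatchSize: 5000        # LSN records per CDC batch; lower this if
                              # memory is constrained (value illustrative)
```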
Snowflake
- Loading performance depends on the Snowflake warehouse size specified in the connection string
- Uses atomic COPY operations with `ON_ERROR=ABORT_STATEMENT` for data integrity
- Each loading thread performs a PUT followed by a COPY; the warehouse compute scales with thread count
BigQuery
- Requires a GCS bucket for staging (`gcs_bucket` parameter)
- Uses a two-stage load process: CSV files are uploaded to GCS, then loaded via BigQuery load jobs
- Loading thread count should be conservative to avoid exceeding BigQuery quotas
Spanner
- Be conservative with `maxThreads` (start at 4-5), as higher values can trigger Spanner rate limiting (see the sketch after this list)
- Uses mutation batching for CDC operations
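A sketch of a conservative Spanner loader configuration, assuming the same `target.data.maxThreads` path that the Monitoring table uses.

```yaml
# Sketch only: conservative thread count for a Spanner target to avoid
# rate limiting.
target:
  data:
    maxThreads: 4
```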
Pipeline Architecture
Understanding how the orchestrator coordinates components helps with tuning:
- If extraction is fastest, the mover and loader work through a growing backlog
- If loading is slowest, files accumulate in the staging area until the loader catches up
- The `removeFiles` parameter controls whether files are cleaned up after loading
For combined data + change mode, the orchestrator completes the bulk data load first, captures the source position, then starts change replication from that position. Thread counts are configured independently for each phase.
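As an illustration of configuring the two phases independently, the sketch below separates bulk-data settings from change settings. The `data` and `changes` groupings are hypothetical; this guide only states that thread counts are set per phase.

```yaml
# Sketch only: independent thread counts for the bulk-data phase and the
# change-replication phase in combined mode. The data/changes grouping
# is a hypothetical illustration.
source:
  data:
    maxThreads: 8    # bulk extraction
  changes:
    maxThreads: 4    # CDC extraction
target:
  data:
    maxThreads: 6    # bulk load
  changes:
    maxThreads: 4    # change apply
```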
Tuning by Scenario
Maximize Throughput (Large Migration)
When migrating a large database and resources are plentiful:
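One possible starting point, sketched with parameter names used elsewhere in this guide and values from the upper ends of the recommendation tables; the YAML layout is assumed.

```yaml
# Sketch only: push parallelism and batch size when resources are plentiful.
source:
  data:
    maxThreads: 12
    maxRowsPerDump: 500000   # narrow rows; scale down for wide rows
mover:
  maxThreads: 10
target:
  data:
    maxThreads: 8
```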
Minimize Resource Usage
When running on a constrained host or sharing resources:
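A sketch using the low ends of the recommendation tables; layout assumed as before.

```yaml
# Sketch only: reduce parallelism and batch size on a constrained host.
source:
  data:
    maxThreads: 2
    maxRowsPerDump: 50000    # smaller files keep memory usage down
mover:
  maxThreads: 4
target:
  data:
    maxThreads: 2
```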
Optimize for Network-Limited Environments
When bandwidth between source and target is limited:
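A sketch for bandwidth-limited links. As in the Compression section, `compress` is a hypothetical placeholder for whatever parameter enables gzip on the mover; the layout is assumed.

```yaml
# Sketch only: compress files before transfer and keep mover parallelism up.
mover:
  maxThreads: 8
  compress: true             # hypothetical name; enables gzip before upload
source:
  data:
    maxRowsPerDump: 200000   # larger files benefit more from compression
```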
CDC Replication Tuning
For low-latency continuous replication:
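A sketch that favors latency over batch efficiency. The CDC event buffer parameter is not named in this guide, so `eventBufferSize` is a hypothetical placeholder, the `changes` grouping is assumed, and the numeric values are illustrative only.

```yaml
# Sketch only: favor low replication latency over large batches. Key
# placement, eventBufferSize, and all values are illustrative.
source:
  changes:
    maxThreads: 4
    lsnBatchSize: 1000       # SQL Server only; smaller batches, finer granularity
    # eventBufferSize: 10000 # hypothetical name for the CDC event buffer
target:
  changes:
    maxThreads: 4
    maxFilesPerBatch: 5      # smaller merges keep transactions short
```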
Monitoring
During a migration, monitor these indicators to identify bottlenecks:
| Indicator | What to Watch | Action if Bottleneck |
|---|---|---|
| Extraction rate | Files produced per minute in the output directory | Increase source.data.maxThreads |
| File backlog | Files waiting in the staging area | Increase mover.maxThreads or target.data.maxThreads |
| Load rate | Files consumed per minute by the loader | Increase target.data.maxThreads or reduce maxRowsPerDump |
| Database connections | Active connections on source/target | Reduce maxThreads if approaching limits |
| Memory usage | System memory on the Wirekite host | Reduce maxRowsPerDump, maxThreads, or maxFilesPerBatch |
| Disk space | Local file staging directory | Enable removeFiles=true or increase mover threads |
