> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wirekite.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Performance Tuning

> Optimize Wirekite migrations for throughput, memory, and resource efficiency.

## Overview

Wirekite performance is controlled by three main levers: **thread counts**, **batch sizes**, and **compression**. This guide covers how each parameter affects performance and provides recommendations for common scenarios.

## Thread Configuration

Threads control parallelism across all components. Each component runs its threads independently, so the total resource usage is the sum of all active components.

### Extraction Threads

<ResponseField name="source.data.maxThreads" type="integer" default="5">
  Number of parallel threads for extracting tables from the source database. Each thread extracts one table at a time.
</ResponseField>

<ResponseField name="source.change.maxThreads" type="integer" default="1">
  The change extractor is single-threaded to preserve transaction commit ordering. This parameter exists for future use but should be left at the default.
</ResponseField>

### Loading Threads

<ResponseField name="target.data.maxThreads" type="integer" default="5">
  Number of parallel threads for loading data files into the target database. Each thread loads one file at a time.
</ResponseField>

<ResponseField name="target.change.maxThreads" type="integer" default="5">
  Number of parallel threads for applying change files to the target database. For cloud targets (Snowflake, BigQuery, Firebolt, Databricks), the `maxMergeThreads` parameter (default: 2x CPU cores) controls parallelism within each merge batch.
</ResponseField>

### Mover Threads

<ResponseField name="mover.maxThreads" type="integer" default="10">
  Number of parallel threads for transferring files between source and target staging areas (S3, GCS, or Snowflake internal stage). Movers default to a higher thread count since file transfers are I/O-bound rather than CPU-bound.
</ResponseField>

### Thread Recommendations

| Scenario                    | Extraction | Loading | Mover |
| --------------------------- | ---------- | ------- | ----- |
| Small database (under 10GB) | 4          | 4       | 4     |
| Medium database (10-100GB)  | 8          | 6       | 8     |
| Large database (over 100GB) | 8-12       | 8       | 10    |
| Resource-constrained host   | 2-4        | 2-4     | 4     |

<Warning>
  Each extraction thread opens a database connection. Ensure your source database can handle the total connection count: `maxThreads` per active component. Monitor your database connection limits before increasing threads.
</Warning>

## Batch Sizes

Batch sizes control how much data is processed per unit of work. Larger batches improve throughput but use more memory.

### Data Extraction

<ResponseField name="source.data.maxRowsPerDump" type="integer" default="200000">
  Maximum rows per extracted data file. Tables larger than this threshold are split into multiple files. Applies to MySQL, Oracle, and SQL Server extractors.
</ResponseField>

<ResponseField name="source.data.maxPages" type="integer" default="8000">
  PostgreSQL-specific alternative to `maxRowsPerDump`. Measures in 8KB database pages rather than rows. The default of 8,000 pages produces files of approximately 64MB.
</ResponseField>

**Sizing guidance:**

| Row width                | Recommended maxRowsPerDump | Approximate file size |
| ------------------------ | -------------------------- | --------------------- |
| Narrow (under 200 bytes) | 200,000-500,000            | 40-100MB              |
| Medium (200-1000 bytes)  | 100,000-200,000            | 50-200MB              |
| Wide (over 1000 bytes)   | 50,000-100,000             | 50-100MB              |

### Change Loading

<ResponseField name="target.change.maxFilesPerBatch" type="integer" default="60">
  Number of change files to process in a single merge operation. Higher values increase throughput but create larger transactions. Default is 60 for most targets; 30 for Spanner and SingleStore.
</ResponseField>

### CDC Event Buffering

<ResponseField name="source.change.eventsPerFlush" type="integer" default="5000">
  Number of change events buffered in memory before flushing to disk. Applies to MySQL and Oracle change extractors only. PostgreSQL uses a fixed rotation threshold of approximately 150,000 operations. Lower values reduce memory usage but increase I/O operations.
</ResponseField>

### SQL Server CDC

<ResponseField name="source.change.lsnBatchSize" type="integer" default="1000000">
  Number of LSN (Log Sequence Number) records to process per batch. SQL Server specific.
</ResponseField>

## Compression

<ResponseField name="mover.gzipFiles" type="boolean" default="false">
  Compress data files with gzip before transferring to cloud staging areas. Applies to AWS S3, GCS, and Snowflake internal stage movers.
</ResponseField>

**When to enable compression:**

| Condition                                            | Recommendation |
| ---------------------------------------------------- | -------------- |
| Network bandwidth is limited                         | Enable         |
| Cloud egress costs are a concern                     | Enable         |
| CPU is the bottleneck                                | Disable        |
| Files are already small (under 10MB)                 | Disable        |
| Large dataset with high compressibility (text-heavy) | Enable         |

<Tip>
  Tabular data typically compresses 5-10x with gzip. A 200MB data file may compress to 20-40MB, significantly reducing transfer time on constrained networks.
</Tip>

## Database-Specific Tuning

### MySQL / MariaDB / SingleStore

* The extractor uses `SQL_NO_CACHE` to bypass the query cache and avoid polluting it during bulk reads
* Use `loadLocal=true` if the MySQL server lacks the FILE privilege (uses `LOAD DATA LOCAL INFILE` instead)
* Binary data encoding (`hexEncoding` or `base64Encoding`) adds overhead; only enable if required by the target

### PostgreSQL / YugaByte / TigerData / AlloyDB

* Uses the COPY protocol for high-speed bulk data movement
* Tune `maxPages` instead of `maxRowsPerDump` for page-aligned extraction
* `sortFiles=true` sorts output by primary key after extraction, which can improve target load performance for ordered storage engines

### Oracle

* CDC uses System Change Numbers (SCN) for precise position tracking
* `sortFiles=true` enables primary key ordering for extracted data
* LOB (Large Object) columns are handled inline and can significantly increase row width

### SQL Server

* The change extractor maintains a connection pool of 50 open connections and 25 idle connections
* `useExternalDumper=true` uses the BCP utility for faster extraction when available
* `lsnBatchSize` controls the granularity of CDC processing; reduce if memory is constrained

### Snowflake

* Loading performance depends on the Snowflake warehouse size specified in the connection string
* Uses atomic COPY operations with `ON_ERROR=ABORT_STATEMENT` for data integrity
* Each loading thread performs a PUT followed by COPY; the warehouse compute scales with thread count

### BigQuery

* Requires a GCS bucket for staging (`gcs_bucket` parameter)
* Uses a two-stage load process: CSV files are uploaded to GCS, then loaded via BigQuery load jobs
* Loading thread count should be conservative to avoid exceeding BigQuery quotas

### Spanner

* Be conservative with `maxThreads` (start at 4-5) as higher values can trigger Spanner rate limiting
* Uses mutation batching for CDC operations

## Pipeline Architecture

Understanding how the orchestrator coordinates components helps with tuning:

```
Source DB → [Extractor] → Local Files → [Mover] → Cloud Stage → [Loader] → Target DB
               ↑                           ↑                        ↑
          maxThreads=5              maxThreads=10              maxThreads=5
```

All three components run in parallel. The extractor produces files, the mover transfers them, and the loader consumes them. The pipeline naturally balances itself:

* If extraction is fastest, the mover and loader work through a growing backlog
* If loading is slowest, files accumulate in the staging area until the loader catches up
* The `removeFiles` parameter controls whether files are cleaned up after loading

<Note>
  For combined data + change mode, the orchestrator completes the bulk data load first, captures the source position, then starts change replication from that position. Thread counts are configured independently for each phase.
</Note>

## Tuning by Scenario

### Maximize Throughput (Large Migration)

When migrating a large database and resources are plentiful:

```ini theme={null}
# Aggressive extraction
source.data.maxThreads=12
source.data.maxRowsPerDump=200000

# Fast loading
target.data.maxThreads=8

# Fast file transfer
mover.maxThreads=10
mover.gzipFiles=false
```

### Minimize Resource Usage

When running on a constrained host or sharing resources:

```ini theme={null}
# Conservative extraction
source.data.maxThreads=2
source.data.maxRowsPerDump=50000

# Conservative loading
target.data.maxThreads=2

# Conservative transfer
mover.maxThreads=4
mover.gzipFiles=true
```

### Optimize for Network-Limited Environments

When bandwidth between source and target is limited:

```ini theme={null}
# Normal extraction
source.data.maxThreads=8
source.data.maxRowsPerDump=100000

# Normal loading
target.data.maxThreads=6

# Maximize transfer efficiency
mover.maxThreads=10
mover.gzipFiles=true
```

### CDC Replication Tuning

For low-latency continuous replication:

```ini theme={null}
# Change extraction (MySQL/Oracle)
source.change.eventsPerFlush=5000

# Change loading - smaller batches for lower latency
target.change.maxFilesPerBatch=10

# Fast change file transfer
mover.maxThreads=8
```

## Monitoring

During a migration, monitor these indicators to identify bottlenecks:

| Indicator            | What to Watch                                     | Action if Bottleneck                                         |
| -------------------- | ------------------------------------------------- | ------------------------------------------------------------ |
| Extraction rate      | Files produced per minute in the output directory | Increase `source.data.maxThreads`                            |
| File backlog         | Files waiting in the staging area                 | Increase `mover.maxThreads` or `target.data.maxThreads`      |
| Load rate            | Files consumed per minute by the loader           | Increase `target.data.maxThreads` or reduce `maxRowsPerDump` |
| Database connections | Active connections on source/target               | Reduce `maxThreads` if approaching limits                    |
| Memory usage         | System memory on the Wirekite host                | Reduce `maxRowsPerDump`, `maxThreads`, or `maxFilesPerBatch` |
| Disk space           | Local file staging directory                      | Enable `removeFiles=true` or increase mover threads          |

Use the [Web Interface](/run/execution/ux) to monitor real-time extraction and loading progress with per-table breakdown, or query the [wirekite\_progress](/run/operations) table directly.
