
The Pipeline

Wirekite moves data through a pipeline of independent components. The extractor reads from the source database and writes intermediate files to disk. The mover optionally transports those files to a location accessible by the target. The loader reads the intermediate files and applies them to the target database.
Source DB → Extractor → Files → Mover → Files → Loader → Target DB
The intermediate files are the decoupling point. Extractors know nothing about the target database and loaders know nothing about the source database. This separation means any supported source can be paired with any supported target without custom integration logic.

In data mode, the orchestrator starts the extractor, mover, and loader in parallel. The extractor writes files continuously, the mover picks them up and transfers them, and the loader consumes them as they arrive. This pipeline architecture minimizes the storage footprint, since files are loaded and cleared while extraction is still in progress.
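For example, pairing a MySQL source with a Snowflake target in data mode is purely a matter of configuration. Here is a minimal sketch in the configuration style used later on this page; source.type and target.type are hypothetical key names, not Wirekite's documented schema:

# Illustrative pairing: no extractor/loader integration code is needed
source.type=mysql                        # hypothetical key
source.data.dsnFile=/etc/wirekite/mysql.dsn
target.type=snowflake                    # hypothetical key
target.data.maxThreads=4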

Standalone Binaries

Each extractor and loader is compiled as its own standalone binary. The orchestrator invokes these binaries as child processes, passing each one a configuration file. The orchestrator constructs these child configuration files by extracting the relevant section from the main configuration. For example, source.data.dsnFile in the orchestrator config becomes simply dsnFile in the data extractor’s config. Because the binaries are standalone, they can also be run independently outside the orchestrator for testing or debugging purposes. Each binary takes a single argument: the path to its configuration file.
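To make the derivation concrete, here is a sketch of the mapping. The dsnFile case is the documented example above; extending it to maxThreads is an assumption by analogy:

# Orchestrator config (excerpt)
source.data.dsnFile=/etc/wirekite/source.dsn
source.data.maxThreads=8

# Child config the orchestrator writes for the data extractor
# (the maxThreads mapping is assumed by analogy)
dsnFile=/etc/wirekite/source.dsn
maxThreads=8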

Binary Selection

The orchestrator uses an internal binary map to determine which binary to invoke for a given source, target, and mode combination. For example, a MySQL source in data mode invokes the MySQL data extractor binary, while a Snowflake target in data mode invokes the Snowflake data loader binary. The same orchestrator binary handles all source and target combinations.
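A rough sketch of the lookup, shown as comments; the binary names are hypothetical, not Wirekite's actual file names:

# (role, database, mode)      binary (illustrative names)
# source  mysql      data     mysql-data-extractor
# source  mysql      change   mysql-change-extractor
# target  snowflake  data     snowflake-data-loader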

Local and Remote Database Access

When Wirekite runs on the same host as the database (or shares a filesystem), it can use faster server-side file operations instead of streaming data through the client connection. The databaseRemote parameter controls this behavior.
databaseRemote (boolean, default: true)

When true (the default), Wirekite streams data through the client connection. This works with any database regardless of where it is hosted, including cloud-managed databases like RDS, Cloud SQL, and Azure Database.

When false, Wirekite uses server-side file operations: the database server reads or writes files directly on its local filesystem. This is faster, but requires that Wirekite and the database share a filesystem.

How It Works

The mechanism differs between extractors and loaders:

Extractors (source side):
  • Remote (databaseRemote=true): The extractor runs a query and streams the result set through the client connection, writing rows to files locally
  • Local (databaseRemote=false): The extractor instructs the database server to write query results directly to a file on the server’s filesystem
Loaders (target side):
  • Remote (databaseRemote=true): The loader reads files locally and streams the data to the database server through the client connection
  • Local (databaseRemote=false): The loader tells the database server to read files directly from its own filesystem

Database-Specific Mechanisms

The underlying SQL mechanism used depends on the database:
Database               Remote (default)                     Local
MySQL extractor        SELECT with client-side streaming    SELECT ... INTO OUTFILE
PostgreSQL extractor   COPY ... TO STDOUT                   COPY ... TO '<filepath>'
MySQL loader           LOAD DATA LOCAL INFILE               LOAD DATA INFILE
PostgreSQL loader      COPY ... FROM STDIN                  COPY ... FROM '<filepath>'
SingleStore loader     LOAD DATA LOCAL INFILE               LOAD DATA INFILE
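As a concrete illustration of the PostgreSQL extractor row, the two modes correspond to statements of roughly this shape (table name, path, and format options are placeholders; the statements Wirekite generates may differ):

-- Remote: the result set streams through the client connection,
-- and the extractor writes the rows to files on its own host
COPY orders TO STDOUT WITH (FORMAT csv);

-- Local: the PostgreSQL server writes the file on its own filesystem,
-- which must also be reachable by Wirekite
COPY orders TO '/shared/wirekite/orders.csv' WITH (FORMAT csv);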

Which Databases Support It

The databaseRemote parameter is only relevant for databases that have both a client-side and server-side file transfer mechanism:
Component              Supports databaseRemote
MySQL extractor        Yes
PostgreSQL extractor   Yes
Oracle extractor       No — always streams through the client
SQL Server extractor   No — always streams through the client
MySQL loader           Yes
PostgreSQL loader      Yes
SingleStore loader     Yes
Oracle loader          No — always streams through the client
SQL Server loader      No — uses native bulk loader
Cloud data warehouses (Snowflake, BigQuery, Databricks, Firebolt, Spanner) do not use databaseRemote. They have their own staging and upload mechanisms — for example, Snowflake uses an internal stage with PUT and COPY INTO, and BigQuery loads through Google Cloud Storage.
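For instance, Snowflake's stage-based load roughly takes this shape (table and file names are placeholders; the exact statements Wirekite issues may differ):

-- Upload the intermediate file to the table's internal stage
PUT file:///tmp/wirekite/orders.csv @%orders;

-- Load the staged file into the table
COPY INTO orders FROM @%orders FILE_FORMAT = (TYPE = CSV);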

When to Use Each

For most deployments, the default (databaseRemote=true) is the right choice. Only set databaseRemote=false if Wirekite is running on the same host as the database server or they share a mounted filesystem.
  • Cloud-managed databases (Amazon RDS, Google Cloud SQL, Azure Database): Use databaseRemote=true. Server-side file access is not available on managed instances.
  • Self-hosted databases on the same host as Wirekite: Set databaseRemote=false for better performance through server-side file operations (see the sketch after this list).
  • Self-hosted databases on a different host: Use databaseRemote=true. The database server cannot access files on the Wirekite host.
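A minimal sketch of the same-host case; the exact key path is an assumption here, so check the parameter reference for your component:

# Wirekite runs on the database host, so the server can read and
# write the intermediate files directly (key path is illustrative)
source.data.databaseRemote=false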

Thread Counts

Data extractors and data loaders process multiple tables concurrently using configurable thread counts.
maxThreads (integer, default: 5)

The number of concurrent threads used by a data extractor or data loader. Each thread processes one table at a time. All source extractors and all target loaders support this parameter.
The extractor and loader thread counts are configured independently. For example, the extractor can use 8 threads while the loader uses 4. The right values depend on the hardware, network bandwidth, and how much concurrent load the source and target databases can handle.
# Extractor threads
source.data.maxThreads=8

# Loader threads
target.data.maxThreads=4
Change extractors are single-threaded because they must read the transaction log sequentially to preserve commit ordering. Change loaders use a fixed internal thread pool for parallel merge operations across tables, but this is not user-configurable.

Handover: Transitioning to Change Replication

When data and change modes are combined in a single configuration, the orchestrator automatically handles the transition from bulk data loading to continuous change replication. After the data phase completes, a handover process captures the exact position in the source database’s transaction log. The change extractor then begins from that position, ensuring no data is missed or duplicated. Each source database type has its own position mechanism — MySQL uses binlog coordinates, PostgreSQL uses WAL LSN, Oracle uses SCN, and SQL Server uses LSN. For full details on change replication, handover mechanics, and crash recovery, see CDC Replication.