The Pipeline
Wirekite moves data through a pipeline of independent components. The extractor reads from the source database and writes intermediate files to disk. The mover optionally transports those files to a location accessible by the target. The loader reads the intermediate files and applies them to the target database.
Standalone Binaries
Each extractor and loader is compiled as its own standalone binary. The orchestrator invokes these binaries as child processes, passing each one a configuration file. The orchestrator constructs these child configuration files by extracting the relevant section from the main configuration. For example, source.data.dsnFile in the orchestrator config becomes simply dsnFile in the data extractor’s config.
Because the binaries are standalone, they can also be run independently outside the orchestrator for testing or debugging purposes. Each binary takes a single argument: the path to its configuration file.
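The section extraction described above can be sketched roughly as follows. This is only an illustration: it assumes the main configuration is a flat mapping of dotted keys, and the function name child_config and the sample values are hypothetical, not Wirekite's actual implementation.

```python
def child_config(main_config: dict, section: str) -> dict:
    """Return the keys under `section` with the section prefix removed."""
    prefix = section + "."
    return {
        key[len(prefix):]: value
        for key, value in main_config.items()
        if key.startswith(prefix)
    }


# Hypothetical orchestrator configuration (paths are made up).
main_config = {
    "source.data.dsnFile": "/path/to/source.dsn",
    "target.data.dsnFile": "/path/to/target.dsn",
}

# Only the source.data section is kept, and its prefix is dropped.
extractor_config = child_config(main_config, "source.data")
print(extractor_config)  # {'dsnFile': '/path/to/source.dsn'}
```

In a setup like this, the resulting mapping would be written to a file, and the path to that file would be the single argument passed to the data extractor binary.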
Binary Selection
The orchestrator uses an internal binary map to determine which binary to invoke for a given source, target, and mode combination. For example, a MySQL source in data mode invokes the MySQL data extractor binary, while a Snowflake target in data mode invokes the Snowflake data loader binary. The same orchestrator binary handles all source and target combinations.
Local and Remote Database Access
When Wirekite runs on the same host as the database (or shares a filesystem with it), it can use faster server-side file operations instead of streaming data through the client connection. The databaseRemote parameter controls this behavior.
When true (the default), Wirekite streams data through the client connection. This works with any database regardless of where it is hosted, including cloud-managed databases like RDS, Cloud SQL, and Azure Database.
When false, Wirekite uses server-side file operations where the database server reads or writes files directly on its local filesystem. This is faster but requires that Wirekite and the database share a filesystem.
How It Works
The mechanism differs between extractors and loaders:
Extractors (source side):
- Remote (databaseRemote=true): The extractor runs a query and streams the result set through the client connection, writing rows to files locally.
- Local (databaseRemote=false): The extractor instructs the database server to write query results directly to a file on the server’s filesystem.
Loaders (target side):
- Remote (databaseRemote=true): The loader reads files locally and streams the data to the database server through the client connection.
- Local (databaseRemote=false): The loader tells the database server to read files directly from its own filesystem.
Database-Specific Mechanisms
The underlying SQL mechanism depends on the database and component:
| Component | Remote (default) | Local |
|---|---|---|
| MySQL Extractor | SELECT with client-side streaming | SELECT ... INTO OUTFILE |
| PostgreSQL Extractor | COPY ... TO STDOUT | COPY ... TO '<filepath>' |
| MySQL Loader | LOAD DATA LOCAL INFILE | LOAD DATA INFILE |
| PostgreSQL Loader | COPY ... FROM STDIN | COPY ... FROM '<filepath>' |
| SingleStore Loader | LOAD DATA LOCAL INFILE | LOAD DATA INFILE |
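The table can also be read as a lookup from component and databaseRemote value to a statement form. The sketch below encodes only the rows shown above; the names STATEMENT_FORMS and statement_form are hypothetical, and this is not how Wirekite itself is necessarily structured.

```python
# Statement forms copied from the table, keyed by (component, databaseRemote).
STATEMENT_FORMS = {
    ("mysql extractor", True): "SELECT with client-side streaming",
    ("mysql extractor", False): "SELECT ... INTO OUTFILE",
    ("postgresql extractor", True): "COPY ... TO STDOUT",
    ("postgresql extractor", False): "COPY ... TO '<filepath>'",
    ("mysql loader", True): "LOAD DATA LOCAL INFILE",
    ("mysql loader", False): "LOAD DATA INFILE",
    ("postgresql loader", True): "COPY ... FROM STDIN",
    ("postgresql loader", False): "COPY ... FROM '<filepath>'",
    ("singlestore loader", True): "LOAD DATA LOCAL INFILE",
    ("singlestore loader", False): "LOAD DATA INFILE",
}


def statement_form(component: str, database_remote: bool = True) -> str:
    """Look up the statement form for a component; remote is the default."""
    return STATEMENT_FORMS[(component.lower(), database_remote)]


print(statement_form("PostgreSQL Loader"))    # COPY ... FROM STDIN
print(statement_form("MySQL Loader", False))  # LOAD DATA INFILE
```

The pattern is the same in every row: the remote form moves data over the client connection, while the local form names a file that the database server opens itself.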
Which Databases Support It
The databaseRemote parameter is only relevant for databases that have both a client-side and server-side file transfer mechanism:
| Component | Supports databaseRemote |
|---|---|
| MySQL extractor | Yes |
| PostgreSQL extractor | Yes |
| Oracle extractor | No — always streams through the client |
| SQL Server extractor | No — always streams through the client |
| MySQL loader | Yes |
| PostgreSQL loader | Yes |
| SingleStore loader | Yes |
| Oracle loader | No — always streams through the client |
| SQL Server loader | No — uses native bulk loader |
Other targets, such as Snowflake and BigQuery, do not use databaseRemote. They have their own staging and upload mechanisms: for example, Snowflake uses an internal stage with PUT and COPY INTO, and BigQuery loads through Google Cloud Storage.
When to Use Each
For most deployments, the default (databaseRemote=true) is the right choice. Only set databaseRemote=false if Wirekite is running on the same host as the database server or they share a mounted filesystem.
- Cloud-managed databases (Amazon RDS, Google Cloud SQL, Azure Database): Use databaseRemote=true. Server-side file access is not available on managed instances.
- Self-hosted databases on the same host as Wirekite: Set databaseRemote=false for better performance through server-side file operations.
- Self-hosted databases on a different host: Use databaseRemote=true. The database server cannot access files on the Wirekite host.
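The guidance above amounts to a small decision rule, sketched below. The function choose_database_remote and its parameters are hypothetical names used only for illustration; Wirekite does not necessarily expose anything like this.

```python
def choose_database_remote(
    managed: bool, same_host: bool, shared_filesystem: bool = False
) -> bool:
    """Suggest a databaseRemote value following the guidance above."""
    if managed:
        # Managed instances do not offer server-side file access.
        return True
    if same_host or shared_filesystem:
        # The database server can read and write the same files as Wirekite.
        return False
    # Different host and no shared filesystem: keep the default.
    return True


print(choose_database_remote(managed=True, same_host=False))   # True
print(choose_database_remote(managed=False, same_host=True))   # False
print(choose_database_remote(managed=False, same_host=False))  # True
```

The rule falls back to true whenever server-side file access is unavailable, which matches the default.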
Thread Counts
Data extractors and data loaders process multiple tables concurrently using configurable thread counts. The thread count sets the number of concurrent threads used by a data extractor or data loader. Each thread processes one table at a time. All source extractors and all target loaders support this parameter.
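The one-table-per-thread model can be illustrated with a generic worker pool. This is a minimal sketch using Python's standard library, not Wirekite's code; process_table, run, and the table names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def process_table(table: str) -> str:
    # Placeholder for extracting or loading a single table.
    return f"{table}: done"


def run(tables: list[str], threads: int) -> list[str]:
    # Each worker thread handles one table at a time, so at most
    # `threads` tables are in progress concurrently.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as the input tables.
        return list(pool.map(process_table, tables))


results = run(["customers", "orders", "invoices"], threads=2)
print(results)  # ['customers: done', 'orders: done', 'invoices: done']
```

With two threads and three tables, the third table starts only once one of the first two has finished.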
