Overview
Target Sync makes a target table set match a source table set in one shot, without CDC. It compares each table between a source and a target database, row by row, and applies exactly the inserts, updates, and deletes needed to bring the target into agreement with the source. Where Data Loading does a one-time bulk load into an empty target and CDC Replication streams ongoing changes continuously, Target Sync is the third mode: a one-shot diff-and-apply that reconciles a target that already holds data.
Target Sync is one-directional: the source is the source of truth, and the target is reconciled to it. To sync the other way, swap source and target in a separate run.
When to use it
Drift repair
A target that has diverged from its source — a CDC gap, a manual edit, a load that stopped halfway — is re-converged exactly.
Backfill & top-up
Seed or top up a target that has no change feed, without standing up a continuous pipeline for a one-time job.
One-shot migration
Move a database pair where CDC is unavailable or not worth configuring. A single reconciling pass lands the data.
When not to use it
- Continuous replication — use CDC Replication (the streaming change extractor + change loader).
- Bidirectional sync — Target Sync is one-directional per run.
How It Works
Target Sync runs as two cooperating stages that communicate through a shared emit directory on local disk.
| Stage | Binary | Role |
|---|---|---|
| 1. Emit | tablevalidator | Reads source + target, diffs each table in primary-key order, writes change records as N.ckt files, then drops a CHANGE.DONE sentinel. |
| 2. Apply | <target>/loader/change_loader | Watches the emit directory, applies each N.ckt to the target, moves applied files aside, and exits when it sees CHANGE.DONE. |
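The `<target>/loader/change_loader` path in the table can be resolved per target type. The sketch below is only an illustration: the function name and the `root` install directory are hypothetical, while the two aliases (mariadb → mysql, tigerdata → postgres) are the ones documented under Supported Databases.

```python
from pathlib import PurePosixPath

# Types that reuse another database's change loader (documented aliases).
LOADER_ALIASES = {"mariadb": "mysql", "tigerdata": "postgres"}


def change_loader_path(target_type: str, root: str = ".") -> str:
    """Return <target>/loader/change_loader under a hypothetical install root."""
    # The config keeps the real database type; only the directory is aliased.
    binary_dir = LOADER_ALIASES.get(target_type, target_type)
    return str(PurePosixPath(root) / binary_dir / "loader" / "change_loader")
```

Only the directory lookup is aliased here; the `targetType` value passed to the configs would stay `mariadb` or `tigerdata`.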
The two stages overlap: the loader can apply earlier N.ckt files while the emitter is still producing later ones. They rendezvous on the CHANGE.DONE sentinel: when the loader logs CHANGE.DONE - exiting, the target matches the source.
Target Sync can be driven from the Wirekite GUI or API, which manage the run directory, config files, and both processes for you. The mechanics below describe what those interfaces orchestrate, and how to run it directly.
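To make the emit stage's "diff in primary-key order" concrete, here is a minimal sort-merge sketch. It assumes each row is a tuple whose first element is the primary key and that both inputs are already sorted by that key; the real TableValidator's internals are not described here, so treat this as an illustration of the technique rather than its implementation.

```python
from typing import Iterator, List, Tuple

Row = Tuple  # assumption: column 0 is the primary key


def diff_sorted(source: List[Row], target: List[Row]) -> Iterator[Tuple[str, Row]]:
    """Yield ('I' | 'U' | 'D', row) for two row lists sorted by primary key."""
    i = j = 0
    while i < len(source) and j < len(target):
        s, t = source[i], target[j]
        if s[0] == t[0]:
            if s != t:
                yield ("U", s)  # same key, different columns
            i += 1
            j += 1
        elif s[0] < t[0]:
            yield ("I", s)  # in source, missing from target
            i += 1
        else:
            yield ("D", t)  # in target, absent from source
            j += 1
    for s in source[i:]:
        yield ("I", s)
    for t in target[j:]:
        yield ("D", t)
```

Because both sides advance in key order, each row is visited once and the output is itself in primary-key order.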
Supported Databases
| Role | Supported types |
|---|---|
| Source | postgres, mysql, mariadb, sqlserver, oracle, cassandra |
| Target | postgres, mysql, mariadb, singlestore, sqlserver, oracle, firebolt, snowflake, databricks, bigquery, spanner, mongodb |
Cassandra is a Target-Sync-only source. Cassandra has no CDC path in Wirekite — it participates as a source exclusively through Target Sync (snapshot + drift repair). See the Cassandra Source Guide and the Cassandra Datatype Matrix.
MariaDB and TigerData reuse other binaries. MariaDB reuses the MySQL change-loader and TigerData reuses PostgreSQL’s. When resolving the change-loader path, map mariadb → mysql and tigerdata → postgres. The sourceType/targetType names in the configs stay as the real database type.
Prerequisites
The target tables must already exist
Target Sync inserts, updates, and deletes rows; it does not create tables. Run schema migration first if the target is empty.
A schema file (.skt) must be available
The .skt describes the tables (columns, types, primary keys). It is produced by the schema-migration step and consumed by both stages, which must be given the same .skt.
Every table must have a primary key
UPDATE and DELETE records are keyed by primary key. The emitter refuses a table with no PK rather than emit unkeyed records.
Source and target must agree on PK case sensitivity
The sort-merge diff relies on a consistent key ordering. A case-insensitive source syncing to a case-sensitive target (or vice-versa) can mis-window keys. See Limitations.
Schema Migration
Target Sync does not create tables or produce the .skt; a one-time schema migration does both. Run it whenever the target tables don’t yet exist (or the source schema has changed). It has three logical steps, coordinated by the wirekite orchestrator:
- Extract the source schema → wirekite_schema.skt via the source’s schema_extractor.
- Generate the target DDL from the .skt via the target’s schema_loader: c.sql (CREATE TABLE, plus the _wkm merge shadow tables), i.sql (constraints / primary keys), f.sql (foreign keys), d.sql (DROP TABLE).
- Apply the generated DDL to the target to actually create the tables.
schema-extract mode performs steps 1–2 in one invocation. Step 3 (applying the SQL) is run separately — via the orchestrator’s apply modes or by executing the generated .sql files with your own DB client.
After schema migration completes, the target tables and the .skt are in place. Proceed to the emit and apply stages, passing the same wirekite_schema.skt and the same schemaRename you used during migration.
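The schemaRename value uses the sourceSchema:targetSchema form described in the emit configuration (for example dbo:app_schema). A small sketch of how such a value could be parsed and applied to a schema.table name follows; both function names are hypothetical and the behavior for non-matching schemas (left unchanged) is an assumption.

```python
from typing import Tuple


def parse_schema_rename(value: str) -> Tuple[str, str]:
    """Split 'sourceSchema:targetSchema' into its two parts."""
    source_schema, sep, target_schema = value.partition(":")
    if not sep or not source_schema or not target_schema:
        raise ValueError(f"expected sourceSchema:targetSchema, got {value!r}")
    return source_schema, target_schema


def rename_table(qualified: str, rename: str) -> str:
    """Rewrite the schema part of 'schema.table' when it matches the source schema."""
    src, dst = parse_schema_rename(rename)
    schema, _, table = qualified.partition(".")
    # Assumption: names in other schemas pass through untouched.
    return f"{dst}.{table}" if schema == src else qualified
```

Using one value for migration, emit, and apply keeps the schema names in the emitted records aligned with the tables that migration created.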
Stage 1 — Emit (TableValidator)
The emit stage runs the TableValidator in emit mode. It reads the source and target, diffs each table, and writes change records to the emit directory.
Configuration
- Must be true to enable Target Sync emit. Otherwise the binary runs in plain validation mode.
- emitOutputDirectory: Directory where N.ckt files and the CHANGE.DONE sentinel are written. Created if absent. The apply stage reads from here.
- sourceType: One of the supported source types (see Supported Databases).
- Path to a file containing the source DSN.
- targetType: One of the supported target types.
- Path to a file containing the target DSN.
- Path to the .skt schema file. Must be the same file given to the apply stage.
- Path to a file listing tables to sync, one schema.table per line (source-side names).
- schemaRename: sourceSchema:targetSchema. Rewrites the schema name in emitted records (e.g. dbo:app_schema). Required when the source and target schema names differ, and must match the value used during schema migration.
- windowSize: Rows per diff window (e.g. 10000). Controls memory and parallel granularity. Recommended.
- maxThreads: Maximum concurrent table-window comparisons. Bound it by the source and target connection limits. Recommended.
- emitFileTargetMB: Target size of each consolidated N.ckt file. Larger files amortize the loader’s per-file cost (most impactful for cloud targets).
- Emit log path. Recommended.
- checkpointDir: Enables emit resume by recording each emitted window, so a restarted emitter skips finished work. See Resume & Crash Recovery.
- resume: When true, resumes from checkpointDir instead of clearing prior output.
- GCP credentials JSON. Required for Spanner and BigQuery targets (the emitter reads target DELETE primary keys from them).
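The interaction between windowSize and maxThreads can be pictured with the sketch below. It is an assumption-laden illustration: the helper names are hypothetical, windows are modelled as (first_key, last_key) ranges over a PK-ordered key list, and the per-window comparison is passed in as a callback because the real comparison logic is not shown in this document.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def make_windows(sorted_keys: Sequence, window_size: int) -> List[Tuple]:
    """Split PK-ordered keys into (first_key, last_key) ranges of at most window_size rows."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (sorted_keys[i], sorted_keys[min(i + window_size, len(sorted_keys)) - 1])
        for i in range(0, len(sorted_keys), window_size)
    ]


def compare_all(windows: List[Tuple], compare_window: Callable, max_threads: int) -> list:
    """Run one comparison per window with at most max_threads in flight."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() returns results in window order regardless of completion order.
        return list(pool.map(compare_window, windows))
```

In this model a larger window_size means fewer, heavier comparisons, and max_threads caps how many of them hold a source and target connection at once.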
Cassandra source options
- cassandraPKHashValidation: When true, uses the bounded (pk, hash) diff, projecting each row to a primary key plus a row hash rather than materializing whole tables. Recommended for Cassandra sources. Applies to single-PK tables; composite PKs fall back to a whole-table sort automatically.
- cassandraPKHashMemRows: In-memory ceiling for the (pk, hash) projection before it spills to a k-way merge. Trade RAM for spill I/O.
- In-memory ceiling for the whole-row sort (the composite-PK fallback and initial data migration).
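The (pk, hash) projection can be illustrated as follows. This is a sketch under stated assumptions: rows are tuples with a single primary key in column 0, and SHA-256 over `repr(row)` stands in for whatever row hash the emitter actually uses, which this document does not specify.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def project(rows: Iterable[Tuple]) -> Dict[object, str]:
    """Map each row's primary key (column 0) to a hash of the whole row."""
    # Assumption: SHA-256 of repr(row) is a placeholder hash, not the real one.
    return {row[0]: hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows}


def pk_hash_diff(source_rows: Iterable[Tuple], target_rows: Iterable[Tuple]) -> Tuple[List, List, List]:
    """Return (insert_keys, update_keys, delete_keys) using only keys and hashes."""
    src, tgt = project(source_rows), project(target_rows)
    inserts = sorted(k for k in src if k not in tgt)
    updates = sorted(k for k in src if k in tgt and src[k] != tgt[k])
    deletes = sorted(k for k in tgt if k not in src)
    return inserts, updates, deletes
```

Only a key and a fixed-size digest are held per row, so the memory footprint depends on row count rather than row width; full rows would then be fetched only for the keys that differ.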
Output
| File | Meaning |
|---|---|
| 0.ckt, 1.ckt, … | Gapless, ascending change-record files. |
| CHANGE.DONE | Zero-byte sentinel written after all N.ckt are finalized. Its presence means emit is complete. |
| part_*.part, building.tmp | Transient intermediates; ignore them. |
On completion, the emit log reports a line of the form: target-sync emit complete: N CKT file(s) … + CHANGE.DONE.
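A quick way to sanity-check an emit directory against the table above is sketched here. The function name is hypothetical; the checks themselves (sentinel present, N.ckt numbering gapless from 0, transient files ignored) follow the output rules just described.

```python
import os
import re
from typing import List


def emitted_files(emit_dir: str) -> List[str]:
    """Return N.ckt names in numeric order; raise if emit is incomplete or has gaps."""
    names = os.listdir(emit_dir)
    if "CHANGE.DONE" not in names:
        raise RuntimeError("CHANGE.DONE missing: emit is not complete")
    numbers = []
    for name in names:
        match = re.fullmatch(r"(\d+)\.ckt", name)
        if match:  # part_*.part and building.tmp never match and are ignored
            numbers.append(int(match.group(1)))
    numbers.sort()
    if numbers != list(range(len(numbers))):
        raise RuntimeError(f"N.ckt numbering is not gapless: {numbers}")
    return [f"{n}.ckt" for n in numbers]
```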
Stage 2 — Apply (Change Loader)
The apply stage runs the target’s change loader. It drains N.ckt files from the emit directory and applies them to the target.
Resolve the binary directory with the mariadb → mysql and tigerdata → postgres mapping. For example, a MariaDB target uses mysql/loader/change_loader.
Configuration
- Path to a file containing the target DSN.
- The same .skt given to the emitter.
- inputDirectory: The emitter’s emitOutputDirectory.
- doneDirectory: Where applied N.ckt files are moved after success.
- Scratch directory for the loader.
- Must be false: the emitter base64-encodes string and binary values.
- Maximum concurrent table applies. Recommended.
- Apply log path. Recommended.
- GCP credentials JSON. Required for Spanner and BigQuery targets.
- Staging bucket for the loader’s COPY. Required for Firebolt and Databricks targets (they stage CSVs to S3). Pair with awsRegion and awsCredentials / awsCredentialsFile.
Behavior
- Polls inputDirectory for the next expected N.ckt, applies it to the target, and moves it to doneDirectory.
- Relational targets apply via MERGE/upsert plus delete; cloud targets (Firebolt, Snowflake, Databricks, BigQuery) stage rows to their bucket and bulk-load.
- Exits cleanly when it has applied through the last file and seen CHANGE.DONE (log line CHANGE.DONE - exiting).
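The polling behavior described above can be sketched as a small loop. This is not the change loader's code: `drain` and its `apply_file` callback are hypothetical, and the only facts taken from this page are the in-order N.ckt naming, the move to a done directory, and the CHANGE.DONE exit condition.

```python
import os
import shutil
import time
from typing import Callable


def drain(input_dir: str, done_dir: str, apply_file: Callable[[str], None],
          poll_seconds: float = 1.0) -> int:
    """Apply 0.ckt, 1.ckt, ... in order; return the count once CHANGE.DONE is seen."""
    os.makedirs(done_dir, exist_ok=True)
    sentinel = os.path.join(input_dir, "CHANGE.DONE")
    next_n = 0
    while True:
        # Read the sentinel BEFORE looking for the file: the sentinel is written
        # after every N.ckt, so if it was present, no later file can still appear.
        emit_done = os.path.exists(sentinel)
        path = os.path.join(input_dir, f"{next_n}.ckt")
        if os.path.exists(path):
            apply_file(path)
            shutil.move(path, os.path.join(done_dir, f"{next_n}.ckt"))
            next_n += 1
        elif emit_done:
            return next_n
        else:
            time.sleep(poll_seconds)
```

Checking the sentinel first avoids a race in which the last file and the sentinel both land between the two existence checks.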
Change Record Format
Each N.ckt file is UTF-8 text, tab-delimited, one change record per line. The format is byte-identical to the CDC change extractor’s, so the change-loader applies it unchanged.
- I (insert): a row present in the source but missing from the target; all column values in schema order.
- U (update): a row whose PK matches but whose columns differ; the PK followed by sparse (offset, value) pairs, where offset is the 0-based column position. Only changed columns appear.
- D (delete): a row present in the target but absent from the source; primary-key columns only.
NULL is written as \N. Each record ends with a trailing tab then newline (\t\n).
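The framing rules above (tab-delimited fields, \N for NULL, a trailing tab before the newline, sparse pairs for updates) can be exercised with a small parser sketch. How the operation code and table name are laid out at the start of a record is not spelled out here, so the sketch deliberately works on value fields only; both function names are hypothetical, and base64 decoding of string values is left out.

```python
from typing import Dict, List, Optional, Tuple

NULL_MARKER = r"\N"


def split_record(line: str) -> List[Optional[str]]:
    """Split one record line into fields, mapping the NULL marker to None."""
    if not line.endswith("\t\n"):
        raise ValueError("record must end with a trailing tab and newline")
    fields = line[:-2].split("\t")
    return [None if field == NULL_MARKER else field for field in fields]


def sparse_pairs(values: List[Optional[str]], pk_width: int) -> Tuple[list, Dict[int, Optional[str]]]:
    """Decode an update's values: PK columns first, then (offset, value) pairs."""
    pk, rest = values[:pk_width], values[pk_width:]
    if len(rest) % 2:
        raise ValueError("update values must be (offset, value) pairs")
    # offset is the 0-based column position of the changed column
    changes = {int(rest[i]): rest[i + 1] for i in range(0, len(rest), 2)}
    return pk, changes
```

The number of primary-key columns (`pk_width`) would come from the .skt schema, since the record itself does not carry it in this model.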
End-to-End Example
A complete sqlserver → mysql run, chaining schema migration → emit → apply. It assumes the Wirekite binaries are built and the two DSN files exist.
Cloud targets (Firebolt, Snowflake, Databricks, BigQuery, Spanner) need the extra staging/credential keys in apply.cfg, and gcpCredfile for the schema phase. See the Stage 2 configuration.
Resume & Crash Recovery
Target Sync is restartable at every stage.
- Emit resume. Set checkpointDir (typically the emit dir) and resume=true. The emitter records each completed window; a restarted run skips already-emitted windows and preserves published *.ckt files.
- Emit-once across crashes. Within a window, the emitter writes a per-file high-water sidecar (N.ckt.hw, fsync’d before N.ckt is published) recording how far that window’s PK-ordered emit reached. On resume it continues past the high-water, so every difference lands in exactly one committed .ckt, even if the host crashed mid-emit, and with no reliance on loader idempotency (required for non-MERGE loaders such as Spanner).
- Apply recovery. The change-loader checkpoints applied files; on restart it resumes at the next un-applied N.ckt. Files already moved to doneDirectory are not re-applied.
- Gapless invariant. A fresh emit run clears stale *.ckt / CHANGE.DONE first, so the loader never sees a leftover higher-numbered file or a premature CHANGE.DONE.
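The "sidecar fsync'd before publish" ordering can be sketched with standard file operations. Everything beyond that ordering is an assumption: the helper names are hypothetical, the sidecar is modelled as the last emitted primary key in plain text, and building.tmp is used as the staging name only because it appears in the output table as a transient file.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and fsync it so the bytes survive a host crash."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def publish_ckt(emit_dir: str, n: int, records: bytes, high_water_pk: str) -> str:
    """Persist the high-water sidecar first, then atomically publish N.ckt."""
    final = os.path.join(emit_dir, f"{n}.ckt")
    # 1. Sidecar goes to disk before the data file becomes visible.
    write_durably(final + ".hw", high_water_pk.encode("utf-8"))
    # 2. Stage the records under a transient name the loader ignores.
    staging = os.path.join(emit_dir, "building.tmp")
    write_durably(staging, records)
    # 3. os.replace is an atomic rename: readers see no file or the whole file.
    os.replace(staging, final)
    return final
```

With this ordering, any published N.ckt always has a durable record of how far its window reached, which is what lets a resumed run continue past that point instead of re-emitting.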
Performance & Tuning
- windowSize: larger windows reduce per-window overhead but raise memory per concurrent comparison.
- maxThreads: raises concurrency across table-windows; bound it by source and target connection limits.
- emitFileTargetMB: larger N.ckt files reduce the loader’s per-file COPY/upload overhead (most impactful for cloud targets).
- SQL Server source: the validator reads SQL Server with a 32 KB TDS packet size, so wide tables transfer several times faster than with the 4 KB default. This is automatic.
- Wide-table syncs are dominated by raw data transfer; expect throughput in the tens of MB/s per stream over typical intra-cloud networking.
- Cassandra source: keep cassandraPKHashValidation=true so the footprint stays bounded, and tune cassandraPKHashMemRows to trade RAM for spill I/O. The diff/emit cost is bounded by the drift (the rows that differ), not the table size; at large base size the source ring’s read throughput is the ceiling.
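The RAM-versus-spill trade-off behind cassandraPKHashMemRows can be pictured as an external sort: buffer up to a row ceiling, write each full buffer out as a sorted run, then k-way merge the runs. The sketch below is a generic version of that technique using `heapq.merge`; the function name, the pickle-based run format, and the choice to spill the final partial buffer too are all illustrative assumptions.

```python
import heapq
import pickle
import tempfile
from typing import Iterable, Iterator, List, Tuple


def sorted_with_spill(pairs: Iterable[Tuple], mem_rows: int) -> Iterator[Tuple]:
    """Return pairs in sorted order while buffering at most mem_rows at a time."""
    runs: List = []
    buffer: List[Tuple] = []

    def spill() -> None:
        buffer.sort()
        run = tempfile.TemporaryFile()
        for item in buffer:
            pickle.dump(item, run)
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= mem_rows:
            spill()
    if buffer:
        spill()  # simplification: the last partial buffer is spilled as well

    def read(run) -> Iterator[Tuple]:
        try:
            while True:
                yield pickle.load(run)
        except EOFError:
            return
        finally:
            run.close()

    # k-way merge: holds one item per run in memory at a time
    return heapq.merge(*(read(run) for run in runs))
```

A higher ceiling produces fewer, larger runs (less spill I/O, more RAM); a lower one does the opposite.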
Limitations & Correctness Rules
- One-directional. Reconciles the target to the source’s state. To sync the other way, swap source and target in a separate run.
- Primary key required on every table.
- Target schema must pre-exist. Target Sync moves data, not DDL — run schema migration first.
- Case-sensitivity must align. Source and target must agree on PK case sensitivity. A case-insensitive source → case-sensitive target is fine; a case-sensitive source → case-insensitive target can lose case-distinct keys (the target folds them) and is unsupported. For mixed collations, ensure PK data is single-case or that the target PK columns use a binary collation.
- Type coverage is source-driven. Only the supported source types can encode full-row INSERT/UPDATE values.
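The case-sensitivity rule is easier to see with concrete keys. The sketch below uses Python's `str.casefold` as a stand-in for a case-insensitive collation, which is an approximation (real database collations have their own rules); both helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def folded_collisions(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group keys that a case-insensitive PK would treat as the same key."""
    groups: Dict[str, List[str]] = {}
    for key in keys:
        groups.setdefault(key.casefold(), []).append(key)
    return {folded: ks for folded, ks in groups.items() if len(ks) > 1}


def orderings_agree(keys: Iterable[str]) -> bool:
    """True if binary and case-insensitive sort orders coincide for these keys."""
    keys = list(keys)
    return sorted(keys) == sorted(keys, key=str.casefold)
```

Running `folded_collisions` over a case-sensitive source's PK values before a sync shows which keys a case-insensitive target would fold together, and `orderings_agree` flags key sets where the two sides would walk the keys in different orders.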
Related
Cassandra Source Guide
Configure Apache Cassandra as a Target-Sync source.
Cassandra Datatype Matrix
How CQL types map onto each relational and warehouse target.
TableValidator
The diff engine behind the emit stage, also usable for standalone validation.
CDC Replication
Continuous change capture — the streaming alternative to a one-shot sync.
