Target Sync - Wirekite Docs

Overview

Target Sync makes a target table set match a source table set in one shot, without CDC. It compares each table between a source and a target database, row by row, and applies exactly the inserts, updates, and deletes needed to bring the target into agreement with the source. Where Data Loading does a one-time bulk load into an empty target and CDC Replication streams ongoing changes continuously, Target Sync is the third mode: a one-shot diff-and-apply that reconciles a target that already holds data.

Target Sync is one-directional: the source is the source of truth, and the target is reconciled to it. To sync the other way, swap source and target in a separate run.

Internally, Target Sync reuses the TableValidator’s sort-merge diff walk. Instead of only reporting differences, it emits them as change records that a change-loader applies to the target — so the change records are byte-identical to the ones CDC produces, and the same target loaders apply them.

When to use it

Drift repair

A target that has diverged from its source — a CDC gap, a manual edit, a load that stopped halfway — is re-converged exactly.

Backfill & top-up

Seed or top up a target that has no change feed, without standing up a continuous pipeline for a one-time job.

One-shot migration

Move a database pair where CDC is unavailable or not worth configuring. A single reconciling pass lands the data.

When not to use it

Continuous replication — use CDC Replication (the streaming change extractor + change loader).
Bidirectional sync — Target Sync is one-directional per run.

How It Works

Target Sync runs as two cooperating stages that communicate through a shared emit directory on local disk.

            ┌───────────┐   read + diff   ┌──────────┐
   SOURCE ─▶│   EMIT    │◀────────────────│  TARGET  │
 (truth)    │ (Table-   │   read PKs      └──────────┘
            │ Validator)│                       ▲
            └─────┬─────┘                       │ apply I/U/D
                  │ writes N.ckt files          │
                  ▼                             │
           emit directory  ──────────────▶ ┌──────────┐
           (0.ckt, 1.ckt, …,               │  APPLY   │
            CHANGE.DONE)                    │ (change  │
                                            │  loader) │
                                            └──────────┘

Stage	Binary	Role
1. Emit	`tablevalidator`	Reads source + target, diffs each table in primary-key order, writes change records as `N.ckt` files, then drops a `CHANGE.DONE` sentinel.
2. Apply	`<target>/loader/change_loader`	Watches the emit directory, applies each `N.ckt` to the target, moves applied files aside, and exits when it sees `CHANGE.DONE`.

The two stages run concurrently: the loader begins applying early N.ckt files while the emitter is still producing later ones. They rendezvous on the CHANGE.DONE sentinel — when the loader logs CHANGE.DONE - exiting, the target matches the source.

Target Sync can be driven from the Wirekite GUI or API, which manage the run directory, config files, and both processes for you. The mechanics below describe what those interfaces orchestrate, and how to run it directly.

Supported Databases

Role	Supported types
Source	`postgres`, `mysql`, `mariadb`, `sqlserver`, `oracle`, `cassandra`
Target	`postgres`, `mysql`, `mariadb`, `singlestore`, `sqlserver`, `oracle`, `firebolt`, `snowflake`, `databricks`, `bigquery`, `spanner`, `mongodb`

The source set is smaller because the source drives full INSERT/UPDATE row encoding (it needs complete column-type coverage). The target only supplies primary keys for DELETE detection, so more target types are supported. The emitter fails loud at startup on an unsupported source or target type.

Cassandra is a Target-Sync-only source. Cassandra has no CDC path in Wirekite — it participates as a source exclusively through Target Sync (snapshot + drift repair). See the Cassandra Source Guide and the Cassandra Datatype Matrix.

MariaDB and TigerData reuse other binaries. MariaDB reuses the MySQL change-loader and TigerData reuses PostgreSQL’s. When resolving the change-loader path, map mariadb → mysql and tigerdata → postgres. The sourceType/targetType names in the configs stay as the real database type.

Prerequisites

The target tables must already exist

Target Sync inserts, updates, and deletes rows; it does not create tables. Run schema migration first if the target is empty.

A schema file (.skt) must be available

The .skt describes the tables (columns, types, primary keys). It is produced by the schema-migration step and consumed by both stages — they must be given the same .skt.

Every table must have a primary key

UPDATE and DELETE records are keyed by primary key. The emitter refuses a table with no PK rather than emit unkeyed records.

Source and target must agree on PK case sensitivity

The sort-merge diff relies on a consistent key ordering. A case-insensitive source syncing to a case-sensitive target (or vice-versa) can mis-window keys. See Limitations.

Schema Migration

Target Sync does not create tables or produce the .skt — a one-time schema migration does both. Run it whenever the target tables don’t yet exist (or the source schema has changed). It has three logical steps, coordinated by the wirekite orchestrator:

Extract the source schema → wirekite_schema.skt via the source’s schema_extractor.
Generate the target DDL from the .skt — c.sql (CREATE TABLE, plus the _wkm merge shadow tables), i.sql (constraints / primary keys), f.sql (foreign keys), d.sql (DROP TABLE) — via the target’s schema_loader.
Apply the generated DDL to the target to actually create the tables.

The orchestrator’s schema-extract mode performs steps 1–2 in one invocation. Step 3 (applying the SQL) is run separately — via the orchestrator’s apply modes or by executing the generated .sql files with your own DB client.

Create the target schema with createMergeTables=true. The change-loader applies change records through per-table _wkm merge shadow tables; those must exist before the apply stage runs. The schema loader creates them when this flag is set.

After schema migration completes, the target tables and the .skt are in place. Proceed to the emit and apply stages, passing the same wirekite_schema.skt and the same schemaRename you used during migration.

Stage 1 — Emit (TableValidator)

The emit stage runs the TableValidator in emit mode. It reads the source and target, diffs each table, and writes change records to the emit directory.

tablevalidator <emit-config-file>

Configuration

emitMode

boolean

required

Must be true to enable Target Sync emit. Otherwise the binary runs in plain validation mode.

emitOutputDirectory

string

required

Directory where N.ckt files and the CHANGE.DONE sentinel are written. Created if absent. The apply stage reads from here.

sourceType

string

required

One of the supported source types (see Supported Databases).

sourceDsnFile

string

required

Path to a file containing the source DSN.

targetType

string

required

One of the supported target types.

targetDsnFile

string

required

Path to a file containing the target DSN.

schemaFile

string

required

Path to the .skt schema file. Must be the same file given to the apply stage.

tablesFile

string

required

Path to a file listing tables to sync, one schema.table per line (source-side names).

schemaRename

string

sourceSchema:targetSchema — rewrites the schema name in emitted records (e.g. dbo:app_schema). Required when the source and target schema names differ, and must match the value used during schema migration.

windowSize

integer

Rows per diff window (e.g. 10000). Controls memory and parallel granularity. Recommended.

maxThreads

integer

Maximum concurrent table-window comparisons. Bound it by the source and target connection limits. Recommended.

emitFileTargetMB

integer

default:"32"

Target size of each consolidated N.ckt file. Larger files amortize the loader’s per-file cost (most impactful for cloud targets).

logFile

string

Emit log path. Recommended.

checkpointDir

string

Enables emit resume — records each emitted window so a restarted emitter skips finished work. See Resume & Crash Recovery.

resume

boolean

default:"false"

When true, resumes from checkpointDir instead of clearing prior output.

gcpCredfile

string

GCP credentials JSON. Required for Spanner and BigQuery targets (the emitter reads target DELETE primary keys from them).

Cassandra source options

cassandraPKHashValidation

boolean

When true, uses the bounded (pk, hash) diff — projecting each row to a primary key plus a row hash rather than materializing whole tables. Recommended for Cassandra sources. Applies to single-PK tables; composite PKs fall back to a whole-table sort automatically.

cassandraPKHashMemRows

integer

In-memory ceiling for the (pk, hash) projection before it spills to a k-way merge. Trade RAM for spill I/O.

cassandraSortMemRows

integer

In-memory ceiling for the whole-row sort (the composite-PK fallback and initial data migration).

Output

File	Meaning
`0.ckt`, `1.ckt`, …	Gapless, ascending change-record files.
`CHANGE.DONE`	Zero-byte sentinel written after all `N.ckt` are finalized. Its presence means emit is complete.
`part_*.part`, `building.tmp`	Transient intermediates — ignore them.

In emit mode the per-table summary is worded DIFF (not FAIL) — differences are the expected product of a sync, not a validation failure. The emitter prints a final line: target-sync emit complete: N CKT file(s) … + CHANGE.DONE.

Stage 2 — Apply (Change Loader)

The apply stage runs the target’s change loader. It drains N.ckt files from the emit directory and applies them to the target.

<target>/loader/change_loader <apply-config-file>

Resolve the binary directory with the mariadb → mysql and tigerdata → postgres mapping. For example, a MariaDB target uses mysql/loader/change_loader.

Configuration

dsnFile

string

required

Path to a file containing the target DSN.

schemaFile

string

required

The same .skt given to the emitter.

inputDirectory

string

required

The emitter’s emitOutputDirectory.

doneDirectory

string

required

Where applied N.ckt files are moved after success.

workDirectory

string

required

Scratch directory for the loader.

hexEncoding

boolean

required

Must be false — the emitter base64-encodes string and binary values.

maxThreads

integer

Maximum concurrent table applies. Recommended.

logFile

string

Apply log path. Recommended.

gcpCredfile

string

GCP credentials JSON. Required for Spanner and BigQuery targets.

awsBucket

string

Staging bucket for the loader’s COPY. Required for Firebolt and Databricks targets (they stage CSVs to S3). Pair with awsRegion and awsCredentials / awsCredentialsFile.

Behavior

Polls inputDirectory for the next expected N.ckt, applies it to the target, and moves it to doneDirectory.
Relational targets apply via MERGE/upsert plus delete; cloud targets (Firebolt, Snowflake, Databricks, BigQuery) stage rows to their bucket and bulk-load.
Exits cleanly when it has applied through the last file and seen CHANGE.DONE (log line CHANGE.DONE - exiting).

Change Record Format

Each N.ckt file is UTF-8 text, tab-delimited, one change record per line. The format is byte-identical to the CDC change extractor’s, so the change-loader applies it unchanged.

I  <schema>  <table>  <col1> <col2> … <colN>           # INSERT: all columns, in schema order
U  <schema>  <table>  <pk…>  <off> <val> <off> <val>…  # UPDATE: PK, then (column-offset, value) pairs for changed columns only
D  <schema>  <table>  <pk…>                            # DELETE: primary-key columns only

I (insert) — a row present in the source but missing from the target; all column values in schema order.
U (update) — a row whose PK matches but whose columns differ; the PK followed by sparse (offset, value) pairs, where offset is the 0-based column position. Only changed columns appear.
D (delete) — a row present in the target but absent from the source; primary-key columns only.

Values are encoded on the wire as: base64 for strings and binary/blob, ISO-8601 for timestamps, and the literal \N for NULL. Each record ends with a trailing tab then newline (\t\n).

End-to-End Example

A complete sqlserver → mysql run, chaining schema migration → emit → apply. It assumes the Wirekite binaries are built and the two DSN files exist.

# ---- Setup ----
WK=/home/wirekite                       # Wirekite home (built binaries live here)
WORK=/var/wirekite/sync_run_42
EMIT=$WORK/emit
DONE=$WORK/done
mkdir -p "$WORK" "$EMIT" "$DONE"

SRC_DSN=/etc/wirekite/dsn/source_sqlserver.dsn
TGT_DSN=/etc/wirekite/dsn/target_mysql.dsn

# Tables to sync — SOURCE-side schema.table, one per line
cat > "$WORK/tables.txt" <<EOF
dbo.test_strings
dbo.test_alltypes
dbo.bulk_001
EOF

# ============================================================
# PHASE 1 — Schema migration: create target tables + produce .skt
# ============================================================
cat > "$WORK/schema.cfg" <<EOF
mode=schema-extract
home=$WK
source=sqlserver
target=mysql

source.schema.dsnFile=$SRC_DSN
source.schema.tablesFile=$WORK/tables.txt
source.schema.outputDirectory=$WORK
source.schema.logFile=$WORK/schema_extractor.log
source.schema.schemaRename=dbo:app_schema

target.schema.schemaFile=$WORK/wirekite_schema.skt
target.schema.logFile=$WORK/schema_loader.log
target.schema.createTableFile=$WORK/c.sql
target.schema.createConstraintFile=$WORK/i.sql
target.schema.createForeignKeyFile=$WORK/f.sql
target.schema.dropTableFile=$WORK/d.sql
target.schema.createMergeTables=true
EOF

# 1a. Extract source schema -> wirekite_schema.skt, and generate the DDL files.
"$WK/orchestrator/wirekite" "$WORK/schema.cfg"

# 1b. Apply the generated DDL to the target: drop, create, constraints, foreign keys.
#     (drop and FK steps are best-effort; -ignore-errors lets them pass.)
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" -ignore-errors < "$WORK/d.sql"
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN"                < "$WORK/c.sql"
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN"                < "$WORK/i.sql"
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" -ignore-errors < "$WORK/f.sql"
# The target tables (and their _wkm merge shadow tables) now exist.

# ============================================================
# PHASE 2 + 3 — Emit + Apply (run concurrently; reuse the SAME .skt + schemaRename)
# ============================================================
cat > "$WORK/emit.cfg" <<EOF
emitMode=true
emitOutputDirectory=$EMIT
sourceType=sqlserver
sourceDsnFile=$SRC_DSN
targetType=mysql
targetDsnFile=$TGT_DSN
schemaFile=$WORK/wirekite_schema.skt
tablesFile=$WORK/tables.txt
schemaRename=dbo:app_schema
windowSize=10000
maxThreads=8
emitFileTargetMB=32
logFile=$WORK/emit.log
EOF

cat > "$WORK/apply.cfg" <<EOF
dsnFile=$TGT_DSN
schemaFile=$WORK/wirekite_schema.skt
inputDirectory=$EMIT
doneDirectory=$DONE
workDirectory=$WORK
logFile=$WORK/apply.log
maxThreads=8
hexEncoding=false
EOF

# Loader drains N.ckt files as the emitter writes them (mysql == mapped binary dir).
"$WK/mysql/loader/change_loader" "$WORK/apply.cfg" &
"$WK/tablevalidator/tablevalidator" "$WORK/emit.cfg"
wait
# Done: when the loader logs "CHANGE.DONE - exiting", the target matches the source.

Cloud targets (Firebolt, Snowflake, Databricks, BigQuery, Spanner) need the extra staging/credential keys in apply.cfg, and gcpCredfile for the schema phase. See the Stage 2 configuration.

Resume & Crash Recovery

Target Sync is restartable at every stage.

Emit resume. Set checkpointDir (typically the emit dir) and resume=true. The emitter records each completed window; a restarted run skips already-emitted windows and preserves published *.ckt files.
Emit-once across crashes. Within a window, the emitter writes a per-file high-water sidecar (N.ckt.hw, fsync’d before N.ckt is published) recording how far that window’s PK-ordered emit reached. On resume it continues past the high-water, so every difference lands in exactly one committed .ckt — even if the host crashed mid-emit, and with no reliance on loader idempotency (required for non-MERGE loaders such as Spanner).
Apply recovery. The change-loader checkpoints applied files; on restart it resumes at the next un-applied N.ckt. Files already moved to doneDirectory are not re-applied.
Gapless invariant. A fresh emit run clears stale *.ckt / CHANGE.DONE first, so the loader never sees a leftover higher-numbered file or a premature CHANGE.DONE.

Do not hand-place files in the emit directory. The loader applies N.ckt files in strict ascending order and treats CHANGE.DONE as the completion signal; a stray file breaks the gapless invariant.

Performance & Tuning

windowSize — larger windows reduce per-window overhead but raise memory per concurrent comparison.
maxThreads — raises concurrency across table-windows; bound it by source and target connection limits.
emitFileTargetMB — larger N.ckt files reduce the loader’s per-file COPY/upload overhead (most impactful for cloud targets).
SQL Server source — the validator reads SQL Server with a 32 KB TDS packet size, so wide tables transfer several times faster than the 4 KB default. This is automatic.
Wide-table syncs are dominated by raw data transfer; expect throughput in the tens of MB/s per stream over typical intra-cloud networking.
Cassandra source — keep cassandraPKHashValidation=true so the footprint stays bounded, and tune cassandraPKHashMemRows to trade RAM for spill I/O. The diff/emit cost is bounded by the drift (the rows that differ), not the table size; at large base size the source ring’s read throughput is the ceiling.

Limitations & Correctness Rules

One-directional. Reconciles the target to the source’s state. To sync the other way, swap source and target in a separate run.
Primary key required on every table.
Target schema must pre-exist. Target Sync moves data, not DDL — run schema migration first.
Case-sensitivity must align. Source and target must agree on PK case sensitivity. A case-insensitive source → case-sensitive target is fine; a case-sensitive source → case-insensitive target can lose case-distinct keys (the target folds them) and is unsupported. For mixed collations, ensure PK data is single-case or that the target PK columns use a binary collation.
Type coverage is source-driven. Only the supported source types can encode full-row INSERT/UPDATE values.

Cassandra Source Guide

Configure Apache Cassandra as a Target-Sync source.

Cassandra Datatype Matrix

How CQL types map onto each relational and warehouse target.

TableValidator

The diff engine behind the emit stage, also usable for standalone validation.

CDC Replication

Continuous change capture — the streaming alternative to a one-shot sync.

​Overview

​When to use it

Drift repair

Backfill & top-up

One-shot migration

​When not to use it

​How It Works

​Supported Databases

​Prerequisites

​Schema Migration

​Stage 1 — Emit (TableValidator)

​Configuration

​Cassandra source options

​Output

​Stage 2 — Apply (Change Loader)

​Configuration

​Behavior

​Change Record Format

​End-to-End Example

​Resume & Crash Recovery

​Performance & Tuning

​Limitations & Correctness Rules

​Related

Cassandra Source Guide

Cassandra Datatype Matrix

TableValidator

CDC Replication

Overview

When to use it

When not to use it

How It Works

Supported Databases

Prerequisites

Schema Migration

Stage 1 — Emit (TableValidator)

Configuration

Cassandra source options

Output

Stage 2 — Apply (Change Loader)

Configuration

Behavior

Change Record Format

End-to-End Example

Resume & Crash Recovery

Performance & Tuning

Limitations & Correctness Rules

Related