> ## Documentation Index > Fetch the complete documentation index at: https://docs.wirekite.io/llms.txt > Use this file to discover all available pages before exploring further. # Target Sync > Reconcile an already-populated target back to its source in a single pass — Wirekite diffs the two databases and applies only the differences, with no CDC. ## Overview **Target Sync makes a target table set match a source table set in one shot, without CDC.** It compares each table between a source and a target database, row by row, and applies exactly the inserts, updates, and deletes needed to bring the **target** into agreement with the **source**. Where [Data Loading](/run/dataloading) does a one-time bulk load into an empty target and [CDC Replication](/run/replication) streams ongoing changes continuously, Target Sync is the third mode: a **one-shot diff-and-apply** that reconciles a target that already holds data. Target Sync is **one-directional**: the source is the source of truth, and the target is reconciled to it. To sync the other way, swap source and target in a separate run. Internally, Target Sync reuses the [TableValidator's](/run/validation) sort-merge diff walk. Instead of only *reporting* differences, it *emits* them as change records that a change-loader applies to the target — so the change records are byte-identical to the ones CDC produces, and the same target loaders apply them. ### When to use it A target that has diverged from its source — a CDC gap, a manual edit, a load that stopped halfway — is re-converged exactly. Seed or top up a target that has no change feed, without standing up a continuous pipeline for a one-time job. Move a database pair where CDC is unavailable or not worth configuring. A single reconciling pass lands the data. ### When *not* to use it * **Continuous replication** — use [CDC Replication](/run/replication) (the streaming change extractor + change loader). * **Bidirectional sync** — Target Sync is one-directional per run. ## How It Works Target Sync runs as two cooperating stages that communicate through a shared **emit directory** on local disk. ``` ┌───────────┐ read + diff ┌──────────┐ SOURCE ─▶│ EMIT │◀────────────────│ TARGET │ (truth) │ (Table- │ read PKs └──────────┘ │ Validator)│ ▲ └─────┬─────┘ │ apply I/U/D │ writes N.ckt files │ ▼ │ emit directory ──────────────▶ ┌──────────┐ (0.ckt, 1.ckt, …, │ APPLY │ CHANGE.DONE) │ (change │ │ loader) │ └──────────┘ ``` | Stage | Binary | Role | | ------------ | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | | **1. Emit** | `tablevalidator` | Reads source + target, diffs each table in primary-key order, writes change records as `N.ckt` files, then drops a `CHANGE.DONE` sentinel. | | **2. Apply** | `/loader/change_loader` | Watches the emit directory, applies each `N.ckt` to the target, moves applied files aside, and exits when it sees `CHANGE.DONE`. | The two stages run **concurrently**: the loader begins applying early `N.ckt` files while the emitter is still producing later ones. They rendezvous on the `CHANGE.DONE` sentinel — when the loader logs `CHANGE.DONE - exiting`, the target matches the source. Target Sync can be driven from the Wirekite **GUI** or **API**, which manage the run directory, config files, and both processes for you. The mechanics below describe what those interfaces orchestrate, and how to run it directly. ## Supported Databases | Role | Supported types | | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------- | | **Source** | `postgres`, `mysql`, `mariadb`, `sqlserver`, `oracle`, `cassandra` | | **Target** | `postgres`, `mysql`, `mariadb`, `singlestore`, `sqlserver`, `oracle`, `firebolt`, `snowflake`, `databricks`, `bigquery`, `spanner`, `mongodb` | The source set is smaller because the source drives full INSERT/UPDATE row encoding (it needs complete column-type coverage). The target only supplies primary keys for DELETE detection, so more target types are supported. The emitter **fails loud** at startup on an unsupported source or target type. **Cassandra is a Target-Sync-only source.** Cassandra has no CDC path in Wirekite — it participates as a source exclusively through Target Sync (snapshot + drift repair). See the [Cassandra Source Guide](/sourceguides/cassandra) and the [Cassandra Datatype Matrix](/datatypes/cassandra). **MariaDB and TigerData reuse other binaries.** MariaDB reuses the MySQL change-loader and TigerData reuses PostgreSQL's. When resolving the change-loader path, map `mariadb → mysql` and `tigerdata → postgres`. The `sourceType`/`targetType` names in the configs stay as the real database type. ## Prerequisites Target Sync inserts, updates, and deletes rows; it does **not** create tables. Run [schema migration](#schema-migration) first if the target is empty. The `.skt` describes the tables (columns, types, primary keys). It is produced by the schema-migration step and consumed by **both** stages — they must be given the *same* `.skt`. UPDATE and DELETE records are keyed by primary key. The emitter refuses a table with no PK rather than emit unkeyed records. The sort-merge diff relies on a consistent key ordering. A case-insensitive source syncing to a case-sensitive target (or vice-versa) can mis-window keys. See [Limitations](#limitations--correctness-rules). ### Schema Migration Target Sync does **not** create tables or produce the `.skt` — a one-time **schema migration** does both. Run it whenever the target tables don't yet exist (or the source schema has changed). It has three logical steps, coordinated by the `wirekite` orchestrator: 1. **Extract** the source schema → `wirekite_schema.skt` via the source's `schema_extractor`. 2. **Generate** the target DDL from the `.skt` — `c.sql` (CREATE TABLE, plus the `_wkm` merge shadow tables), `i.sql` (constraints / primary keys), `f.sql` (foreign keys), `d.sql` (DROP TABLE) — via the target's `schema_loader`. 3. **Apply** the generated DDL to the target to actually create the tables. The orchestrator's `schema-extract` mode performs steps 1–2 in one invocation. Step 3 (applying the SQL) is run separately — via the orchestrator's apply modes or by executing the generated `.sql` files with your own DB client. **Create the target schema with `createMergeTables=true`.** The change-loader applies change records through per-table `_wkm` merge shadow tables; those must exist before the apply stage runs. The schema loader creates them when this flag is set. After schema migration completes, the target tables and the `.skt` are in place. Proceed to the emit and apply stages, passing the **same** `wirekite_schema.skt` and the **same** `schemaRename` you used during migration. ## Stage 1 — Emit (TableValidator) The emit stage runs the TableValidator in emit mode. It reads the source and target, diffs each table, and writes change records to the emit directory. ```bash theme={null} tablevalidator ``` ### Configuration Must be `true` to enable Target Sync emit. Otherwise the binary runs in plain [validation](/run/validation) mode. Directory where `N.ckt` files and the `CHANGE.DONE` sentinel are written. Created if absent. The apply stage reads from here. One of the supported source types (see [Supported Databases](#supported-databases)). Path to a file containing the source DSN. One of the supported target types. Path to a file containing the target DSN. Path to the `.skt` schema file. Must be the same file given to the apply stage. Path to a file listing tables to sync, one `schema.table` per line (source-side names). `sourceSchema:targetSchema` — rewrites the schema name in emitted records (e.g. `dbo:app_schema`). Required when the source and target schema names differ, and must match the value used during schema migration. Rows per diff window (e.g. `10000`). Controls memory and parallel granularity. Recommended. Maximum concurrent table-window comparisons. Bound it by the source and target connection limits. Recommended. Target size of each consolidated `N.ckt` file. Larger files amortize the loader's per-file cost (most impactful for cloud targets). Emit log path. Recommended. Enables emit resume — records each emitted window so a restarted emitter skips finished work. See [Resume & Crash Recovery](#resume--crash-recovery). When `true`, resumes from `checkpointDir` instead of clearing prior output. GCP credentials JSON. Required for Spanner and BigQuery targets (the emitter reads target DELETE primary keys from them). #### Cassandra source options When `true`, uses the bounded `(pk, hash)` diff — projecting each row to a primary key plus a row hash rather than materializing whole tables. Recommended for Cassandra sources. Applies to single-PK tables; composite PKs fall back to a whole-table sort automatically. In-memory ceiling for the `(pk, hash)` projection before it spills to a k-way merge. Trade RAM for spill I/O. In-memory ceiling for the whole-row sort (the composite-PK fallback and initial data migration). ### Output | File | Meaning | | ----------------------------- | ---------------------------------------------------------------------------------------------------- | | `0.ckt`, `1.ckt`, … | Gapless, ascending change-record files. | | `CHANGE.DONE` | Zero-byte sentinel written **after** all `N.ckt` are finalized. Its presence means emit is complete. | | `part_*.part`, `building.tmp` | Transient intermediates — ignore them. | In emit mode the per-table summary is worded **DIFF** (not FAIL) — differences are the expected product of a sync, not a validation failure. The emitter prints a final line: `target-sync emit complete: N CKT file(s) … + CHANGE.DONE`. ## Stage 2 — Apply (Change Loader) The apply stage runs the target's change loader. It drains `N.ckt` files from the emit directory and applies them to the target. ```bash theme={null} /loader/change_loader ``` Resolve the binary directory with the `mariadb → mysql` and `tigerdata → postgres` mapping. For example, a MariaDB target uses `mysql/loader/change_loader`. ### Configuration Path to a file containing the **target** DSN. The same `.skt` given to the emitter. The emitter's `emitOutputDirectory`. Where applied `N.ckt` files are moved after success. Scratch directory for the loader. Must be `false` — the emitter base64-encodes string and binary values. Maximum concurrent table applies. Recommended. Apply log path. Recommended. GCP credentials JSON. Required for Spanner and BigQuery targets. Staging bucket for the loader's `COPY`. Required for Firebolt and Databricks targets (they stage CSVs to S3). Pair with `awsRegion` and `awsCredentials` / `awsCredentialsFile`. ### Behavior * Polls `inputDirectory` for the next expected `N.ckt`, applies it to the target, and moves it to `doneDirectory`. * Relational targets apply via MERGE/upsert plus delete; cloud targets (Firebolt, Snowflake, Databricks, BigQuery) stage rows to their bucket and bulk-load. * Exits cleanly when it has applied through the last file and seen `CHANGE.DONE` (log line `CHANGE.DONE - exiting`). ## Change Record Format Each `N.ckt` file is UTF-8 text, tab-delimited, one change record per line. The format is byte-identical to the [CDC](/run/replication) change extractor's, so the change-loader applies it unchanged. ``` I … # INSERT: all columns, in schema order U

… # UPDATE: PK, then (column-offset, value) pairs for changed columns only D

# DELETE: primary-key columns only ``` * **`I` (insert)** — a row present in the source but missing from the target; all column values in schema order. * **`U` (update)** — a row whose PK matches but whose columns differ; the PK followed by sparse `(offset, value)` pairs, where `offset` is the 0-based column position. Only changed columns appear. * **`D` (delete)** — a row present in the target but absent from the source; primary-key columns only. Values are encoded on the wire as: **base64** for strings and binary/blob, **ISO-8601** for timestamps, and the literal `\N` for `NULL`. Each record ends with a trailing tab then newline (`\t\n`). ## End-to-End Example A complete `sqlserver → mysql` run, chaining schema migration → emit → apply. It assumes the Wirekite binaries are built and the two DSN files exist. ```bash theme={null} # ---- Setup ---- WK=/home/wirekite # Wirekite home (built binaries live here) WORK=/var/wirekite/sync_run_42 EMIT=$WORK/emit DONE=$WORK/done mkdir -p "$WORK" "$EMIT" "$DONE" SRC_DSN=/etc/wirekite/dsn/source_sqlserver.dsn TGT_DSN=/etc/wirekite/dsn/target_mysql.dsn # Tables to sync — SOURCE-side schema.table, one per line cat > "$WORK/tables.txt" < "$WORK/schema.cfg" < wirekite_schema.skt, and generate the DDL files. "$WK/orchestrator/wirekite" "$WORK/schema.cfg" # 1b. Apply the generated DDL to the target: drop, create, constraints, foreign keys. # (drop and FK steps are best-effort; -ignore-errors lets them pass.) "$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" -ignore-errors < "$WORK/d.sql" "$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" < "$WORK/c.sql" "$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" < "$WORK/i.sql" "$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" -ignore-errors < "$WORK/f.sql" # The target tables (and their _wkm merge shadow tables) now exist. # ============================================================ # PHASE 2 + 3 — Emit + Apply (run concurrently; reuse the SAME .skt + schemaRename) # ============================================================ cat > "$WORK/emit.cfg" < "$WORK/apply.cfg" < **Cloud targets** (Firebolt, Snowflake, Databricks, BigQuery, Spanner) need the extra staging/credential keys in `apply.cfg`, and `gcpCredfile` for the schema phase. See the [Stage 2](#stage-2--apply-change-loader) configuration. ## Resume & Crash Recovery Target Sync is restartable at every stage. * **Emit resume.** Set `checkpointDir` (typically the emit dir) and `resume=true`. The emitter records each completed window; a restarted run skips already-emitted windows and preserves published `*.ckt` files. * **Emit-once across crashes.** Within a window, the emitter writes a per-file high-water sidecar (`N.ckt.hw`, fsync'd before `N.ckt` is published) recording how far that window's PK-ordered emit reached. On resume it continues *past* the high-water, so every difference lands in **exactly one** committed `.ckt` — even if the host crashed mid-emit, and with no reliance on loader idempotency (required for non-MERGE loaders such as Spanner). * **Apply recovery.** The change-loader checkpoints applied files; on restart it resumes at the next un-applied `N.ckt`. Files already moved to `doneDirectory` are not re-applied. * **Gapless invariant.** A fresh emit run clears stale `*.ckt` / `CHANGE.DONE` first, so the loader never sees a leftover higher-numbered file or a premature `CHANGE.DONE`. Do not hand-place files in the emit directory. The loader applies `N.ckt` files in strict ascending order and treats `CHANGE.DONE` as the completion signal; a stray file breaks the gapless invariant. ## Performance & Tuning * **`windowSize`** — larger windows reduce per-window overhead but raise memory per concurrent comparison. * **`maxThreads`** — raises concurrency across table-windows; bound it by source and target connection limits. * **`emitFileTargetMB`** — larger `N.ckt` files reduce the loader's per-file COPY/upload overhead (most impactful for cloud targets). * **SQL Server source** — the validator reads SQL Server with a 32 KB TDS packet size, so wide tables transfer several times faster than the 4 KB default. This is automatic. * **Wide-table syncs** are dominated by raw data transfer; expect throughput in the tens of MB/s per stream over typical intra-cloud networking. * **Cassandra source** — keep `cassandraPKHashValidation=true` so the footprint stays bounded, and tune `cassandraPKHashMemRows` to trade RAM for spill I/O. The diff/emit cost is bounded by the **drift** (the rows that differ), not the table size; at large base size the source ring's read throughput is the ceiling. ## Limitations & Correctness Rules * **One-directional.** Reconciles the target to the source's state. To sync the other way, swap source and target in a separate run. * **Primary key required** on every table. * **Target schema must pre-exist.** Target Sync moves data, not DDL — run [schema migration](#schema-migration) first. * **Case-sensitivity must align.** Source and target must agree on PK case sensitivity. A case-insensitive source → case-sensitive target is fine; a case-sensitive source → case-insensitive target can lose case-distinct keys (the target folds them) and is unsupported. For mixed collations, ensure PK data is single-case or that the target PK columns use a binary collation. * **Type coverage is source-driven.** Only the supported source types can encode full-row INSERT/UPDATE values. ## Related Configure Apache Cassandra as a Target-Sync source. How CQL types map onto each relational and warehouse target. The diff engine behind the emit stage, also usable for standalone validation. Continuous change capture — the streaming alternative to a one-shot sync.