> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wirekite.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Target Sync

> Reconcile an already-populated target back to its source in a single pass — Wirekite diffs the two databases and applies only the differences, with no CDC.

## Overview

**Target Sync makes a target table set match a source table set in one shot, without CDC.** It compares each table between a source and a target database, row by row, and applies exactly the inserts, updates, and deletes needed to bring the **target** into agreement with the **source**.

Where [Data Loading](/run/dataloading) does a one-time bulk load into an empty target and [CDC Replication](/run/replication) streams ongoing changes continuously, Target Sync is the third mode: a **one-shot diff-and-apply** that reconciles a target that already holds data.

<Note>
  Target Sync is **one-directional**: the source is the source of truth, and the target is reconciled to it. To sync the other way, swap source and target in a separate run.
</Note>

Internally, Target Sync reuses the [TableValidator's](/run/validation) sort-merge diff walk. Instead of only *reporting* differences, it *emits* them as change records that a change-loader applies to the target — so the change records are byte-identical to the ones CDC produces, and the same target loaders apply them.

### When to use it

<CardGroup cols={3}>
  <Card title="Drift repair" icon="wrench">
    A target that has diverged from its source — a CDC gap, a manual edit, a load that stopped halfway — is re-converged exactly.
  </Card>

  <Card title="Backfill & top-up" icon="layer-group">
    Seed or top up a target that has no change feed, without standing up a continuous pipeline for a one-time job.
  </Card>

  <Card title="One-shot migration" icon="truck-fast">
    Move a database pair where CDC is unavailable or not worth configuring. A single reconciling pass lands the data.
  </Card>
</CardGroup>

### When *not* to use it

* **Continuous replication** — use [CDC Replication](/run/replication) (the streaming change extractor + change loader).
* **Bidirectional sync** — Target Sync is one-directional per run.

## How It Works

Target Sync runs as two cooperating stages that communicate through a shared **emit directory** on local disk.

```
            ┌───────────┐   read + diff   ┌──────────┐
   SOURCE ─▶│   EMIT    │◀────────────────│  TARGET  │
 (truth)    │ (Table-   │   read PKs      └──────────┘
            │ Validator)│                       ▲
            └─────┬─────┘                       │ apply I/U/D
                  │ writes N.ckt files          │
                  ▼                             │
           emit directory  ──────────────▶ ┌──────────┐
           (0.ckt, 1.ckt, …,               │  APPLY   │
            CHANGE.DONE)                    │ (change  │
                                            │  loader) │
                                            └──────────┘
```

| Stage        | Binary                          | Role                                                                                                                                       |
| ------------ | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| **1. Emit**  | `tablevalidator`                | Reads source + target, diffs each table in primary-key order, writes change records as `N.ckt` files, then drops a `CHANGE.DONE` sentinel. |
| **2. Apply** | `<target>/loader/change_loader` | Watches the emit directory, applies each `N.ckt` to the target, moves applied files aside, and exits when it sees `CHANGE.DONE`.           |

The two stages run **concurrently**: the loader begins applying early `N.ckt` files while the emitter is still producing later ones. They rendezvous on the `CHANGE.DONE` sentinel — when the loader logs `CHANGE.DONE - exiting`, the target matches the source.

<Note>
  Target Sync can be driven from the Wirekite **GUI** or **API**, which manage the run directory, config files, and both processes for you. The mechanics below describe what those interfaces orchestrate, and how to run it directly.
</Note>

## Supported Databases

| Role       | Supported types                                                                                                                               |
| ---------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| **Source** | `postgres`, `mysql`, `mariadb`, `sqlserver`, `oracle`, `cassandra`                                                                            |
| **Target** | `postgres`, `mysql`, `mariadb`, `singlestore`, `sqlserver`, `oracle`, `firebolt`, `snowflake`, `databricks`, `bigquery`, `spanner`, `mongodb` |

The source set is smaller because the source drives full INSERT/UPDATE row encoding (it needs complete column-type coverage). The target only supplies primary keys for DELETE detection, so more target types are supported. The emitter **fails loud** at startup on an unsupported source or target type.

<Note>
  **Cassandra is a Target-Sync-only source.** Cassandra has no CDC path in Wirekite — it participates as a source exclusively through Target Sync (snapshot + drift repair). See the [Cassandra Source Guide](/sourceguides/cassandra) and the [Cassandra Datatype Matrix](/datatypes/cassandra).
</Note>

<Note>
  **MariaDB and TigerData reuse other binaries.** MariaDB reuses the MySQL change-loader and TigerData reuses PostgreSQL's. When resolving the change-loader path, map `mariadb → mysql` and `tigerdata → postgres`. The `sourceType`/`targetType` names in the configs stay as the real database type.
</Note>

## Prerequisites

<Steps>
  <Step title="The target tables must already exist">
    Target Sync inserts, updates, and deletes rows; it does **not** create tables. Run [schema migration](#schema-migration) first if the target is empty.
  </Step>

  <Step title="A schema file (.skt) must be available">
    The `.skt` describes the tables (columns, types, primary keys). It is produced by the schema-migration step and consumed by **both** stages — they must be given the *same* `.skt`.
  </Step>

  <Step title="Every table must have a primary key">
    UPDATE and DELETE records are keyed by primary key. The emitter refuses a table with no PK rather than emit unkeyed records.
  </Step>

  <Step title="Source and target must agree on PK case sensitivity">
    The sort-merge diff relies on a consistent key ordering. A case-insensitive source syncing to a case-sensitive target (or vice-versa) can mis-window keys. See [Limitations](#limitations--correctness-rules).
  </Step>
</Steps>

### Schema Migration

Target Sync does **not** create tables or produce the `.skt` — a one-time **schema migration** does both. Run it whenever the target tables don't yet exist (or the source schema has changed). It has three logical steps, coordinated by the `wirekite` orchestrator:

1. **Extract** the source schema → `wirekite_schema.skt` via the source's `schema_extractor`.
2. **Generate** the target DDL from the `.skt` — `c.sql` (CREATE TABLE, plus the `_wkm` merge shadow tables), `i.sql` (constraints / primary keys), `f.sql` (foreign keys), `d.sql` (DROP TABLE) — via the target's `schema_loader`.
3. **Apply** the generated DDL to the target to actually create the tables.

The orchestrator's `schema-extract` mode performs steps 1–2 in one invocation. Step 3 (applying the SQL) is run separately — via the orchestrator's apply modes or by executing the generated `.sql` files with your own DB client.

<Warning>
  **Create the target schema with `createMergeTables=true`.** The change-loader applies change records through per-table `_wkm` merge shadow tables; those must exist before the apply stage runs. The schema loader creates them when this flag is set.
</Warning>

After schema migration completes, the target tables and the `.skt` are in place. Proceed to the emit and apply stages, passing the **same** `wirekite_schema.skt` and the **same** `schemaRename` you used during migration.

## Stage 1 — Emit (TableValidator)

The emit stage runs the TableValidator in emit mode. It reads the source and target, diffs each table, and writes change records to the emit directory.

```bash theme={null}
tablevalidator <emit-config-file>
```

### Configuration

<ResponseField name="emitMode" type="boolean" required>
  Must be `true` to enable Target Sync emit. Otherwise the binary runs in plain [validation](/run/validation) mode.
</ResponseField>

<ResponseField name="emitOutputDirectory" type="string" required>
  Directory where `N.ckt` files and the `CHANGE.DONE` sentinel are written. Created if absent. The apply stage reads from here.
</ResponseField>

<ResponseField name="sourceType" type="string" required>
  One of the supported source types (see [Supported Databases](#supported-databases)).
</ResponseField>

<ResponseField name="sourceDsnFile" type="string" required>
  Path to a file containing the source DSN.
</ResponseField>

<ResponseField name="targetType" type="string" required>
  One of the supported target types.
</ResponseField>

<ResponseField name="targetDsnFile" type="string" required>
  Path to a file containing the target DSN.
</ResponseField>

<ResponseField name="schemaFile" type="string" required>
  Path to the `.skt` schema file. Must be the same file given to the apply stage.
</ResponseField>

<ResponseField name="tablesFile" type="string" required>
  Path to a file listing tables to sync, one `schema.table` per line (source-side names).
</ResponseField>

<ResponseField name="schemaRename" type="string">
  `sourceSchema:targetSchema` — rewrites the schema name in emitted records (e.g. `dbo:app_schema`). Required when the source and target schema names differ, and must match the value used during schema migration.
</ResponseField>

<ResponseField name="windowSize" type="integer">
  Rows per diff window (e.g. `10000`). Controls memory and parallel granularity. Recommended.
</ResponseField>

<ResponseField name="maxThreads" type="integer">
  Maximum concurrent table-window comparisons. Bound it by the source and target connection limits. Recommended.
</ResponseField>

<ResponseField name="emitFileTargetMB" type="integer" default="32">
  Target size of each consolidated `N.ckt` file. Larger files amortize the loader's per-file cost (most impactful for cloud targets).
</ResponseField>

<ResponseField name="logFile" type="string">
  Emit log path. Recommended.
</ResponseField>

<ResponseField name="checkpointDir" type="string">
  Enables emit resume — records each emitted window so a restarted emitter skips finished work. See [Resume & Crash Recovery](#resume--crash-recovery).
</ResponseField>

<ResponseField name="resume" type="boolean" default="false">
  When `true`, resumes from `checkpointDir` instead of clearing prior output.
</ResponseField>

<ResponseField name="gcpCredfile" type="string">
  GCP credentials JSON. Required for Spanner and BigQuery targets (the emitter reads target DELETE primary keys from them).
</ResponseField>

#### Cassandra source options

<ResponseField name="cassandraPKHashValidation" type="boolean">
  When `true`, uses the bounded `(pk, hash)` diff — projecting each row to a primary key plus a row hash rather than materializing whole tables. Recommended for Cassandra sources. Applies to single-PK tables; composite PKs fall back to a whole-table sort automatically.
</ResponseField>

<ResponseField name="cassandraPKHashMemRows" type="integer">
  In-memory ceiling for the `(pk, hash)` projection before it spills to a k-way merge. Trade RAM for spill I/O.
</ResponseField>

<ResponseField name="cassandraSortMemRows" type="integer">
  In-memory ceiling for the whole-row sort (the composite-PK fallback and initial data migration).
</ResponseField>

### Output

| File                          | Meaning                                                                                              |
| ----------------------------- | ---------------------------------------------------------------------------------------------------- |
| `0.ckt`, `1.ckt`, …           | Gapless, ascending change-record files.                                                              |
| `CHANGE.DONE`                 | Zero-byte sentinel written **after** all `N.ckt` are finalized. Its presence means emit is complete. |
| `part_*.part`, `building.tmp` | Transient intermediates — ignore them.                                                               |

In emit mode the per-table summary is worded **DIFF** (not FAIL) — differences are the expected product of a sync, not a validation failure. The emitter prints a final line: `target-sync emit complete: N CKT file(s) … + CHANGE.DONE`.

## Stage 2 — Apply (Change Loader)

The apply stage runs the target's change loader. It drains `N.ckt` files from the emit directory and applies them to the target.

```bash theme={null}
<target>/loader/change_loader <apply-config-file>
```

<Note>
  Resolve the binary directory with the `mariadb → mysql` and `tigerdata → postgres` mapping. For example, a MariaDB target uses `mysql/loader/change_loader`.
</Note>

### Configuration

<ResponseField name="dsnFile" type="string" required>
  Path to a file containing the **target** DSN.
</ResponseField>

<ResponseField name="schemaFile" type="string" required>
  The same `.skt` given to the emitter.
</ResponseField>

<ResponseField name="inputDirectory" type="string" required>
  The emitter's `emitOutputDirectory`.
</ResponseField>

<ResponseField name="doneDirectory" type="string" required>
  Where applied `N.ckt` files are moved after success.
</ResponseField>

<ResponseField name="workDirectory" type="string" required>
  Scratch directory for the loader.
</ResponseField>

<ResponseField name="hexEncoding" type="boolean" required>
  Must be `false` — the emitter base64-encodes string and binary values.
</ResponseField>

<ResponseField name="maxThreads" type="integer">
  Maximum concurrent table applies. Recommended.
</ResponseField>

<ResponseField name="logFile" type="string">
  Apply log path. Recommended.
</ResponseField>

<ResponseField name="gcpCredfile" type="string">
  GCP credentials JSON. Required for Spanner and BigQuery targets.
</ResponseField>

<ResponseField name="awsBucket" type="string">
  Staging bucket for the loader's `COPY`. Required for Firebolt and Databricks targets (they stage CSVs to S3). Pair with `awsRegion` and `awsCredentials` / `awsCredentialsFile`.
</ResponseField>

### Behavior

* Polls `inputDirectory` for the next expected `N.ckt`, applies it to the target, and moves it to `doneDirectory`.
* Relational targets apply via MERGE/upsert plus delete; cloud targets (Firebolt, Snowflake, Databricks, BigQuery) stage rows to their bucket and bulk-load.
* Exits cleanly when it has applied through the last file and seen `CHANGE.DONE` (log line `CHANGE.DONE - exiting`).

## Change Record Format

Each `N.ckt` file is UTF-8 text, tab-delimited, one change record per line. The format is byte-identical to the [CDC](/run/replication) change extractor's, so the change-loader applies it unchanged.

```
I  <schema>  <table>  <col1> <col2> … <colN>           # INSERT: all columns, in schema order
U  <schema>  <table>  <pk…>  <off> <val> <off> <val>…  # UPDATE: PK, then (column-offset, value) pairs for changed columns only
D  <schema>  <table>  <pk…>                            # DELETE: primary-key columns only
```

* **`I` (insert)** — a row present in the source but missing from the target; all column values in schema order.
* **`U` (update)** — a row whose PK matches but whose columns differ; the PK followed by sparse `(offset, value)` pairs, where `offset` is the 0-based column position. Only changed columns appear.
* **`D` (delete)** — a row present in the target but absent from the source; primary-key columns only.

Values are encoded on the wire as: **base64** for strings and binary/blob, **ISO-8601** for timestamps, and the literal `\N` for `NULL`. Each record ends with a trailing tab then newline (`\t\n`).

## End-to-End Example

A complete `sqlserver → mysql` run, chaining schema migration → emit → apply. It assumes the Wirekite binaries are built and the two DSN files exist.

```bash theme={null}
# ---- Setup ----
WK=/home/wirekite                       # Wirekite home (built binaries live here)
WORK=/var/wirekite/sync_run_42
EMIT=$WORK/emit
DONE=$WORK/done
mkdir -p "$WORK" "$EMIT" "$DONE"

SRC_DSN=/etc/wirekite/dsn/source_sqlserver.dsn
TGT_DSN=/etc/wirekite/dsn/target_mysql.dsn

# Tables to sync — SOURCE-side schema.table, one per line
cat > "$WORK/tables.txt" <<EOF
dbo.test_strings
dbo.test_alltypes
dbo.bulk_001
EOF

# ============================================================
# PHASE 1 — Schema migration: create target tables + produce .skt
# ============================================================
cat > "$WORK/schema.cfg" <<EOF
mode=schema-extract
home=$WK
source=sqlserver
target=mysql

source.schema.dsnFile=$SRC_DSN
source.schema.tablesFile=$WORK/tables.txt
source.schema.outputDirectory=$WORK
source.schema.logFile=$WORK/schema_extractor.log
source.schema.schemaRename=dbo:app_schema

target.schema.schemaFile=$WORK/wirekite_schema.skt
target.schema.logFile=$WORK/schema_loader.log
target.schema.createTableFile=$WORK/c.sql
target.schema.createConstraintFile=$WORK/i.sql
target.schema.createForeignKeyFile=$WORK/f.sql
target.schema.dropTableFile=$WORK/d.sql
target.schema.createMergeTables=true
EOF

# 1a. Extract source schema -> wirekite_schema.skt, and generate the DDL files.
"$WK/orchestrator/wirekite" "$WORK/schema.cfg"

# 1b. Apply the generated DDL to the target: drop, create, constraints, foreign keys.
#     (drop and FK steps are best-effort; -ignore-errors lets them pass.)
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" -ignore-errors < "$WORK/d.sql"
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN"                < "$WORK/c.sql"
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN"                < "$WORK/i.sql"
"$WK/wiretalk/wiretalk" -d mysql -c "$TGT_DSN" -ignore-errors < "$WORK/f.sql"
# The target tables (and their _wkm merge shadow tables) now exist.

# ============================================================
# PHASE 2 + 3 — Emit + Apply (run concurrently; reuse the SAME .skt + schemaRename)
# ============================================================
cat > "$WORK/emit.cfg" <<EOF
emitMode=true
emitOutputDirectory=$EMIT
sourceType=sqlserver
sourceDsnFile=$SRC_DSN
targetType=mysql
targetDsnFile=$TGT_DSN
schemaFile=$WORK/wirekite_schema.skt
tablesFile=$WORK/tables.txt
schemaRename=dbo:app_schema
windowSize=10000
maxThreads=8
emitFileTargetMB=32
logFile=$WORK/emit.log
EOF

cat > "$WORK/apply.cfg" <<EOF
dsnFile=$TGT_DSN
schemaFile=$WORK/wirekite_schema.skt
inputDirectory=$EMIT
doneDirectory=$DONE
workDirectory=$WORK
logFile=$WORK/apply.log
maxThreads=8
hexEncoding=false
EOF

# Loader drains N.ckt files as the emitter writes them (mysql == mapped binary dir).
"$WK/mysql/loader/change_loader" "$WORK/apply.cfg" &
"$WK/tablevalidator/tablevalidator" "$WORK/emit.cfg"
wait
# Done: when the loader logs "CHANGE.DONE - exiting", the target matches the source.
```

<Note>
  **Cloud targets** (Firebolt, Snowflake, Databricks, BigQuery, Spanner) need the extra staging/credential keys in `apply.cfg`, and `gcpCredfile` for the schema phase. See the [Stage 2](#stage-2--apply-change-loader) configuration.
</Note>

## Resume & Crash Recovery

Target Sync is restartable at every stage.

* **Emit resume.** Set `checkpointDir` (typically the emit dir) and `resume=true`. The emitter records each completed window; a restarted run skips already-emitted windows and preserves published `*.ckt` files.
* **Emit-once across crashes.** Within a window, the emitter writes a per-file high-water sidecar (`N.ckt.hw`, fsync'd before `N.ckt` is published) recording how far that window's PK-ordered emit reached. On resume it continues *past* the high-water, so every difference lands in **exactly one** committed `.ckt` — even if the host crashed mid-emit, and with no reliance on loader idempotency (required for non-MERGE loaders such as Spanner).
* **Apply recovery.** The change-loader checkpoints applied files; on restart it resumes at the next un-applied `N.ckt`. Files already moved to `doneDirectory` are not re-applied.
* **Gapless invariant.** A fresh emit run clears stale `*.ckt` / `CHANGE.DONE` first, so the loader never sees a leftover higher-numbered file or a premature `CHANGE.DONE`.

<Warning>
  Do not hand-place files in the emit directory. The loader applies `N.ckt` files in strict ascending order and treats `CHANGE.DONE` as the completion signal; a stray file breaks the gapless invariant.
</Warning>

## Performance & Tuning

* **`windowSize`** — larger windows reduce per-window overhead but raise memory per concurrent comparison.
* **`maxThreads`** — raises concurrency across table-windows; bound it by source and target connection limits.
* **`emitFileTargetMB`** — larger `N.ckt` files reduce the loader's per-file COPY/upload overhead (most impactful for cloud targets).
* **SQL Server source** — the validator reads SQL Server with a 32 KB TDS packet size, so wide tables transfer several times faster than the 4 KB default. This is automatic.
* **Wide-table syncs** are dominated by raw data transfer; expect throughput in the tens of MB/s per stream over typical intra-cloud networking.
* **Cassandra source** — keep `cassandraPKHashValidation=true` so the footprint stays bounded, and tune `cassandraPKHashMemRows` to trade RAM for spill I/O. The diff/emit cost is bounded by the **drift** (the rows that differ), not the table size; at large base size the source ring's read throughput is the ceiling.

## Limitations & Correctness Rules

* **One-directional.** Reconciles the target to the source's state. To sync the other way, swap source and target in a separate run.
* **Primary key required** on every table.
* **Target schema must pre-exist.** Target Sync moves data, not DDL — run [schema migration](#schema-migration) first.
* **Case-sensitivity must align.** Source and target must agree on PK case sensitivity. A case-insensitive source → case-sensitive target is fine; a case-sensitive source → case-insensitive target can lose case-distinct keys (the target folds them) and is unsupported. For mixed collations, ensure PK data is single-case or that the target PK columns use a binary collation.
* **Type coverage is source-driven.** Only the supported source types can encode full-row INSERT/UPDATE values.

## Related

<CardGroup cols={2}>
  <Card title="Cassandra Source Guide" icon="database" href="/sourceguides/cassandra">
    Configure Apache Cassandra as a Target-Sync source.
  </Card>

  <Card title="Cassandra Datatype Matrix" icon="table-cells" href="/datatypes/cassandra">
    How CQL types map onto each relational and warehouse target.
  </Card>

  <Card title="TableValidator" icon="circle-check" href="/run/validation">
    The diff engine behind the emit stage, also usable for standalone validation.
  </Card>

  <Card title="CDC Replication" icon="arrows-rotate" href="/run/replication">
    Continuous change capture — the streaming alternative to a one-shot sync.
  </Card>
</CardGroup>
