Skip to main content

Overview

Wirekite supports PostgreSQL 10 and above (including YugaByte, TigerData, and Google AlloyDB) as a source database for:
  • Schema Extraction - Extract table definitions to Wirekite’s intermediate format
  • Data Extraction - Bulk extract table data for initial load
  • Change Extraction (CDC) - Capture ongoing inserts, updates, and deletes via logical replication
All references to PostgreSQL in this guide also apply to YugaByte, TigerData, and AlloyDB setups. These databases use the PostgreSQL wire protocol, so the configuration and parameters are identical.
PostgreSQL Change Extraction uses the pgoutput logical replication plugin, which is built into PostgreSQL 10 and later. This provides efficient, low-overhead change capture without requiring additional extensions.

Prerequisites

Before configuring PostgreSQL as a Wirekite source, ensure the following requirements are met:

Database Configuration

  1. Version: PostgreSQL 10 or later (PostgreSQL 15+ recommended for non-superuser CDC)
  2. Logical Replication (CDC only): WAL level must be set to logical in postgresql.conf:
    wal_level = logical
    max_replication_slots = 4    # at least 1 available slot
    max_wal_senders = 4          # at least 1 available sender
    
    Restart PostgreSQL after changing these settings.

File System Access

  • For server-side extraction: Ensure the postgres user has write permissions to the output directory
  • Verify sufficient disk space for extracted data files
For cloud-managed PostgreSQL instances (RDS, Cloud SQL, etc.) or remote databases where server-side file access isn’t available, use databaseRemote=true (the default) to extract data via the client connection.

General Recommendations

When performing initial data extraction, stop replication and quiesce the database instance. This ensures the entire dump is consistent to a specific position. It is highly recommended to run Wirekite against a replica rather than production to avoid impacting workloads.

Source Database Setup

Wirekite requires a dedicated user with specific privileges and a wirekite schema for internal tracking tables. Follow these steps to configure your PostgreSQL source.

Step 1: Create the Wirekite User

Connect as a superuser (e.g., postgres) and run:
CREATE ROLE wirekite WITH LOGIN PASSWORD '<your_password>';

Step 2: Grant Connection Access

GRANT CONNECT ON DATABASE <your_database> TO wirekite;

Step 3: Create the Wirekite Schema

Wirekite uses a dedicated wirekite schema for its internal tracking tables (wirekite_progress and wirekite_action). These tables track extraction progress and enable crash recovery.
-- Connect to the target database first
\c <your_database>

GRANT CREATE ON DATABASE <your_database> TO wirekite;
Wirekite will automatically create the wirekite schema and its tables on first run. Alternatively, you can create the schema in advance:
CREATE SCHEMA IF NOT EXISTS wirekite AUTHORIZATION wirekite;

Step 4: Grant Read Access to Application Tables

For each application schema that contains tables you want to extract:
GRANT USAGE ON SCHEMA <app_schema> TO wirekite;
GRANT SELECT ON ALL TABLES IN SCHEMA <app_schema> TO wirekite;

-- For tables created in the future:
ALTER DEFAULT PRIVILEGES IN SCHEMA <app_schema>
    GRANT SELECT ON TABLES TO wirekite;
For example, if your application tables are in the public schema:
GRANT USAGE ON SCHEMA public TO wirekite;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO wirekite;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO wirekite;

Step 5: System Catalog Access

PostgreSQL grants SELECT on information_schema and pg_catalog views to all users by default. No additional grants are needed for:
  • information_schema.tables
  • information_schema.columns
  • pg_catalog.pg_constraint
  • pg_catalog.pg_class
  • pg_catalog.pg_indexes
  • pg_catalog.pg_publication
  • pg_catalog.pg_replication_slots

Step 6 (Optional): Server-Side File Export

If you plan to use server-side file export (the “local” mode where the database writes files directly), the wirekite user needs the pg_write_server_files role:
GRANT pg_write_server_files TO wirekite;
This is NOT needed if Wirekite and the database are on different machines (the default “remote” streaming mode with databaseRemote=true).

Step 7 (CDC Only): Grant Replication Privileges

For change data capture (CDC), Wirekite uses PostgreSQL’s native logical replication with the built-in pgoutput plugin. The wirekite user needs the REPLICATION attribute:
ALTER ROLE wirekite WITH REPLICATION;

Publication Ownership (PostgreSQL 15+)

Wirekite creates a publication for each replication slot. On PostgreSQL 15+, the wirekite user must be the owner of the tables being published, OR you can use a superuser for publication creation. If the application tables are owned by another user, you can either: Option A: Grant table ownership to wirekite (not recommended for production):
ALTER TABLE <app_schema>.<table> OWNER TO wirekite;
Option B: Create the publication as superuser and let wirekite use it:
-- As superuser, create the publication for the tables wirekite needs to track:
CREATE PUBLICATION wirekite_pub FOR TABLE
    <app_schema>.table1,
    <app_schema>.table2,
    <app_schema>.table3;
Option C (PostgreSQL 10-14): CREATE PUBLICATION requires superuser. Either run wirekite as superuser for the initial setup, or pre-create the publication as shown in Option B.

Create the Replication Slot

Wirekite can create its own replication slot, but you can also pre-create it:
SELECT pg_create_logical_replication_slot('wirekite_slot', 'pgoutput');
When running via the Orchestrator in data+change mode, the replication slot is created automatically before data extraction begins. In change-only mode, create the replication slot manually before starting the Change Extractor.

Complete Setup Script

Data Extraction Only

-- Run as superuser (e.g., postgres)

-- 1. Create user
CREATE ROLE wirekite WITH LOGIN PASSWORD '<your_password>';

-- 2. Grant connection
GRANT CONNECT ON DATABASE <your_database> TO wirekite;

-- 3. Allow schema creation (for wirekite internal tables)
GRANT CREATE ON DATABASE <your_database> TO wirekite;

-- 4. Grant read access to application tables
GRANT USAGE ON SCHEMA <app_schema> TO wirekite;
GRANT SELECT ON ALL TABLES IN SCHEMA <app_schema> TO wirekite;
ALTER DEFAULT PRIVILEGES IN SCHEMA <app_schema>
    GRANT SELECT ON TABLES TO wirekite;

Data Extraction + CDC (Change Data Capture)

-- Run as superuser (e.g., postgres)

-- 1. Create user with replication
CREATE ROLE wirekite WITH LOGIN REPLICATION PASSWORD '<your_password>';

-- 2. Grant connection
GRANT CONNECT ON DATABASE <your_database> TO wirekite;

-- 3. Allow schema creation
GRANT CREATE ON DATABASE <your_database> TO wirekite;

-- 4. Grant read access to application tables
GRANT USAGE ON SCHEMA <app_schema> TO wirekite;
GRANT SELECT ON ALL TABLES IN SCHEMA <app_schema> TO wirekite;
ALTER DEFAULT PRIVILEGES IN SCHEMA <app_schema>
    GRANT SELECT ON TABLES TO wirekite;

-- 5. Create publication for CDC (as superuser)
--    List all tables that need change tracking:
CREATE PUBLICATION wirekite_pub FOR TABLE
    <app_schema>.table1,
    <app_schema>.table2;
    -- add more tables as needed

-- 6. Verify wal_level
SHOW wal_level;  -- must be 'logical'

pg_hba.conf

Ensure the wirekite user can connect from the Wirekite server. Add to pg_hba.conf:
# TYPE  DATABASE        USER       ADDRESS             METHOD
host    <your_database> wirekite   <wirekite_ip>/32    scram-sha-256
For CDC replication connections, also add:
host    replication     wirekite   <wirekite_ip>/32    scram-sha-256
Reload PostgreSQL after changes: SELECT pg_reload_conf();

What Wirekite Creates

ObjectTypeSchemaPurpose
wirekiteScheman/aHolds internal tracking tables
wirekite_progressTablewirekiteTracks extraction progress, crash recovery
wirekite_actionTablewirekiteRun/pause/stop control
PublicationPublicationn/aRoutes WAL changes to wirekite (CDC only)
Replication slotReplication Slotn/aReserves WAL for wirekite (CDC only)
Wirekite does NOT create any triggers, functions, or stored procedures on the source database.

Privilege Summary

PrivilegeRequired For
CONNECT ON DATABASEBasic connectivity
CREATE ON DATABASECreating the wirekite schema
USAGE ON SCHEMA <app_schema>Accessing application tables
SELECT ON ALL TABLES IN SCHEMA <app_schema>Reading table data
REPLICATION role attributeCDC only: logical replication protocol
pg_write_server_files roleOptional: server-side file export only

Schema Extractor

The Schema Extractor reads table definitions from PostgreSQL and outputs them to Wirekite’s intermediate schema format (.skt file). This schema file is used by target loaders to create corresponding tables in the destination database.
The Schema Extractor can run standalone or as part of the Orchestrator workflow. The configuration parameters are the same in both cases.

Required Parameters

dsnFile
string
required
Path to a file containing the PostgreSQL connection string. The file should contain exactly one line with the DSN.
Connection string format:
postgres://username:password@host:port/database
Example DSN file contents:
postgres://wirekite:secretpass@postgres-replica.example.com:5432/myapp
tablesFile
string
required
Path to a file listing the tables to extract, one per line in schema.table format.
Example tablesFile contents:
public.users
public.orders
public.products
inventory.items
outputDirectory
string
required
Absolute path to the directory where Wirekite will write the schema file (wirekite_schema.skt). The directory must exist and be writable.
logFile
string
required
Absolute path to the log file where the Schema Extractor will write operational logs.

Optional Parameters

schemaRename
string
Rename the source schema in the output. Format: sourceSchema:targetSchema. Useful when the target database uses a different schema name.

Data Extractor

The Data Extractor performs bulk extraction of table data, writing records to Wirekite’s intermediate data format (.dkt files). Large tables are automatically split across multiple files based on the maxPages setting.

Required Parameters

dsnFile
string
required
Path to a file containing the PostgreSQL connection string.
Connection string format:
postgres://username:password@host:port/database
tablesFile
string
required
Path to a file listing the tables to extract, one per line in schema.table format.
outputDirectory
string
required
Absolute path to the directory where Wirekite will write data files. Files are named schema.table.N.dkt where N is a sequence number.
logFile
string
required
Absolute path to the log file for Data Extractor operations.

Optional Parameters

maxThreads
integer
default:"5"
Number of parallel extraction threads. Each thread consumes one database connection and approximately one CPU core. Set this based on available CPU cores and acceptable database load.
maxPages
integer
Maximum number of PostgreSQL storage pages to visit by each extraction thread. Since pages have varying numbers of rows, this corresponds loosely to the number of rows extracted at a time. Adjust based on available memory.
hexEncoding
boolean
default:"false"
When true, binary and string data is encoded as hexadecimal instead of base64. Hex encoding produces larger files but may be required for certain target databases (e.g., Snowflake).
sortFiles
boolean
default:"true"
When true, sorts output files by table name for easier management and verification.
databaseRemote
boolean
default:"true"
When true, uses client-side data extraction instead of server-side file writes. Required for cloud-managed PostgreSQL instances (Amazon RDS, Google Cloud SQL, Azure Database for PostgreSQL) or any remote database where server-side file access is not available. Set to false for local extraction using server-side file operations.
schemaRename
string
Rename the source schema in the output. Format: sourceSchema:targetSchema.
For best performance on dedicated extraction hosts with local database access, set maxThreads equal to the number of CPU cores available (e.g., 32 threads for a 32-core host).

Change Extractor

The Change Extractor captures ongoing data changes (INSERT, UPDATE, DELETE) by consuming PostgreSQL logical replication stream. It outputs change events to Wirekite’s change format (.ckt files) for loading into target databases.
The Change Extractor uses the pgoutput logical replication plugin and requires a replication slot to be created before starting extraction.

Required Parameters

dsnFile
string
required
Path to a file containing the PostgreSQL connection string.
Connection string format:
postgres://username:password@host:port/database
tablesFile
string
required
Path to a file listing the tables to track for changes, one per line in schema.table format.
outputDirectory
string
required
Absolute path to the directory where Wirekite will write change files. Files are named with sequential numbers (e.g., 0.ckt, 1.ckt, etc.).
logFile
string
required
Absolute path to the log file for Change Extractor operations.
replicationSlot
string
required
The name of the PostgreSQL replication slot to use for change capture. This slot must be created before starting the Change Extractor:
SELECT pg_create_logical_replication_slot('slot_name', 'pgoutput');
The replication slot must exist before starting the Change Extractor. If the slot doesn’t exist, the extractor will fail to start.

Optional Parameters

processCommits
boolean
default:"false"
When true, includes BEGIN and COMMIT markers in the output. Wirekite guarantees that transactions never span multiple output files regardless of this setting.
exitWhenIdle
boolean
default:"false"
When true, the extractor exits after processing all pending changes. When false (default), it runs continuously, waiting for new changes indefinitely.
schemaRename
string
Rename the source schema in the output. Format: sourceSchema:targetSchema.

Handover Operations

The Handover utility provides administrative operations for managing replication slots. It can be run independently or as part of orchestrated workflows.

Create Replication Slot

handover -operation createslot -dsnfile /path/to/dsn -slotname wirekite_slot

Drop Replication Slot

handover -operation dropslot -dsnfile /path/to/dsn -slotname wirekite_slot
Dropping a replication slot will discard all unconsumed changes. Only drop slots when you are certain change capture is no longer needed or when recreating the slot from a new position.

Orchestrator Configuration

When using the Wirekite Orchestrator, prefix source parameters with source.schema., source.data., or source.change. depending on the operation mode. Example orchestrator configuration for PostgreSQL source:
# Main configuration
source=postgres
target=snowflake

# Schema extraction
source.schema.dsnFile=/opt/wirekite/config/postgres.dsn
source.schema.tablesFile=/opt/wirekite/config/tables.txt
source.schema.outputDirectory=/opt/wirekite/output/schema
source.schema.logFile=/var/log/wirekite/schema-extractor.log

# Data extraction
source.data.dsnFile=/opt/wirekite/config/postgres.dsn
source.data.tablesFile=/opt/wirekite/config/tables.txt
source.data.outputDirectory=/opt/wirekite/output/data
source.data.logFile=/var/log/wirekite/data-extractor.log
source.data.maxThreads=8
source.data.databaseRemote=true

# Change extraction (CDC)
source.change.dsnFile=/opt/wirekite/config/postgres.dsn
source.change.tablesFile=/opt/wirekite/config/tables.txt
source.change.outputDirectory=/opt/wirekite/output/changes
source.change.logFile=/var/log/wirekite/change-extractor.log
source.change.replicationSlot=wirekite_slot
source.change.exitWhenIdle=false
When running in data+change mode, the Orchestrator automatically creates the replication slot before data extraction and manages change capture positioning.
For complete Orchestrator documentation, see the Execution Guide.