Overview
Wirekite supports PostgreSQL 10 and above (including YugaByte, TigerData, and Google AlloyDB) as a source database for:- Schema Extraction - Extract table definitions to Wirekite’s intermediate format
- Data Extraction - Bulk extract table data for initial load
- Change Extraction (CDC) - Capture ongoing inserts, updates, and deletes via logical replication
All references to PostgreSQL in this guide also apply to YugaByte, TigerData, and AlloyDB setups. These databases use the PostgreSQL wire protocol, so the configuration and parameters are identical.
PostgreSQL Change Extraction uses the
pgoutput logical replication plugin, which is built into PostgreSQL 10 and later. This provides efficient, low-overhead change capture without requiring additional extensions.Prerequisites
Before configuring PostgreSQL as a Wirekite source, ensure the following requirements are met:Database Configuration
- Version: PostgreSQL 10 or later (PostgreSQL 15+ recommended for non-superuser CDC)
- Logical Replication (CDC only): WAL level must be set to
logicalinpostgresql.conf:Restart PostgreSQL after changing these settings.
File System Access
- For server-side extraction: Ensure the
postgresuser has write permissions to the output directory - Verify sufficient disk space for extracted data files
General Recommendations
Source Database Setup
Wirekite requires a dedicated user with specific privileges and awirekite schema for internal tracking tables. Follow these steps to configure your PostgreSQL source.
Step 1: Create the Wirekite User
Connect as a superuser (e.g.,postgres) and run:
Step 2: Grant Connection Access
Step 3: Create the Wirekite Schema
Wirekite uses a dedicatedwirekite schema for its internal tracking tables (wirekite_progress and wirekite_action). These tables track extraction progress and enable crash recovery.
wirekite schema and its tables on first run. Alternatively, you can create the schema in advance:
Step 4: Grant Read Access to Application Tables
For each application schema that contains tables you want to extract:public schema:
Step 5: System Catalog Access
PostgreSQL grantsSELECT on information_schema and pg_catalog views to all users by default. No additional grants are needed for:
information_schema.tablesinformation_schema.columnspg_catalog.pg_constraintpg_catalog.pg_classpg_catalog.pg_indexespg_catalog.pg_publicationpg_catalog.pg_replication_slots
Step 6 (Optional): Server-Side File Export
If you plan to use server-side file export (the “local” mode where the database writes files directly), the wirekite user needs thepg_write_server_files role:
This is NOT needed if Wirekite and the database are on different machines (the default “remote” streaming mode with
databaseRemote=true).Step 7 (CDC Only): Grant Replication Privileges
For change data capture (CDC), Wirekite uses PostgreSQL’s native logical replication with the built-inpgoutput plugin. The wirekite user needs the REPLICATION attribute:
Publication Ownership (PostgreSQL 15+)
Wirekite creates a publication for each replication slot. On PostgreSQL 15+, the wirekite user must be the owner of the tables being published, OR you can use a superuser for publication creation. If the application tables are owned by another user, you can either: Option A: Grant table ownership to wirekite (not recommended for production):CREATE PUBLICATION requires superuser. Either run wirekite as superuser for the initial setup, or pre-create the publication as shown in Option B.
Create the Replication Slot
Wirekite can create its own replication slot, but you can also pre-create it:Complete Setup Script
Data Extraction Only
Data Extraction + CDC (Change Data Capture)
pg_hba.conf
Ensure the wirekite user can connect from the Wirekite server. Add topg_hba.conf:
SELECT pg_reload_conf();
What Wirekite Creates
| Object | Type | Schema | Purpose |
|---|---|---|---|
wirekite | Schema | n/a | Holds internal tracking tables |
wirekite_progress | Table | wirekite | Tracks extraction progress, crash recovery |
wirekite_action | Table | wirekite | Run/pause/stop control |
| Publication | Publication | n/a | Routes WAL changes to wirekite (CDC only) |
| Replication slot | Replication Slot | n/a | Reserves WAL for wirekite (CDC only) |
Wirekite does NOT create any triggers, functions, or stored procedures on the source database.
Privilege Summary
| Privilege | Required For |
|---|---|
CONNECT ON DATABASE | Basic connectivity |
CREATE ON DATABASE | Creating the wirekite schema |
USAGE ON SCHEMA <app_schema> | Accessing application tables |
SELECT ON ALL TABLES IN SCHEMA <app_schema> | Reading table data |
REPLICATION role attribute | CDC only: logical replication protocol |
pg_write_server_files role | Optional: server-side file export only |
Schema Extractor
The Schema Extractor reads table definitions from PostgreSQL and outputs them to Wirekite’s intermediate schema format (.skt file). This schema file is used by target loaders to create corresponding tables in the destination database.
The Schema Extractor can run standalone or as part of the Orchestrator workflow. The configuration parameters are the same in both cases.
Required Parameters
Path to a file containing the PostgreSQL connection string. The file should contain exactly one line with the DSN.
Path to a file listing the tables to extract, one per line in
schema.table format.Absolute path to the directory where Wirekite will write the schema file (
wirekite_schema.skt). The directory must exist and be writable.Absolute path to the log file where the Schema Extractor will write operational logs.
Optional Parameters
Rename the source schema in the output. Format:
sourceSchema:targetSchema. Useful when the target database uses a different schema name.Data Extractor
The Data Extractor performs bulk extraction of table data, writing records to Wirekite’s intermediate data format (.dkt files). Large tables are automatically split across multiple files based on the maxPages setting.
Required Parameters
Path to a file containing the PostgreSQL connection string.
Path to a file listing the tables to extract, one per line in
schema.table format.Absolute path to the directory where Wirekite will write data files. Files are named
schema.table.N.dkt where N is a sequence number.Absolute path to the log file for Data Extractor operations.
Optional Parameters
Number of parallel extraction threads. Each thread consumes one database connection and approximately one CPU core. Set this based on available CPU cores and acceptable database load.
Maximum number of PostgreSQL storage pages to visit by each extraction thread. Since pages have varying numbers of rows, this corresponds loosely to the number of rows extracted at a time. Adjust based on available memory.
When
true, binary and string data is encoded as hexadecimal instead of base64. Hex encoding produces larger files but may be required for certain target databases (e.g., Snowflake).When
true, sorts output files by table name for easier management and verification.When
true, uses client-side data extraction instead of server-side file writes. Required for cloud-managed PostgreSQL instances (Amazon RDS, Google Cloud SQL, Azure Database for PostgreSQL) or any remote database where server-side file access is not available. Set to false for local extraction using server-side file operations.Rename the source schema in the output. Format:
sourceSchema:targetSchema.Change Extractor
The Change Extractor captures ongoing data changes (INSERT, UPDATE, DELETE) by consuming PostgreSQL logical replication stream. It outputs change events to Wirekite’s change format (.ckt files) for loading into target databases.
The Change Extractor uses the
pgoutput logical replication plugin and requires a replication slot to be created before starting extraction.Required Parameters
Path to a file containing the PostgreSQL connection string.
Path to a file listing the tables to track for changes, one per line in
schema.table format.Absolute path to the directory where Wirekite will write change files. Files are named with sequential numbers (e.g.,
0.ckt, 1.ckt, etc.).Absolute path to the log file for Change Extractor operations.
The name of the PostgreSQL replication slot to use for change capture. This slot must be created before starting the Change Extractor:
Optional Parameters
When
true, includes BEGIN and COMMIT markers in the output. Wirekite guarantees that transactions never span multiple output files regardless of this setting.When
true, the extractor exits after processing all pending changes. When false (default), it runs continuously, waiting for new changes indefinitely.Rename the source schema in the output. Format:
sourceSchema:targetSchema.Handover Operations
The Handover utility provides administrative operations for managing replication slots. It can be run independently or as part of orchestrated workflows.Create Replication Slot
Drop Replication Slot
Orchestrator Configuration
When using the Wirekite Orchestrator, prefix source parameters withsource.schema., source.data., or source.change. depending on the operation mode.
Example orchestrator configuration for PostgreSQL source:
When running in data+change mode, the Orchestrator automatically creates the replication slot before data extraction and manages change capture positioning.
