Extractor Prerequisites

To configure PostgreSQL as an extraction source, make sure the following prerequisites are met.
  1. Make sure the database version is 10 or above, since the Wirekite Change Extractor uses the “pgoutput” logical replication plugin.
  2. Create a user that will execute the extraction operations. The discussion below assumes that user is called wirekite.
  3. Make sure the tables being extracted are readable by wirekite.
  4. As a PostgreSQL superuser - typically postgres - run grant all on schema public to wirekite; (see the SQL sketch after this list).
  5. Also make sure that the tables being extracted are configured to use replica identity full.
  6. Wirekite will also create some internal tables: wirekite_progress and wirekite_action. Make sure the wirekite user has read and write access to those tables.
  7. Make sure that the postgres user has write permission to the directories that wirekite will write to, and that there is enough space to store the extracted data.
  8. When doing a Data Extract, make sure that replication is stopped and the database instance being dumped from is quiescent. This ensures that the entire dump is consistent as of a specific position.
  9. After doing a Data Extract, but before re-enabling replication, you should create a REPLICATION SLOT for the Change Extractor.
  10. It is highly preferable to run wirekite on a **replica** of a production-facing database instance, since wirekite can generate load on the database server, particularly as the instance needs to be quiesced during Data extraction as mentioned above.
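
The SQL below sketches steps 1, 4, 5, and 9 in one place. It assumes the extraction user is called wirekite as in step 2; the password, the table name public.orders, and the slot name wirekite_slot are illustrative placeholders, not names wirekite requires.

-- Step 1: confirm the server is version 10 or above (required for pgoutput)
SELECT version();

-- Steps 2-4: create the extraction user and grant it access
CREATE USER wirekite WITH PASSWORD 'change-me';
GRANT ALL ON SCHEMA public TO wirekite;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO wirekite;

-- Step 5: REPLICA IDENTITY is set per table
ALTER TABLE public.orders REPLICA IDENTITY FULL;

-- Step 9: after the Data Extract, and before re-enabling replication,
-- create a logical replication slot that uses the pgoutput plugin
SELECT pg_create_logical_replication_slot('wirekite_slot', 'pgoutput');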

Schema Extractor

The Schema Extractor is a wirekite utility that extracts the schema-related information for some or all of the tables on a database instance. It writes a database-independent representation of the schema for those tables to a file.

The Schema Extractor can be run independently or as part of a workflow by the Orchestrator (see the Orchestrator section). However you run it, the Schema Extractor needs certain configuration variables, which take the same values whether you run it directly or through the Orchestrator.
Here are the configuration variables required to run the Schema Extractor.
dsnFile
string
required
dsnFile is a file containing the connection information for the instance that will be accessed by the Schema Extractor.
The connection information has the following format
postgres://username:password@host:port/database
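For example, a dsnFile for a user wirekite connecting to a database called sales on host dbhost.example.com would contain a single line like the following (all values are illustrative):
postgres://wirekite:secret@dbhost.example.com:5432/sales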
outputDirectory
string
required
outputDirectory is the file system directory to which the Schema Extractor will write its output files. The path should be absolute and writable by the operating-system user running the wirekite binaries. Wirekite will write a file called wirekite_schema.skt in the outputDirectory.
tablesFile
string
required
tablesFile is a file with a list of all the tables whose schemas are to be extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile
string
required
logFile is the full path of the log file where the Schema Extractor will write the logging information.

Data Extractor

The Data Extractor is a wirekite utility that extracts the data for given tables. It can be run independently or as part of a workflow by the Orchestrator (see the Orchestrator section). The Data Extractor needs certain configuration variables, which take the same values whether you run it directly or through the Orchestrator.
dsnFile
string
required
dsnFile is a file containing the connection information for the PostgreSQL database.
The connection information has the following format
postgres://username:password@host:port/database
outputDirectory
string
required
outputDirectory is the directory to which wirekite will write the output files. The path should be absolute and writable by the operating-system user running the wirekite binaries. Wirekite will write files called table_name.number.dkt in the outputDirectory.
tablesFile
string
required
tablesFile is a file with a list of all the tables that are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile
string
required
logFile refers to the full path of the logfile where the extractor will write the logging information.
maxThreads
integer
maxThreads is the number of parallel threads that will run to extract data. One thread usually uses one CPU, and there may be contention if the number of threads is much higher than the number of CPUs. If you’re using a dedicated 32-CPU host for the extraction, we recommend setting this to 32.
maxPages
integer
maxPages is the maximum number of PostgreSQL storage pages visited by a single extraction thread at a time. Because pages contain varying numbers of rows, this corresponds only loosely to the number of rows extracted at a time.
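To see how pages relate to rows for your own tables, you can consult the planner statistics in PostgreSQL's pg_class catalog; relpages and reltuples are standard columns, and public.orders is an illustrative table name:
-- Approximate page and row counts for one table (statistics, not exact counts)
SELECT relpages, reltuples FROM pg_class WHERE oid = 'public.orders'::regclass;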
hexEncoding
boolean
default:"false"
hexEncoding is used to specify whether string data will be extracted as hex. The default is base64. hexEncoding should only be used if you’re loading to a target that doesn’t handle base64, such as Snowflake.
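As an illustration of the two encodings (not of wirekite’s exact output format), PostgreSQL’s built-in encode() function renders the same bytes in both forms:
-- The same value in the default base64 encoding and in hex
SELECT encode('hello'::bytea, 'base64') AS base64_form,
       encode('hello'::bytea, 'hex') AS hex_form;
-- base64_form: aGVsbG8=   hex_form: 68656c6c6f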

Change Extractor

The Change Extractor is a wirekite utility that extracts ongoing changes to rows - inserts, updates, and deletes - for given tables. It can be run independently or as part of a workflow by the Orchestrator (see the Orchestrator section). Either way, the Change Extractor needs certain configuration variables, which take the same values whether you run it directly or through the Orchestrator.
dsnFile
string
required
dsnFile is the file containing the connection information for the PostgreSQL database.
The connection information has the following format
postgres://username:password@host:port/database
outputDirectory
string
required
outputDirectory is the directory to which wirekite will write the output files. The path should be absolute and writable by the operating-system user executing the wirekite binaries. Wirekite will write files called number.ckt in the outputDirectory, where number is an ever-increasing integer starting from 0.
tablesFile
string
required
tablesFile is a file with a list of all the tables whose changes are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile
string
required
logFile refers to the full path of the logfile where the extractor will write the logging information.
processCommits
boolean
default:"false"
Whether or not to output syntax marking BEGIN and COMMIT. We guarantee that a single output file will completely contain its transactions and that no transaction will “span” output files.
eventsPerFlush
number
default:"5000"
The number of change events after which the Change Extractor will flush its buffer to the file. This ensures that the output files are flushed regularly, for performance reasons. Note that the buffer is flushed at a transaction boundary after eventsPerFlush events have been processed, so an individual file may contain more events than this.
exitWhenIdle
boolean
default:"false"
Whether to exit when all pending changes have been processed, or to wait for new changes to show up and “run forever”.
replicationSlot
string
required
The replication slot from which to start extracting changes. Generally this slot marks the position at which the Data Extract was taken.
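To confirm that the slot named here exists and uses the pgoutput plugin, you can query the standard pg_replication_slots view; wirekite_slot is the illustrative slot name used in the prerequisites section:
-- Verify the slot exists, uses pgoutput, and shows the expected positions
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'wirekite_slot';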