Loader Prerequisites
To configure the Wirekite Loaders for PostgreSQL, you must have the following prereqs met.
- Make sure that you have a properly configured PostgreSQL user with the GRANTs needed to CREATE TABLEs, access FILEs, run INSERT/UPDATE/DELETE.
- Use the psql command line tool to interact with Postgres.
- Make sure your Extractor host and database instance are set up to run the extraction. In most situations, you will run the Extract and Load operation at the same time using the Wirekite Orchestrator tool.
- Make sure that the Wirekite target metadata tables - wirekite_progress and wirekite_action - exist in your PostgreSQL dataworld and are read-and-writable from the PostgreSQL user and host that will do the extraction.
Wirekite Schema Loader for PostgreSQL
The Wirekite Schema Loader for PostgreSQL is a utility that takes the schema file created by the Schema Extractor on the extraction host/database as input. The Schema Loader for PostgreSQL will generate PostgreSQL-appropriate SQL to CREATE the target tables in the target. If you need to add additional storage directives or other schema elements, you can use the CREATE TABLE output as a basis for this. But don’t change the column names, column order, or change datatypes or constraints to ones that are incompatible with the ones created here.
You can take these files and load them directly to PostgreSQL to CREATE the target tables, using the “psql” command-line tool.
Here are the configuration variables required to run the Schema Loader.
schemaFile is the file containing Wirekite schema data that was generated by a Wirekite Schema Extractor on the source database. The Schema Loader will use the contents of this file to generate PostgreSQL SQL DDL to create the target schema. The path should be absolute and readable by the Schema Loader executable.
createTableFile is an output file that will contain CREATE TABLE statements appropriate to PostgreSQL. Note that each source table will have two entries: one for the table itself, and one for its “MERGE” table used to load change events by the Wirekite Change Loader for PostgreSQL.
createForeignFile is an output file that will be used for FOREIGN KEY definitions. If you wish to add FOREIGN KEY specs - this should likely wait until after the initial load completes for performance reasons - you can run this file to create them.
createConstraintFile is an output file with non-columnar constraints, including indexes. Note that column-level constraints such as NOT NULL are handle by CREATE TABLE.
dropTableFile is a file that will contain DROP TABLE IF EXISTS statements. DROP TABLE statements will be created for both base tables and MERGE tables. This is optional.
logFile is where the Wirekite Schema Loader for PostgreSQL will write its logging output. This needs to be writable by the user who runs the load.
createMergeTables is an optional argument. Its default is true. If set to false, CREATE TABLE statements will be generated only for base tables. MERGE tables will be skipped. This can be set to false if you only intend to do DATA loads, but not CHANGE loads.
Wirekite Data Loader for PostgreSQL
The Wirekite Data Loader for PostgreSQL uses the output files written by the Extractor or the File Data Mover and loads them to PostgreSQL.
dsnFile is a file containing the connection information for the instance that will be accessed by the Data Loader.
The PostgreSQL connection information has the following format
postgres://username:password@host:port/database
schemaFile is the file containing Wirekite schema data that was generated by a Wirekite Schema Extractor on the source database, and used by the Schema Loader mentioned above. The path should be absolute and readable by the Data Loader executable.
workDirectory is the path to the directory where the Data Loader will find the data files to be loaded. It is where either the Data Extractor or File Mover wrote data files.
logFile is where the Data Loader will write its logging output.
hexEncoding is optional and its default is false. It should only be set if the Data Extractor extracted data using hex format.
maxThreads is the maximum number of parallel threads used to load data to PostgreSQL. The default is 5. We recommend setting this to the number of CPUs on the host.
removeFiles specifies whether the Data Loader will remove files from the workDirectory. The default is true. This may be set to false while troubleshooting, but should not normally be set as these files can take a lot of space in the bucket.
Wirekite Change Loader for PostgreSQL
The Wirekite Change Loader for PostgreSQL loads changes to the target tables, using output created by a Wirekite Change Extractor. The Wirekite Change Loader for PostgreSQL uses a merging approach to load changes, and needs to move transient data to an scratch directory, which will be used to load the Merge Tables.
The Change Loader shouldn’t start until the Data Loader for the same table-world has successfully completed its full load.
dsnFile is a file containing the connection information for the instance that will be accessed by the Change Loader.
The PostgreSQL connection information has the following format
postgres://username:password@host:port/database
dataDirectory is where the Wirekite Change Extractor wrote its files. These will be sourced for changes by the Wirekite Change Loader for PostgreSQL
workDirectory is where the Change Loader writes intermediate files.
removeFiles specifies whether the Change Loader will remove files from the dataDirectory once it has fully processed their changes. The default is true. This may be set to false while troubleshooting, but should not normally be set as these files can take a lot of storage space.
logFile is where the Data Loader will write its logging output.