Loader Prerequisites
To configure the Wirekite Loaders for Snowflake, you must have the following prereqs met.
- Make sure that you have a properly configured Snowflake user.
- You will either need to install the Snowflake SnowSQL command line tool on the host where you’ll run the load, or use the Wirekite “cmdline” tool.
- Make sure your Extractor host and database instance are set up to run the extraction. In most situations, you will run the Extract and Load operation at the same time using the Wirekite Orchestrator tool.
- Make sure that the Wirekite target metadata tables - wirekite_progress and wirekite_action - exist in your Snowflake dataworld and are read-and-writable from the Snowflake user and host that will do the extraction. The simplest way to insure this is to access these tables using the Wirekite cmdline tool with the appropriate Snowflake DSN/connection string.
Wirekite Schema Loader for Snowflake
The Wirekite Schema Loader for Snowflake is a utility that takes the schema file created by the Schema Extractor on the extraction host/database as input. The Schema Loader for Snowflake will generate Snowflake-appropriate SQL to CREATE the target tables in the target.
You can take these files and load them directly to Snowflake to CREATE the target tables, either using the Snowflake SnowSQL tool or the Wirekite cmdline tool.
Here are the configuration variables required to run the Schema Loader.
schemaFile is the file containing Wirekite schema data that was generated by a Wirekite Schema Extractor on the source database. The Schema Loader will use the contents of this file to generate Firebolt SQL DDL to create the target schema. The path should be absolute and readable by the Schema Loader executable.
createTableFile is an output file that will contain CREATE TABLE statements appropriate to Snowflake. Note that each source table will have two entries: one for the table itself, and one for its “MERGE” table used to load change events by the Wirekite Change Loader for Snowflake.
createForeignFile is an output file for FOREIGN KEY constraints. You may elect to not load foreign key constraints if you don’t need them in your Snowflake dataworld.
createConstraintFile is an output file for other constraints such as CHECK constraints. Snowflake doesn’t enforce many constaints, so you may choose to not load this file if you don’t need them in your Snowflake dataworld. Note that column-level constraints such as NOT NULL are handle by CREATE TABLE.
dropTableFile is a file that will contain DROP TABLE IF EXISTS statements. DROP TABLE statements will be created for both base tables and MERGE tables. This is optional.
logFile is where the Wirekite Schema Loader will write its logging output.
createMergeTables is an optional argument. Its default is true. If set to false, CREATE TABLE statements will be generated only for base tables. MERGE tables will be skipped. This can be set to false if you only intend to do DATA loads, but not CHANGE loads.
Wirekite Data Mover for Snowflake
The Data Mover will be run on the data extraction host, and will move files dumped by the Data Extractor to a staging area managed by Snowflake using its PUT command, where the Wirekite Data Loader will reference them when loading to Snowflake.
dsnFile is a file containing the connection information for the instance that will be accessed by the Data Mover.
Snowflake connection information is discussed here. Wirekite uses the Golang connector format:
<snowflake-user>:<snowflake-pwd>@<snowflake-orgname>-<snowflake-acct>/<database>/<schema>?<opt-warehouse-name>
Note that the same dsnFile can be used to connect using the Wirekite cmdline tool.
dataDirectory is where the Wirekite Data Extractor wrote its files. These will be copied from the directory to the Snowflake staging area.
logFile is where the Wirekite Data Mover will write its logging output.
maxThreads is the maximum number of parallel threads used to upload data to the bucket. The default is 10.
removeFiles specifies whether the Data Mover will remove files from the dataDirectory once it has safely copied them to Snowflake stage. The default is true. This may be set to false while troubleshooting, but should not normally be set as these files can take a lot of storage space.
Wirekite Data Loader for Snowflake
The Wirekite Data Loader for Snowflake uses the output files written to the Snowflake staging area by the Wirekite Data Mover for Snowflake and loads the records in them to the target tables in Snowflake.
dsnFile is a file containing the connection information for the instance that will be accessed by the Data Loader.
Snowflake connection information is discussed here. Wirekite uses the Golang connector format:
<snowflake-user>:<snowflake-pwd>@<snowflake-orgname>-<snowflake-acct>/<database>/<schema>?<opt-warehouse-name>
Note that the same dsnFile can be used to connect using the Wirekite cmdline tool.
schemaFile is the file containing Wirekite schema data that was generated by a Wirekite Schema Extractor on the source database, and used by the Schema Loader mentioned above. The path should be absolute and readable by the Data Loader executable.
logFile is where the Data Loader will write its logging output.
hexEncoding is optional and its default is false. It should only be set if the Data Extractor extracted data using hex format.
maxThreads is the maximum number of parallel threads used to load data to Snowflake. The default is 5. We recommend setting this to the number of CPUs on the host.
Wirekite Change Loader for Snowflake
The Wirekite Change Loader for Snowflake loads changes to the target tables, using output created by a Wirekite Change Extractor. The Wirekite Change Loader for Snowflake uses a merging approach to load changes, and needs to move transient data to a Snowflake stage.
The Change Loader shouldn’t start until the Data Loader for the same table-world has successfully completed its full load.
dsnFile is a file containing the connection information for the instance that will be accessed by the Change Loader.
Snowflake connection information is discussed here. Wirekite uses the Golang connector format:
<snowflake-user>:<snowflake-pwd>@<snowflake-orgname>-<snowflake-acct>/<database>/<schema>?<opt-warehouse-name>
Note that the same dsnFile can be used to connect using the Wirekite cmdline tool.
dataDirectory is where the Wirekite Change Extractor wrote its files. These will be sourced for changes by the Wirekite Change Loader for Snowflake.
workDirectory is where the Change Loader writes intermediate files that are uploaded to the Snowflake staging area.
removeFiles specifies whether the Change Loader will remove files from the dataDirectory once it has fully processed their changes. The default is true. This may be set to false while troubleshooting, but should not normally be set as these files can take a lot of storage space.
logFile is where the Change Loader will write its logging output.