Loader Prerequisites
Before you configure the Wirekite Loaders for Firebolt, make sure the following prerequisites are met.
- Make sure that you have a properly configured Firebolt user.
- At a minimum, you will need an Amazon Web Services (AWS) S3 storage bucket. The bucket must be readable and writable by the user and host that will run the Wirekite Loader, and readable by the Firebolt user, because Firebolt reads from the bucket directly using its COPY SQL command.
- Your S3 bucket needs to be in the same AWS Region as your Firebolt dataworld.
- We recommend installing the AWS command-line tool (“aws”) in your local environment and configuring shell-level authentication for it. Neither Wirekite nor Firebolt uses this tool directly, but it is helpful for managing your bucket from the command line. See the AWS CLI tool install instructions.
- As of this writing, Firebolt doesn’t provide a command-line tool, although you can run SQL from its web interface. The Wirekite “cmdline” tool is a simple command-line tool that can run Firebolt SQL when given the right DSN/connection string. See Wirekite “cmdline” tool.
- Make sure your Extractor host and database instance are set up to run the extraction. In most situations, you will run the Extract and Load operations at the same time using the Wirekite Orchestrator tool.
- Make sure that the Wirekite target metadata tables, wirekite_progress and wirekite_action, exist in your Firebolt dataworld and are readable and writable by the Firebolt user and host that will run the Loaders. The simplest way to ensure this is to query these tables using the Wirekite cmdline tool with the appropriate Firebolt DSN/connection string.
Wirekite Schema Loader for Firebolt
The Wirekite Schema Loader for Firebolt is a utility that takes as input the schema file created by the Schema Extractor on the extraction host/database. From it, the Schema Loader generates Firebolt-appropriate SQL to CREATE the target tables.
You can run these generated files directly against Firebolt to create the target tables, using either the Firebolt web SQL interface or the Wirekite cmdline tool.
The Schema Loader uses the following configuration variables.
schemaFile is the file containing Wirekite schema data that was generated by a Wirekite Schema Extractor on the source database. The Schema Loader will use the contents of this file to generate Firebolt SQL DDL to create the target schema. The path should be absolute and readable by the Schema Loader executable.
createTableFile is an output file that will contain CREATE TABLE statements appropriate to Firebolt. Note that each source table will have two entries: one for the table itself, and one for its “MERGE” table used to load change events by the Wirekite Change Loader for Firebolt.
createForeignFile is an output file for FOREIGN KEY statements. Firebolt does not support FOREIGN KEYs, so this file is currently always empty, but it should still be specified.
createConstraintFile is an output file for external constraints such as CHECK. Firebolt does not support these constraints, so this file is currently always empty, but it should still be specified. Column-level constraints such as NOT NULL are handled in the CREATE TABLE statements.
dropTableFile is an optional output file that will contain DROP TABLE IF EXISTS statements for both the base tables and the MERGE tables.
logFile is where the Wirekite Schema Loader will write its logging output.
createMergeTables is optional and defaults to true. If it is set to false, CREATE TABLE statements are generated only for the base tables, and MERGE tables are skipped. You can set it to false if you intend to run only data loads and not change loads.
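The Schema Loader variables and their defaults can be summarized as a small data structure. The following Python sketch is purely illustrative: it is not Wirekite's configuration file syntax, and the class name, the problems and tables_per_source_table helpers, and the example paths are all hypothetical. It encodes only the rules stated above: an absolute schemaFile, the required output files, createMergeTables defaulting to true, and one MERGE table alongside each base table.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaLoaderConfig:
    """Hypothetical mirror of the Schema Loader variables described above."""
    schemaFile: str                       # input; should be an absolute path
    createTableFile: str                  # output: CREATE TABLE statements
    createForeignFile: str                # output: currently always empty for Firebolt
    createConstraintFile: str             # output: currently always empty for Firebolt
    logFile: str                          # logging output
    dropTableFile: Optional[str] = None   # optional DROP TABLE IF EXISTS output
    createMergeTables: bool = True        # optional; defaults to true

    def problems(self) -> List[str]:
        """Return a list of rule violations (empty if none were found)."""
        issues = []
        if not os.path.isabs(self.schemaFile):
            issues.append("schemaFile should be an absolute path")
        for name in ("createTableFile", "createForeignFile",
                     "createConstraintFile", "logFile"):
            if not getattr(self, name):
                issues.append(f"{name} must be specified")
        return issues

    def tables_per_source_table(self) -> int:
        """Each source table yields a base table, plus a MERGE table if enabled."""
        return 2 if self.createMergeTables else 1


if __name__ == "__main__":
    cfg = SchemaLoaderConfig(
        schemaFile="/opt/wirekite/schema.out",            # hypothetical paths
        createTableFile="/opt/wirekite/create_tables.sql",
        createForeignFile="/opt/wirekite/create_foreign.sql",
        createConstraintFile="/opt/wirekite/create_constraints.sql",
        logFile="/opt/wirekite/schema_loader.log",
    )
    print(cfg.problems(), cfg.tables_per_source_table())
```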
Wirekite Data Mover for Firebolt/AWS
The Data Mover runs on the data extraction host and moves the files written by the Data Extractor to an AWS S3 bucket, where the Wirekite Data Loader references them when loading into Firebolt.
awsBucket is the name of the AWS S3 storage bucket mentioned above, without an s3:// prefix. This is where the Wirekite Data Mover will move its files.
awsRegion is the name of the AWS Region where the S3 storage bucket mentioned above resides.
dataDirectory is where the Wirekite Data Extractor wrote its files. These files are copied from this directory to the AWS S3 storage bucket.
logFile is where the Wirekite Data Mover will write its logging output.
maxThreads is the maximum number of parallel threads used to upload data to the bucket. The default is 10.
removeFiles specifies whether the Data Mover removes files from the dataDirectory once it has safely copied them to the AWS S3 bucket. The default is true. You may set it to false while troubleshooting, but normally you should leave it set to true, because these files can take up a lot of storage space.
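Conceptually, the Data Mover copies each file from dataDirectory to the bucket in parallel and deletes the local copy only after a successful upload. The Python sketch below illustrates that behavior under stated assumptions; it is not Wirekite's implementation. The move_files function and its upload callback are hypothetical, the object key is assumed to be the bare file name, and the callback stands in for a real S3 upload so the sketch runs without AWS access.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def move_files(data_directory: str,
               upload: Callable[[str, str], None],
               max_threads: int = 10,
               remove_files: bool = True) -> List[str]:
    """Upload each regular file in data_directory using up to max_threads
    worker threads, and return the uploaded keys in sorted order.

    upload(path, key) is a hypothetical callback that must raise on failure.
    A local file is deleted only after its own upload has returned.
    """
    names = sorted(
        name for name in os.listdir(data_directory)
        if os.path.isfile(os.path.join(data_directory, name))
    )

    def move_one(name: str) -> str:
        path = os.path.join(data_directory, name)
        upload(path, name)          # object key = file name (an assumption)
        if remove_files:
            os.remove(path)         # reached only if upload() did not raise
        return name

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(move_one, names))
```

With boto3, for example, the callback could be `lambda path, key: s3.upload_file(path, bucket, key)` with `s3 = boto3.client("s3", region_name=region)`, where bucket and region correspond to awsBucket and awsRegion. Deleting only after upload returns means a failed upload leaves the file in place for a retry, which mirrors the "safely copied" condition described for removeFiles.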
Wirekite Data Loader for Firebolt
The Wirekite Data Loader for Firebolt reads the files that the Wirekite Data Mover for Firebolt/AWS wrote to the AWS S3 bucket and loads their records into the target tables in Firebolt.
dsnFile is a file containing the connection information for the instance that will be accessed by the Data Loader.
The Firebolt connection information has the following format:
firebolt://<firebolt-db>?account_name=<firebolt-acct-name>&account_id=<firebolt-acct-id>&client_secret=<firebolt-client-secret>&engine=<firebolt-engine>
Note that the same dsnFile can be used to connect using the Wirekite cmdline tool.
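Because the DSN is just the database name plus four query parameters, it can be assembled from its parts before being written to the dsnFile. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes values are used verbatim (no URL encoding). It therefore rejects values containing characters that would break the key=value&key=value layout rather than guessing how Wirekite would decode an escaped value.

```python
from typing import Dict

# Characters that would break the documented "?key=value&key=value" layout.
RESERVED = set("&=?# ")


def build_firebolt_dsn(database: str, account_name: str, account_id: str,
                       client_secret: str, engine: str) -> str:
    """Assemble a DSN in the layout documented above, using values verbatim."""
    params = {
        "account_name": account_name,
        "account_id": account_id,
        "client_secret": client_secret,
        "engine": engine,
    }
    for name, value in {"database": database, **params}.items():
        if not value or RESERVED & set(value):
            raise ValueError(f"{name} is empty or contains a reserved character")
    query = "&".join(f"{key}={value}" for key, value in params.items())
    return f"firebolt://{database}?{query}"


def parse_firebolt_dsn(dsn: str) -> Dict[str, str]:
    """Split a DSN back into the database name and its query parameters."""
    prefix = "firebolt://"
    if not dsn.startswith(prefix):
        raise ValueError("DSN must start with firebolt://")
    database, _, query = dsn[len(prefix):].partition("?")
    fields = dict(pair.split("=", 1) for pair in query.split("&") if pair)
    fields["database"] = database
    return fields
```

Round-tripping a DSN through parse_firebolt_dsn is a cheap way to confirm that a hand-edited dsnFile still contains all four parameters.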
awsBucket is the name of the AWS S3 storage bucket mentioned above, without an s3:// prefix. This is where the Wirekite Data Loader will find the files uploaded by the Data Mover.
awsRegion is the name of the AWS Region where the S3 storage bucket mentioned above resides.
awsCredentialsFile is optional and is needed only if AWS access to the bucket hasn’t been set up in Firebolt itself. It is a local file containing AWS credentials in the following format:
aws_access_key=<access-key>
aws_secret_access_key=<aws-secret-key>
For more information about these keys, refer to the AWS IAM User Guide.
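A quick way to catch typos in awsCredentialsFile before starting a Loader is to parse it and confirm that both documented keys are present. This Python sketch is a hypothetical helper, not part of Wirekite; it assumes entries are key=value pairs separated by whitespace or newlines, so it accepts either layout.

```python
from typing import Dict

# The two keys named in the credentials format above.
REQUIRED_KEYS = ("aws_access_key", "aws_secret_access_key")


def parse_aws_credentials(text: str) -> Dict[str, str]:
    """Parse whitespace- or newline-separated key=value pairs and check
    that both documented keys are present."""
    creds: Dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed entry: {token!r}")
        creds[key] = value
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    return creds
```

Reading the file and passing its contents to parse_aws_credentials (for example, `parse_aws_credentials(open(path).read())`) reports a missing or misspelled key before the Loader tries to reach the bucket.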
schemaFile is the file containing Wirekite schema data that was generated by a Wirekite Schema Extractor on the source database, and used by the Schema Loader mentioned above. The path should be absolute and readable by the Data Loader executable.
logFile is where the Data Loader will write its logging output.
hexEncoding is optional and defaults to false. Set it to true only if the Data Extractor extracted data in hex format.
maxThreads is the maximum number of parallel threads used to load data to Firebolt. The default is 5. We recommend setting this to the number of CPUs on the host.
removeFiles specifies whether the Data Loader removes files from the AWS S3 bucket once it has safely loaded their contents into Firebolt. The default is true. You may set it to false while troubleshooting, but normally you should leave it set to true, because these files can take up a lot of space in the bucket.
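Two of the Data Loader settings are easy to get wrong: awsBucket must be a bare bucket name, and maxThreads is best matched to the host's CPU count. The helpers below are a hypothetical Python sketch, not part of Wirekite. normalize_bucket strips an accidental s3:// prefix and trailing slash and rejects anything containing a path, since the text above asks only for the bucket name; recommended_loader_threads falls back to the documented default of 5 when Python cannot determine the CPU count.

```python
import os


def normalize_bucket(name: str) -> str:
    """Return a bare bucket name for awsBucket, without an s3:// prefix."""
    bare = name.strip()
    if bare.startswith("s3://"):
        bare = bare[len("s3://"):]
    bare = bare.rstrip("/")
    if not bare or "/" in bare:
        raise ValueError(f"expected a bare bucket name, got {name!r}")
    return bare


def recommended_loader_threads() -> int:
    """One thread per CPU, as recommended above; else the default of 5."""
    return os.cpu_count() or 5
```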
Wirekite Change Loader for Firebolt
The Wirekite Change Loader for Firebolt loads changes into the target tables, using the output created by a Wirekite Change Extractor. It uses a merging approach to load changes and stages transient data in an AWS S3 bucket, which can be the same bucket used by the Wirekite Data Loader for Firebolt.
The Change Loader shouldn’t start until the Data Loader has successfully completed its full load of the same tables.
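The text above names a merging approach without describing it, and the Schema Loader creates a MERGE table next to each base table for this purpose. As a rough, in-memory illustration of the general merge technique (collapse a batch of changes per key, then apply the result to the base table in one pass), here is a Python sketch. The event tuple shape, the assumption that inserts and updates carry a full row image, and the function names are all hypothetical; this is not Wirekite's change file format or the SQL it runs.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (operation, primary_key, full_row_image).
# This is NOT Wirekite's change file format; it only illustrates merging.
Event = Tuple[str, int, dict]


def collapse(events: Iterable[Event]) -> Dict[int, Tuple[str, dict]]:
    """Keep only the latest event per primary key, like a staged MERGE batch."""
    latest: Dict[int, Tuple[str, dict]] = {}
    for op, key, row in events:         # events are assumed to be in commit order
        latest[key] = (op, row)
    return latest


def merge_into(base: Dict[int, dict], events: Iterable[Event]) -> Dict[int, dict]:
    """Apply one collapsed batch of changes to the base table in a single pass."""
    for key, (op, row) in collapse(events).items():
        if op == "delete":
            base.pop(key, None)         # deleting a missing row is a no-op
        elif op in ("insert", "update"):
            base[key] = row             # upsert the latest full row image
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return base
```

Collapsing the batch first means each key is written at most once per batch, instead of replaying every individual change against the base table. The sketch also shows why ordering matters: merging changes into a table that has not yet been fully loaded would leave rows missing.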
dsnFile is a file containing the connection information for the instance that will be accessed by the Change Loader.
The Firebolt connection information has the following format:
firebolt://<firebolt-db>?account_name=<firebolt-acct-name>&account_id=<firebolt-acct-id>&client_secret=<firebolt-client-secret>&engine=<firebolt-engine>
Note that the same dsnFile can be used to connect using the Wirekite cmdline tool.
awsBucket is the name of the AWS S3 storage bucket mentioned above, without an s3:// prefix. This is where the Wirekite Change Loader will upload and reference its intermediate data.
awsRegion is the name of the AWS Region where the S3 storage bucket mentioned above resides.
awsCredentialsFile is optional and is needed only if AWS access to the bucket hasn’t been set up in Firebolt itself. It is a local file containing AWS credentials in the following format:
aws_access_key=<access-key>
aws_secret_access_key=<aws-secret-key>
For more information about these keys, refer to the AWS IAM User Guide.
dataDirectory is where the Wirekite Change Extractor wrote its files. The Wirekite Change Loader for Firebolt reads change events from these files.
workDirectory is where the Change Loader writes intermediate files that are uploaded to the AWS S3 bucket.
removeFiles specifies whether the Change Loader removes files from the dataDirectory once it has fully processed their changes. The default is true. You may set it to false while troubleshooting, but normally you should leave it set to true, because these files can take up a lot of storage space.
logFile is where the Change Loader will write its logging output.