Extractor Prerequisites

To configure SQL Server as a wirekite extractor you must have the following prereqs met.
  1. Make sure that the database version is 2016 and above.
  2. Make sure that the database supports change data capture capability and change tables can be created.
  3. Make sure change data capture is enabled. This can be done using EXEC sys.sp_cdc_enable_db; on the database level. To verify run SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'database';
  4. Make sure change data capture is enable for the tables you are monitoring. This can be done using EXEC sys.sp_cdc_enable_table @source_schema = 'schema', @source_name = 'table', @role_name = NULL, @supports_net_changes = 1;. We only need to enable net logging and not full logging. To verify run SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = 'table';
  5. To view the CDC configuration you can run SELECT * FROM cdc.change_tables;
  6. Make sure CLR is enabled. This can be done using EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'clr enabled', 1; RECONFIGURE; EXEC sp_configure 'clr enabled';
  7. Make sure the changes are being capture by running SELECT * FROM cdc.schema_table_CT;
  8. Make sure that you configure a user for wirekite with full privileges on the tables that we are extracting.
  9. Wirekite will also create some internal tables(wirekite_progress and wirekite_action). Make sure wirekite user has read and write privileges to those tables.
  10. Make sure that the user running wirekite has privileges to write to the directory where the wirekite output files will be generated.
  11. When doing a data extract make sure that replication is stopped.
  12. It is highly preferable to run wirekite on a replica of production database, since wirekite can generate lot of load on the database server.

Schema Extractor

Schema extractor is a wirekite utility that extracts the schema related information for given tables. The schema extractor can be run independently or run as part of orchestrator (check orchestrator section). However you run it, the schema extractor needs certain configuration variables, which will have the same configurations whether you run it using schema extractor or orchestrator. Here are the configuration variables required to run schema extractor.
dsnFile
string
required
dsnFile is the file containing the connection information for the database.
The connection information has the following format
sqlserver://userame:password@host:port?database=databasename&encrypt=disable&connection+timeout=timeoutinseconds
outputDirectory
string
required
outputDirectory represents the directory to which wirekite will write the output files. The path should be absolute and writable by the wirekite binaries. Wirekite will write a file called wirekite_schema.skt in the outputDirectory.
tablesFile
string
required
tablesFile is a file with a list of all the tables that are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile
string
required
logFile refers to the full path of the logfile where the extractor will write the logging information.

Data Extractor

Data extractor is a wirekite utility that extracts the data for given tables. The data extractor can be run independently or run as part of orchestrator (check orchestrator section). However you run it, the data extractor needs certain configuration variables, which will have the same configurations whether you run it using data extractor or orchestrator.
dsnFile
string
required
dsnFile is the file containing the connection information for the database.
The connection information has the following format
sqlserver://userame:password@host:port?database=databasename&encrypt=disable&connection+timeout=timeoutinseconds
outputDirectory
string
required
outputDirectory represents the directory to which wirekite will write the output files. The path should be absolute and writable by the wirekite binaries. Wirekite will write files called table_name.number.dkt in the outputDirectory.
tablesFile
string
required
tablesFile is a file with a list of all the tables that are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile
string
required
logFile refers to the full path of the logfile where the extractor will write the logging information.
maxThreads
number
default:"5"
The number of parallel threads that will run at the same time to extract data. One thread usually uses 1 cpu and there might be contention if number of threads are too high than the number of cpus.
maxRowsPerDump
number
default:"200000"
The number of rows written to one dump file.
hexEncoding
boolean
default:"false"
Whether to extract string data as hex or base 64.
useExternalDumper
boolean
default:"false"
Whether to use a C extractor instead of default SQL Server extractor. This is a bit faster.

Change Extractor

Change extractor is a wirekite utility that extracts ongoing changes - Inserts, Updates, Deletes - for given tables. The change extractor can be run independently or run as part of orchestrator (check orchestrator section). However you run it, the change extractor needs certain configuration variables, which will have the same configurations whether you run it using change extractor or orchestrator.
dsnFile
string
required
dsnFile is the file containing the connection information for the database.
The connection information has the following format
sqlserver://userame:password@host:port?database=databasename&encrypt=disable&connection+timeout=timeoutinseconds
outputDirectory
string
required
outputDirectory represents the directory to which wirekite will write the output files. The path should be absolute and writable by the wirekite binaries. Wirekite will write files called number.ckt in the outputDirectory. The number will be an ever increasing number starting from 0.
tablesFile
string
required
tablesFile is a file with a list of all the tables that are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile
string
required
logFile refers to the full path of the logfile where the extractor will write the logging information.
verbose
boolean
default:"false"
Whether to generate verbose output. in verbose mode the change extractor can generate extra file called metdata which can help you troubleshoot any issue.
processOneArchivelog
boolean
default:"false"
Whether to process one archivelog and exit or continue processing more archivelogs. This can be used for testing, before enabling extractor to process archivelogs infinitely.
processCommits
boolean
default:"false"
Whether to process begin and commoit statements or not.
eventsPerFlush
number
default:"5000"
The number of events at which wirekite will flush the buffer to the file. This is to ensure the regular flushing to the output files occur for performance reasons.
exitWhenIdle
boolean
default:"false"
Whether to exit when all the archive logs have been processed, or wait for new archive logs to show up.
startLsn
number
default:"0"
The log sequence number from where to start extracting changes. Generally this is the log sequence number where the data extract was taken.
lsnBatchSize
number
default:"1000000"
The number of LSNs to extract in one batch. This is used for performance. Databases which have autocommit on or do a lot of small transactions might generate lot of LSNs, in which case this number can be higher.
linesPerFile
number
default:"100000"
The number of lines to write to the output file before rotating the file. Higher this number bigger the files.