To configure MySQL or MariaDB as an extractor, the following prerequisites must be met. References to MySQL below also apply to MariaDB setups.
- Make sure that the database version is 5.x or above.
- Make sure the tables being extracted are InnoDB tables.
- Make sure binary logging is enabled and the format is ROW.
binlog_format = ROW
- Also make sure that binlog_row_image is set to an appropriate value. Wirekite does not mandate any particular value, as long as the data that you are trying to extract is logged.
- Make sure that you configure a user for Wirekite with full privileges on the tables being extracted. Creating a separate user for Wirekite is strongly preferred.
- Wirekite will also create some internal tables (wirekite_progress and wirekite_action). Make sure the Wirekite user has read and write privileges on those tables.
- Make sure that MySQL has privileges to write to the directory where the Wirekite output files will be generated. This can be configured by setting an appropriate value for secure_file_priv.
- Also make sure that group_concat_max_len is set as follows.
group_concat_max_len = 50000
- When doing a data extract, make sure that replication is stopped. This ensures that the dump is consistent with a given MySQL coordinate position.
- It is highly preferable to run Wirekite on a replica (slave) of the production database, since Wirekite can generate a lot of load on the database server.
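The server-variable checks in the list above can be sketched as a small Python helper. This is a minimal sketch, not part of Wirekite: the function name `check_prereqs` is hypothetical, it assumes the variables have already been fetched (for example with `SHOW VARIABLES`) into a dictionary, and it assumes that 50000 is a minimum for group_concat_max_len rather than an exact requirement. It does not check the database version, the storage engine, or user privileges.

```python
def check_prereqs(variables):
    """Return a list of problems found in a dict of MySQL server variables.

    `variables` maps variable names to string values, for example the rows
    returned by SHOW VARIABLES. Hypothetical helper, not part of Wirekite.
    """
    problems = []

    # Binary logging must be enabled and must use the ROW format.
    if (variables.get("log_bin") or "").upper() not in ("ON", "1"):
        problems.append("binary logging is not enabled (log_bin)")
    if (variables.get("binlog_format") or "").upper() != "ROW":
        problems.append("binlog_format must be ROW")

    # Assumption: 50000 is treated as a minimum, not an exact value.
    try:
        max_len = int(variables.get("group_concat_max_len") or 0)
    except ValueError:
        max_len = 0
    if max_len < 50000:
        problems.append("group_concat_max_len should be 50000")

    # A NULL secure_file_priv disables file output on the server;
    # an empty string means no directory restriction.
    secure_dir = variables.get("secure_file_priv")
    if secure_dir is None or secure_dir.upper() == "NULL":
        problems.append("secure_file_priv disables file output")

    return problems
```

An empty list means none of the checked settings looks wrong; each string otherwise names one setting to revisit.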
The schema extractor is a Wirekite utility that extracts schema-related information for the given tables. It can be run independently or as part of the orchestrator (see the orchestrator section). Either way, the schema extractor needs certain configuration variables, which are set the same way in both cases.
Here are the configuration variables required to run the schema extractor.
dsnFile is the file containing the connection information for the database.
The connection information has the following format
username:password@tcp(host:port)/public?multiStatements=true&loc=Local
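To sanity-check a dsnFile before running an extractor, the format above can be split into its parts. The following is a minimal sketch that only covers the shape shown here; the names `DSN_PATTERN` and `parse_dsn` are hypothetical and not part of Wirekite, and the pattern assumes a plain host name or IPv4 address.

```python
import re
from urllib.parse import parse_qs

# Matches username:password@tcp(host:port)/database?key=value&...
DSN_PATTERN = re.compile(
    r"^(?P<user>[^:@]+):(?P<password>.*)@tcp\((?P<host>[^:()]+):(?P<port>\d+)\)"
    r"/(?P<database>[^?]*)(?:\?(?P<params>.*))?$"
)


def parse_dsn(dsn):
    """Split a DSN string of the documented shape into a dict of its parts."""
    match = DSN_PATTERN.match(dsn.strip())
    if match is None:
        raise ValueError("DSN does not match the expected format")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    # parse_qs returns a list per key; keep the first value of each parameter.
    query = parse_qs(parts["params"] or "")
    parts["params"] = {key: values[0] for key, values in query.items()}
    return parts
```

A `ValueError` here usually points at a missing `tcp(...)` wrapper or a missing port.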
outputDirectory represents the directory to which wirekite will write the output files. The path should be absolute and writable by the wirekite binaries. Wirekite will write a file called wirekite_schema.skt in the outputDirectory.
tablesFile is a file with a list of all the tables that are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
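A tablesFile in this format can be validated before an extractor is started. The sketch below is illustrative only: `parse_tables` is a hypothetical helper, and skipping blank lines is an assumption rather than documented Wirekite behavior.

```python
def parse_tables(text):
    """Parse tablesFile content into a list of (database, table) tuples."""
    tables = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are ignored
        # Split on the first dot: database name before it, table name after it.
        database, separator, table = entry.partition(".")
        if not separator or not database or not table:
            raise ValueError(
                f"line {line_number}: expected database.table, got {entry!r}"
            )
        tables.append((database, table))
    return tables
```

For example, `parse_tables(open(path).read())` would raise on an entry that is missing the database prefix, where `path` is the location of your tablesFile.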
logFile refers to the full path of the logfile where the extractor will write the logging information.
The data extractor is a Wirekite utility that extracts the data for the given tables. It can be run independently or as part of the orchestrator (see the orchestrator section). Either way, the data extractor needs certain configuration variables, which are set the same way in both cases.
dsnFile is the file containing the connection information for the mysql database.
The connection information has the following format
username:password@tcp(host:port)/public?multiStatements=true&loc=Local
outputDirectory represents the directory to which wirekite will write the output files. The path should be absolute and writable by the wirekite binaries. Wirekite will write files called table_name.number.dkt in the outputDirectory.
tablesFile is a file with a list of all the tables that are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile refers to the full path of the logfile where the extractor will write the logging information.
The number of parallel threads that run at the same time to extract data. One thread usually uses one CPU, so there may be contention if the number of threads exceeds the number of CPUs.
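One way to avoid that contention is to cap the thread count at the number of available CPUs. This is a minimal sketch, assuming the CPU count of the machine running this script is the relevant one; `suggest_threads` is a hypothetical helper, not a Wirekite setting.

```python
import os


def suggest_threads(requested):
    """Cap a requested thread count at the local CPU count (minimum 1)."""
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, cpus))
```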
The number of rows written to one dump file.
Whether to extract string data as hex or base64.
The change extractor is a Wirekite utility that extracts ongoing changes (inserts, updates, and deletes) for the given tables. It can be run independently or as part of the orchestrator (see the orchestrator section). Either way, the change extractor needs certain configuration variables, which are set the same way in both cases.
dsnFile is the file containing the connection information for the mysql database.
The connection information has the following format
username:password@tcp(host:port)/public?multiStatements=true&loc=Local
outputDirectory represents the directory to which wirekite will write the output files. The path should be absolute and writable by the wirekite binaries. Wirekite will write files called number.ckt in the outputDirectory. The number is ever-increasing, starting from 0.
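Because the file names are numbers, a plain alphabetical listing puts 10.ckt before 2.ckt. The sketch below orders change files numerically instead; `ordered_change_files` is a hypothetical helper, and it assumes that the numeric order of the names is the order in which the files should be read.

```python
from pathlib import Path


def ordered_change_files(names):
    """Return the number.ckt file names sorted by their numeric part."""
    numbered = []
    for name in names:
        path = Path(name)
        # Keep only names of the form <digits>.ckt and ignore everything else.
        if path.suffix == ".ckt" and path.stem.isdigit():
            numbered.append((int(path.stem), name))
    return [name for _, name in sorted(numbered)]
```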
tablesFile is a file with a list of all the tables that are being extracted.
The format of the tablesFile is as follows.
database1.table1
database2.table2
database3.table3
logFile refers to the full path of the logfile where the extractor will write the logging information.
Whether to generate verbose output. In verbose mode the change extractor can generate an extra file called metdata, which can help you troubleshoot any issues.
Whether to process one binlog and exit or continue processing more binlogs. This can be used for testing, before enabling the extractor to process binlogs indefinitely.
Whether to process begin and commit statements or not.
The number of events after which Wirekite flushes its buffer to the output file. This ensures that the output files are flushed regularly, for performance reasons.
Whether to exit when all the binlogs have been processed, or wait for new binlogs to show up.
The binlog file from which to start extracting changes. Generally this comes from the MySQL coordinates at which the data extract was taken. This does not need to be the full path of the binlog, just the binlog file name.
The binlog position in the binlogFile from which to start extracting changes. Generally this comes from the MySQL coordinates at which the data extract was taken.
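A binlog coordinate is the pair of file name and position, and MySQL binlog file names end in a numeric sequence (for example mysql-bin.000042). The sketch below turns a coordinate into a sortable key, which can help confirm that the starting coordinate is not later than the one recorded for the data extract. `binlog_key` is a hypothetical helper, not part of Wirekite.

```python
def binlog_key(binlog_file, position):
    """Return a (sequence, position) tuple that orders binlog coordinates."""
    # The numeric sequence is the part after the last dot in the file name.
    base, _, sequence = binlog_file.rpartition(".")
    if not base or not sequence.isdigit():
        raise ValueError(f"unexpected binlog file name: {binlog_file!r}")
    return (int(sequence), int(position))
```

Tuples compare element by element, so a coordinate in a later binlog file always sorts after any position in an earlier file.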