Wirekite Components
Wirekite has three main database entities (schema, data, and change) and three components (Extractor, Mover, and Loader). The entities are the objects in the database that Wirekite works on, while the components are the operators that act on those entities. Put together, they form a sequence in which the components work on the entities to establish a target database that is continuously updated from the source database. In a real production infrastructure, follow the sequence below.
Schema
Schema Extractor
Extract the Schema of your source database. One Time.
Schema Mover
Move the Schema from source to target database. One Time.
Schema Loader
Load the schema on the target database. One Time.
Data
Data Extractor
Extract the Data of your source database. One Time.
Data Mover
Move the Data from source to target database. One Time.
Data Loader
Load the data on the target database. One Time.
Change
Change Extractor
Extract the Changes from your source database. Continuous.
Change Mover
Move the Change files from source to target database. Continuous.
Change Loader
Load the changes on the target database. Continuous.
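The nine steps listed above can be summarized as an ordered plan. The following is a minimal sketch for illustration only; the `Step` class, the `build_plan` function, and the lowercase names are hypothetical and are not part of Wirekite itself.

```python
from dataclasses import dataclass
from itertools import product

# Order in which the entities are processed, and the order of the
# components within each entity, as listed in the steps above.
ENTITIES = ("schema", "data", "change")
COMPONENTS = ("extractor", "mover", "loader")


@dataclass(frozen=True)
class Step:
    entity: str
    component: str
    continuous: bool  # True only for change steps; the rest run one time


def build_plan():
    """Return the nine steps in the order they are listed above."""
    return [
        Step(entity, component, continuous=(entity == "change"))
        for entity, component in product(ENTITIES, COMPONENTS)
    ]


for step in build_plan():
    mode = "continuous" if step.continuous else "one time"
    print(f"{step.entity} {step.component}: {mode}")
```

Only the three change steps are marked as continuous; the schema and data steps each run once.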
Sequential Steps
The following architectural diagram shows how data is migrated to a target database and how a replication pipeline is established from a source database instance to a target database instance.
Wirekite Implementation and Sequence
Design Points
These are some of the noteworthy points regarding the above diagram.
- Wirekite operates using a replica of the customer-facing database instance. This avoids taking down the active production database instance while still providing a quiesced database instance during the initial data extraction. The details of setting up a replica are database vendor-specific.
- The Wirekite components (Extractor, Mover, and Loader) can and should run in parallel. This minimizes both the time required to load an initial dump and the storage space needed for in-flight data files as they are fetched by the Extractor and loaded by the Loader. The shorter the initial dump/move/load time, the shorter the lag when switching from initial extraction mode to change data capture mode.
- The performance characteristics (latency and throughput) of the connectivity between source and target, as well as of the storage, the database instances, and their hosts, should be scaled appropriately to the size and activity of the database.
- The throughput of change data capture must exceed the overall rate of change events in the source database. For example, if you generate a 1 GB MySQL or MariaDB binlog in 10 seconds, you need to scale your endpoints and networking so that the contents of this binlog are extracted, moved, and loaded in significantly less than 10 seconds. Otherwise, the target database will lag the source database “forever”. Specifically, the following inequality must hold:
wirekite_extraction_time + wirekite_ship_time + wirekite_load_time < source_database_change_generation_time
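The inequality above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the timing values are hypothetical, and the lag estimate assumes each batch of changes is extracted, shipped, and loaded one stage after another.

```python
def pipeline_keeps_up(extraction_s, ship_s, load_s, generation_s):
    """Return True when the inequality above holds for one batch of changes."""
    return extraction_s + ship_s + load_s < generation_s


def lag_growth_per_batch(extraction_s, ship_s, load_s, generation_s):
    """Extra seconds of lag added per batch; 0 when the pipeline keeps up.

    Assumption (not from the source): stages run serially per batch.
    """
    return max(0.0, extraction_s + ship_s + load_s - generation_s)


# A 1 GB binlog generated every 10 seconds, with made-up stage timings.
generation_s = 10.0

# 3 + 2 + 4 = 9 seconds, which is less than 10: the target can keep up.
print(pipeline_keeps_up(3.0, 2.0, 4.0, generation_s))

# 4 + 3 + 5 = 12 seconds, which is more than 10: lag grows with every batch.
print(pipeline_keeps_up(4.0, 3.0, 5.0, generation_s))
print(lag_growth_per_batch(4.0, 3.0, 5.0, generation_s))
```

In the second case the target falls a further 2 seconds behind with every binlog, which is the "forever" lag described above. The source asks for the total to be significantly less than the generation time, so a real deployment would want headroom rather than a result that only just passes.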
