Overview
Wirekite supports Google BigQuery as a target data warehouse for:
- Schema Loading - Create target tables from Wirekite’s intermediate schema format
- Data Loading - Bulk load extracted data via Google Cloud Storage staging
- Change Loading (CDC) - Apply ongoing changes using MERGE operations
BigQuery loaders stage data through Google Cloud Storage (GCS) buckets before loading it into BigQuery using batch load jobs.
Prerequisites
Before configuring BigQuery as a Wirekite target, ensure the following requirements are met:

Google Cloud Configuration
- Project Setup: Have a Google Cloud project with BigQuery API enabled
- Dataset: Create a BigQuery dataset in the desired location
- GCS Bucket: Create a Google Cloud Storage bucket for staging data
- Authentication: Configure Application Default Credentials or service account
- IAM Permissions: Ensure the service account has:
  - bigquery.tables.create, bigquery.tables.updateData
  - storage.objects.create, storage.objects.delete on the GCS bucket
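Assuming a dedicated service account and staging bucket (all names below are placeholders), the grants above could be applied with the gcloud CLI roughly like this:

```
# Hypothetical project, service account, and bucket names -- substitute your own.
PROJECT=my-project
SA=wirekite-loader@my-project.iam.gserviceaccount.com
BUCKET=my-wirekite-stage

# bigquery.tables.create and bigquery.tables.updateData are both
# included in the predefined roles/bigquery.dataEditor role.
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:$SA" \
  --role="roles/bigquery.dataEditor"

# storage.objects.create and storage.objects.delete are included
# in roles/storage.objectAdmin, granted here on the staging bucket only.
gcloud storage buckets add-iam-policy-binding "gs://$BUCKET" \
  --member="serviceAccount:$SA" \
  --role="roles/storage.objectAdmin"
```

Granting the roles on the bucket rather than the project keeps the service account's storage access scoped to the staging data.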
Storage Requirements
Schema Loader
The Schema Loader reads Wirekite’s intermediate schema format (.skt file) and generates BigQuery-appropriate DDL statements for creating target tables.
Required Parameters
Path to the Wirekite schema file (.skt) generated by the Schema Extractor. Must be an absolute path.
Output file for CREATE TABLE statements. Includes both base tables and merge tables for CDC operations.
Output file for constraint definitions (BigQuery has limited constraint support).
Output file for FOREIGN KEY constraints (informational only in BigQuery).
Absolute path to the log file for Schema Loader operations.
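As an illustration, the generated CREATE TABLE output for a hypothetical orders table might look like the following (table and column names are invented; actual DDL depends on the source schema):

```sql
-- Base table
CREATE TABLE IF NOT EXISTS `my_project.my_dataset.orders` (
  order_id INT64,
  amount NUMERIC,
  updated_at TIMESTAMP
);

-- Merge (shadow) table for CDC operations, using Wirekite's _wkm suffix
CREATE TABLE IF NOT EXISTS `my_project.my_dataset.orders_wkm` (
  order_id INT64,
  amount NUMERIC,
  updated_at TIMESTAMP
);
```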
Optional Parameters
Output file for DROP TABLE IF EXISTS statements. Set to “none” to skip generation.
Output file for recovery table creation SQL. Set to “none” to skip.
When true, generates merge tables (_wkm suffix) for CDC operations. Set to false if only doing data loads.

Data Mover (GCS)
The Data Mover uploads extracted data files to Google Cloud Storage for subsequent loading into BigQuery.

Required Parameters
GCS bucket name (without gs:// prefix) for staging data files.
Local directory containing data files (.dkt) from the Data Extractor.
Absolute path to the log file for Data Mover operations.
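The upload step the Data Mover performs is equivalent to copying the extract directory to the staging bucket, which can also be done manually for troubleshooting (bucket and directory names here are placeholders):

```
# -m parallelizes the copy, analogous to the Data Mover's upload threads
gsutil -m cp /var/wirekite/extract/*.dkt gs://my-wirekite-stage/
```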
Optional Parameters
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
Maximum number of parallel upload threads.
When true, compresses files with gzip before uploading. Changes extension to .dgz.
When true, deletes local files after successful upload to GCS.

Data Loader
The Data Loader reads data files from GCS and loads them into BigQuery tables using batch load jobs.

Required Parameters
Path to a file containing the BigQuery connection string.
Path to the Wirekite schema file used by Schema Loader. Required for table structure information.
Absolute path to the log file for Data Loader operations.
Optional Parameters
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
Maximum number of parallel threads for loading tables.
Set to true if data was extracted using hex encoding instead of base64.
BigQuery dataset location (e.g., “US”, “EU”). Only needed if non-default.
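The load step is comparable to a BigQuery LOAD DATA statement over the staged files. A hedged sketch, with invented project, dataset, table, and bucket names:

```sql
-- Illustration only: loads staged CSV-formatted files into the target table
LOAD DATA INTO `my_project.my_dataset.orders`
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-wirekite-stage/orders/*.dkt']
);
```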
Change Loader
The Change Loader applies ongoing data changes (INSERT, UPDATE, DELETE) to BigQuery tables using MERGE operations with shadow tables.

Required Parameters
Path to a file containing the BigQuery connection string.
Directory containing change files (.ckt) from the Change Extractor.
Working directory for temporary CSV files during merge operations. Must be writable.
Path to the Wirekite schema file for table structure information.
Absolute path to the log file for Change Loader operations.
Optional Parameters
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
Maximum number of change files to process in a single batch.
Set to true if change data was extracted using hex encoding.
BigQuery dataset location (e.g., “US”, “EU”). Only needed if non-default.
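The shadow-table MERGE described above might take roughly the following shape for a hypothetical orders table; the key column, data columns, and the op change-type column are all invented for illustration:

```sql
MERGE `my_project.my_dataset.orders` AS t
USING `my_project.my_dataset.orders_wkm` AS s   -- shadow table staged from change files
ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'D' THEN                -- source row was deleted
  DELETE
WHEN MATCHED THEN                               -- source row was updated
  UPDATE SET amount = s.amount, updated_at = s.updated_at
WHEN NOT MATCHED AND s.op != 'D' THEN           -- source row was inserted
  INSERT (order_id, amount, updated_at)
  VALUES (s.order_id, s.amount, s.updated_at)
```

A single MERGE applies inserts, updates, and deletes in one atomic statement, which is why the Change Loader stages each batch into a shadow table first.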
Orchestrator Configuration
When using the Wirekite Orchestrator, prefix parameters with mover., target.schema., target.data., or target.change..
Example orchestrator configuration for BigQuery target:
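The following sketch shows the shape such a configuration might take. All parameter names and paths below are hypothetical; consult your Wirekite release documentation for the exact keys.

```
# Hypothetical keys -- illustrative only
mover.bucket=my-wirekite-stage
mover.data-dir=/var/wirekite/extract
mover.log=/var/log/wirekite/mover.log

target.schema.schema-file=/var/wirekite/schema/source.skt
target.schema.log=/var/log/wirekite/schema-loader.log

target.data.connect-file=/etc/wirekite/bq-connect.txt
target.data.schema-file=/var/wirekite/schema/source.skt
target.data.log=/var/log/wirekite/data-loader.log

target.change.connect-file=/etc/wirekite/bq-connect.txt
target.change.change-dir=/var/wirekite/changes
target.change.work-dir=/var/wirekite/work
target.change.schema-file=/var/wirekite/schema/source.skt
target.change.log=/var/log/wirekite/change-loader.log
```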
