Overview

Wirekite supports Google BigQuery as a target data warehouse for:
  • Schema Loading - Create target tables from Wirekite’s intermediate schema format
  • Data Loading - Bulk load extracted data via Google Cloud Storage staging
  • Change Loading (CDC) - Apply ongoing changes using MERGE operations
BigQuery loaders stage data in Google Cloud Storage (GCS) buckets before loading it into BigQuery with load jobs.

Prerequisites

Before configuring BigQuery as a Wirekite target, ensure the following requirements are met:

Google Cloud Configuration

  1. Project Setup: Have a Google Cloud project with the BigQuery API enabled
  2. Dataset: Create a BigQuery dataset in the desired location
  3. GCS Bucket: Create a Google Cloud Storage bucket for staging data
  4. Authentication: Configure Application Default Credentials or a service account
  5. IAM Permissions: Ensure the service account has:
    • bigquery.tables.create, bigquery.tables.updateData
    • storage.objects.create, storage.objects.delete on the GCS bucket

Storage Requirements

The GCS bucket must be accessible from both the loader host and BigQuery. Create the bucket in the same location as your BigQuery dataset: load jobs generally require the bucket and dataset to be co-located, and keeping them together also gives the best performance.
For the simplest authentication, use Application Default Credentials: run gcloud auth application-default login on the loader host.
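
A minimal provisioning sketch using the gcloud, bq, and gsutil CLIs is shown below. The project, dataset, bucket, and service account names are placeholders, and the two broad roles are just one way to cover the individual permissions listed above; tighten them to your organization's standards.

# Enable the BigQuery API (placeholder project ID)
gcloud services enable bigquery.googleapis.com --project=my-project

# Create the dataset and the staging bucket in the same location
bq --location=US mk --dataset my-project:my_dataset
gsutil mb -l US -p my-project gs://my-staging-bucket

# Grant the loader's service account access (these roles are a superset of
# the individual permissions listed above)
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:wirekite@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gsutil iam ch \
  serviceAccount:wirekite@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://my-staging-bucket

# Or, for interactive testing, rely on Application Default Credentials
gcloud auth application-default login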

Schema Loader

The Schema Loader reads Wirekite’s intermediate schema format (.skt file) and generates BigQuery-appropriate DDL statements for creating target tables.
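
For a single source table named orders, the generated files might contain DDL roughly like the sketch below. This is illustrative only: the columns, the type mappings, and the exact statement form are assumptions rather than Wirekite's verbatim output.

-- Illustrative base table (assumed type mappings)
CREATE TABLE IF NOT EXISTS `my-project.my_dataset.orders` (
  order_id INT64,
  customer_id INT64,
  order_date TIMESTAMP,
  amount NUMERIC
);

-- Illustrative merge (shadow) table used for CDC; the real table may carry
-- additional change-tracking columns
CREATE TABLE IF NOT EXISTS `my-project.my_dataset.orders_wkm` (
  order_id INT64,
  customer_id INT64,
  order_date TIMESTAMP,
  amount NUMERIC
);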

Required Parameters

schemaFile
string
required
Path to the Wirekite schema file (.skt) generated by the Schema Extractor. Must be an absolute path.
createTableFile
string
required
Output file for CREATE TABLE statements. Includes both base tables and merge tables for CDC operations.
createConstraintFile
string
required
Output file for constraint definitions (BigQuery has limited constraint support).
createForeignKeyFile
string
required
Output file for FOREIGN KEY constraints (informational only in BigQuery).
logFile
string
required
Absolute path to the log file for Schema Loader operations.

Optional Parameters

dropTableFile
string
default:"none"
Output file for DROP TABLE IF EXISTS statements. Set to “none” to skip generation.
createRecoveryTablesFile
string
default:"none"
Output file for recovery table creation SQL. Set to “none” to skip.
createMergeTables
boolean
default:"true"
When true, generates merge tables (_wkm suffix) for CDC operations. Set to false if only doing data loads.
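
For example, to also emit DROP TABLE statements and skip merge-table generation for a one-time load without CDC, the optional parameters could be set as follows (shown here in the orchestrator's key=value style; the output path is a placeholder):

# Optional Schema Loader settings for a one-time, non-CDC load
target.schema.dropTableFile=/opt/wirekite/output/schema/drop_tables.sql
target.schema.createMergeTables=false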

Data Mover (GCS)

The Data Mover uploads extracted data files to Google Cloud Storage for subsequent loading into BigQuery.

Required Parameters

gcsBucket
string
required
GCS bucket name (without gs:// prefix) for staging data files.
dataDirectory
string
required
Local directory containing data files (.dkt) from the Data Extractor.
logFile
string
required
Absolute path to the log file for Data Mover operations.

Optional Parameters

gcsCredentials
string
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
maxThreads
integer
default:"10"
Maximum number of parallel upload threads.
gzipFiles
boolean
default:"false"
When true, compresses files with gzip before uploading and changes the file extension to .dgz.
removeFiles
boolean
default:"true"
When true, deletes local files after successful upload to GCS.
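
For example, to compress files before upload and authenticate with an explicit service account key rather than Application Default Credentials, the mover settings might look like this (the credentials path is a placeholder):

# Data Mover: gzip uploads (.dgz) with explicit credentials
mover.gcsBucket=my-staging-bucket
mover.dataDirectory=/opt/wirekite/output/data
mover.logFile=/var/log/wirekite/data-mover.log
mover.gzipFiles=true
mover.gcsCredentials=/opt/wirekite/config/gcs-service-account.json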

Data Loader

The Data Loader reads data files from GCS and loads them into BigQuery tables using load jobs.

Required Parameters

dsnFile
string
required
Path to a file containing the BigQuery connection string.
Connection string format:
bigquery://PROJECT_ID/DATASET?gcs_bucket=BUCKET_NAME
Example:
bigquery://my-project/my_dataset?gcs_bucket=my-staging-bucket
schemaFile
string
required
Path to the Wirekite schema file used by Schema Loader. Required for table structure information.
logFile
string
required
Absolute path to the log file for Data Loader operations.

Optional Parameters

gcsCredentials
string
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
maxThreads
integer
default:"5"
Maximum number of parallel threads for loading tables.
hexEncoding
boolean
default:"false"
Set to true if data was extracted using hex encoding instead of base64.
location
string
BigQuery dataset location (e.g., “US”, “EU”). Only needed if non-default.
The Data Loader creates temporary staging tables with auto-expiration for intermediate processing.
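
For reference, the file passed as dsnFile (for example /opt/wirekite/config/bigquery.dsn in the orchestrator configuration below) contains just the connection string on a single line; the project, dataset, and bucket names are placeholders:

bigquery://my-project/my_dataset?gcs_bucket=my-staging-bucket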

Change Loader

The Change Loader applies ongoing data changes (INSERT, UPDATE, DELETE) to BigQuery tables using MERGE operations with shadow tables.
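
Conceptually, each batch of changes is loaded into a table's merge (shadow) table and then applied to the base table with a MERGE keyed on the primary key. The sketch below is illustrative only: the table, the op change-type column, and the exact clauses are assumptions, not the statements Wirekite actually generates.

-- Illustrative MERGE from a shadow table (_wkm suffix) into its base table;
-- s.op marking the change type (insert/update/delete) is an assumed convention
MERGE `my-project.my_dataset.orders` AS t
USING `my-project.my_dataset.orders_wkm` AS s
ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'D' THEN
  DELETE
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount, t.order_date = s.order_date
WHEN NOT MATCHED AND s.op != 'D' THEN
  INSERT (order_id, customer_id, order_date, amount)
  VALUES (s.order_id, s.customer_id, s.order_date, s.amount);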

Required Parameters

dsnFile
string
required
Path to a file containing the BigQuery connection string.
Connection string format:
bigquery://PROJECT_ID/DATASET?gcs_bucket=BUCKET_NAME
inputDirectory
string
required
Directory containing change files (.ckt) from the Change Extractor.
workDirectory
string
required
Working directory for temporary CSV files during merge operations. Must be writable.
schemaFile
string
required
Path to the Wirekite schema file for table structure information.
logFile
string
required
Absolute path to the log file for Change Loader operations.

Optional Parameters

gcsCredentials
string
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
maxFilesPerBatch
integer
default:"30"
Maximum number of change files to process in a single batch.
hexEncoding
boolean
default:"false"
Set to true if change data was extracted using hex encoding.
location
string
BigQuery dataset location (e.g., “US”, “EU”). Only needed if non-default.
The Change Loader should not start until the Data Loader has successfully completed the initial full load.

Orchestrator Configuration

When using the Wirekite Orchestrator, prefix each component's parameters with mover., target.schema., target.data., or target.change. as appropriate. An example orchestrator configuration for a BigQuery target:
# Main configuration
source=postgres
target=bigquery

# Data mover (GCS)
mover.gcsBucket=my-staging-bucket
mover.dataDirectory=/opt/wirekite/output/data
mover.logFile=/var/log/wirekite/data-mover.log
mover.maxThreads=10
mover.removeFiles=true

# Schema loading
target.schema.schemaFile=/opt/wirekite/output/schema/wirekite_schema.skt
target.schema.createTableFile=/opt/wirekite/output/schema/create_tables.sql
target.schema.createConstraintFile=/opt/wirekite/output/schema/constraints.sql
target.schema.createForeignKeyFile=/opt/wirekite/output/schema/foreign_keys.sql
target.schema.logFile=/var/log/wirekite/schema-loader.log

# Data loading
target.data.dsnFile=/opt/wirekite/config/bigquery.dsn
target.data.schemaFile=/opt/wirekite/output/schema/wirekite_schema.skt
target.data.logFile=/var/log/wirekite/data-loader.log
target.data.maxThreads=8

# Change loading (CDC)
target.change.dsnFile=/opt/wirekite/config/bigquery.dsn
target.change.inputDirectory=/opt/wirekite/output/changes
target.change.workDirectory=/opt/wirekite/work
target.change.schemaFile=/opt/wirekite/output/schema/wirekite_schema.skt
target.change.logFile=/var/log/wirekite/change-loader.log
target.change.maxFilesPerBatch=30
For complete Orchestrator documentation, see the Execution Guide.