Overview

Wirekite supports Google BigQuery as a target data warehouse for:
  • Schema Loading - Create target tables from Wirekite’s intermediate schema format
  • Data Loading - Bulk load extracted data via Google Cloud Storage staging
  • Change Loading (CDC) - Apply ongoing changes using MERGE operations
BigQuery loaders stage data in Google Cloud Storage (GCS) buckets before loading it into BigQuery with load jobs.

Prerequisites

Before configuring BigQuery as a Wirekite target, ensure the following requirements are met:

Google Cloud Configuration

  1. Project Setup: Have a Google Cloud project with the BigQuery API enabled
  2. Dataset: Create a BigQuery dataset in the desired location
  3. GCS Bucket: Create a Google Cloud Storage bucket for staging data
  4. Authentication: Configure Application Default Credentials or a service account
  5. IAM Permissions: Ensure the service account has:
    • bigquery.tables.create, bigquery.tables.updateData
    • storage.objects.create, storage.objects.delete on the GCS bucket

Storage Requirements

The GCS bucket must be accessible from both the loader host and BigQuery. Create the bucket in the same location as your BigQuery dataset: load jobs generally require the bucket and dataset to be co-located, and keeping them together also gives the best performance.
For the simplest authentication, use Application Default Credentials: run gcloud auth application-default login on the loader host.
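
A minimal provisioning sketch using the gcloud, bq, and gsutil CLIs is shown below. The project, dataset, bucket, and service account names are placeholders, and the two broad roles are just one way to cover the individual permissions listed above; tighten them to your organization's standards.

# Enable the BigQuery API (placeholder project ID)
gcloud services enable bigquery.googleapis.com --project=my-project

# Create the dataset and the staging bucket in the same location
bq --location=US mk --dataset my-project:my_dataset
gsutil mb -l US -p my-project gs://my-staging-bucket

# Grant the loader's service account access (these roles are a superset of
# the individual permissions listed above)
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:wirekite@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gsutil iam ch \
  serviceAccount:wirekite@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://my-staging-bucket

# Or, for interactive testing, rely on Application Default Credentials
gcloud auth application-default login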

Schema Loader

The Schema Loader reads Wirekite’s intermediate schema format (.skt file) and generates BigQuery-appropriate DDL statements for creating target tables.
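
For a single source table named orders, the generated files might contain DDL roughly like the sketch below. This is illustrative only: the columns, the type mappings, and the exact statement form are assumptions rather than Wirekite's verbatim output.

-- Illustrative base table (assumed type mappings)
CREATE TABLE IF NOT EXISTS `my-project.my_dataset.orders` (
  order_id INT64,
  customer_id INT64,
  order_date TIMESTAMP,
  amount NUMERIC
);

-- Illustrative merge (shadow) table used for CDC; the real table may carry
-- additional change-tracking columns
CREATE TABLE IF NOT EXISTS `my-project.my_dataset.orders_wkm` (
  order_id INT64,
  customer_id INT64,
  order_date TIMESTAMP,
  amount NUMERIC
);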

Required Parameters

schemaFile
string
required
Path to the Wirekite schema file (.skt) generated by the Schema Extractor. Must be an absolute path.
createTableFile
string
required
Output file for CREATE TABLE statements. Includes both base tables and merge tables for CDC operations.
createConstraintFile
string
required
Output file for constraint definitions (BigQuery has limited constraint support).
createForeignKeyFile
string
required
Output file for FOREIGN KEY constraints (informational only in BigQuery).
logFile
string
required
Absolute path to the log file for Schema Loader operations.

Optional Parameters

dropTableFile
string
default:"none"
Output file for DROP TABLE IF EXISTS statements. Set to “none” to skip generation.
createRecoveryTablesFile
string
default:"none"
Output file for recovery table creation SQL. Set to “none” to skip.
createMergeTables
boolean
default:"true"
When true, generates merge tables (_wkm suffix) for CDC operations. Set to false if only doing data loads.
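
For example, to also emit DROP TABLE statements and skip merge-table generation for a one-time load without CDC, the optional parameters could be set as follows (shown here in the orchestrator's key=value style; the output path is a placeholder):

# Optional Schema Loader settings for a one-time, non-CDC load
target.schema.dropTableFile=/opt/wirekite/output/schema/drop_tables.sql
target.schema.createMergeTables=false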

Data Mover (GCS)

The Data Mover uploads extracted data files to Google Cloud Storage for subsequent loading into BigQuery.

Required Parameters

gcsBucket
string
required
GCS bucket name (without gs:// prefix) for staging data files.
dataDirectory
string
required
Local directory containing data files (.dkt) from the Data Extractor.
logFile
string
required
Absolute path to the log file for Data Mover operations.

Optional Parameters

gcsCredentials
string
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
maxThreads
integer
default:"10"
Maximum number of parallel upload threads.
gzipFiles
boolean
default:"false"
When true, compresses files with gzip before uploading and changes the file extension to .dgz.
removeFiles
boolean
default:"true"
When true, deletes local files after successful upload to GCS.
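
For example, to compress files before upload and authenticate with an explicit service account key rather than Application Default Credentials, the mover settings might look like this (the credentials path is a placeholder):

# Data Mover: gzip uploads (.dgz) with explicit credentials
mover.gcsBucket=my-staging-bucket
mover.dataDirectory=/opt/wirekite/output/data
mover.logFile=/var/log/wirekite/data-mover.log
mover.gzipFiles=true
mover.gcsCredentials=/opt/wirekite/config/gcs-service-account.json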

Data Loader

The Data Loader reads data files from GCS and loads them into BigQuery tables using load jobs.

Required Parameters

dsnFile
string
required
Path to a file containing the BigQuery connection string.
Connection string format:
bigquery://PROJECT_ID/DATASET?gcs_bucket=BUCKET_NAME
Example:
bigquery://my-project/my_dataset?gcs_bucket=my-staging-bucket
schemaFile
string
required
Path to the Wirekite schema file used by Schema Loader. Required for table structure information.
logFile
string
required
Absolute path to the log file for Data Loader operations.

Optional Parameters

gcsCredentials
string
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
maxThreads
integer
default:"5"
Maximum number of parallel threads for loading tables.
hexEncoding
boolean
default:"false"
Set to true if data was extracted using hex encoding instead of base64.
location
string
BigQuery dataset location (e.g., “US”, “EU”). Only needed if non-default.
The Data Loader creates temporary staging tables with auto-expiration for intermediate processing.
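
For reference, the file passed as dsnFile (for example /opt/wirekite/config/bigquery.dsn in the orchestrator configuration below) contains just the connection string on a single line; the project, dataset, and bucket names are placeholders:

bigquery://my-project/my_dataset?gcs_bucket=my-staging-bucket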

Change Loader

The Change Loader applies ongoing data changes (INSERT, UPDATE, DELETE) to BigQuery tables using MERGE operations with shadow tables.
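
Conceptually, each batch of changes is loaded into a table's merge (shadow) table and then applied to the base table with a MERGE keyed on the primary key. The sketch below is illustrative only: the table, the op change-type column, and the exact clauses are assumptions, not the statements Wirekite actually generates.

-- Illustrative MERGE from a shadow table (_wkm suffix) into its base table;
-- s.op marking the change type (insert/update/delete) is an assumed convention
MERGE `my-project.my_dataset.orders` AS t
USING `my-project.my_dataset.orders_wkm` AS s
ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'D' THEN
  DELETE
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount, t.order_date = s.order_date
WHEN NOT MATCHED AND s.op != 'D' THEN
  INSERT (order_id, customer_id, order_date, amount)
  VALUES (s.order_id, s.customer_id, s.order_date, s.amount);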

Required Parameters

dsnFile
string
required
Path to a file containing the BigQuery connection string.
Connection string format:
bigquery://PROJECT_ID/DATASET?gcs_bucket=BUCKET_NAME
inputDirectory
string
required
Directory containing change files (.ckt) from the Change Extractor.
workDirectory
string
required
Working directory for temporary CSV files during merge operations. Must be writable.
schemaFile
string
required
Path to the Wirekite schema file for table structure information.
logFile
string
required
Absolute path to the log file for Change Loader operations.

Optional Parameters

gcsCredentials
string
Path to GCS service account credentials JSON file. Uses Application Default Credentials if not specified.
maxFilesPerBatch
integer
default:"30"
Maximum number of change files to process in a single batch.
hexEncoding
boolean
default:"false"
Set to true if change data was extracted using hex encoding.
location
string
BigQuery dataset location (e.g., “US”, “EU”). Only needed if non-default.
The Change Loader should not start until the Data Loader has successfully completed the initial full load.

Orchestrator Configuration

When using the Wirekite Orchestrator, prefix each component's parameters with mover., target.schema., target.data., or target.change. as appropriate. An example orchestrator configuration for a BigQuery target:
# Main configuration
source=postgres
target=bigquery

# Data mover (GCS)
mover.gcsBucket=my-staging-bucket
mover.dataDirectory=/opt/wirekite/output/data
mover.logFile=/var/log/wirekite/data-mover.log
mover.maxThreads=10
mover.removeFiles=true

# Schema loading
target.schema.schemaFile=/opt/wirekite/output/schema/wirekite_schema.skt
target.schema.createTableFile=/opt/wirekite/output/schema/create_tables.sql
target.schema.createConstraintFile=/opt/wirekite/output/schema/constraints.sql
target.schema.createForeignKeyFile=/opt/wirekite/output/schema/foreign_keys.sql
target.schema.logFile=/var/log/wirekite/schema-loader.log

# Data loading
target.data.dsnFile=/opt/wirekite/config/bigquery.dsn
target.data.schemaFile=/opt/wirekite/output/schema/wirekite_schema.skt
target.data.logFile=/var/log/wirekite/data-loader.log
target.data.maxThreads=8

# Change loading (CDC)
target.change.dsnFile=/opt/wirekite/config/bigquery.dsn
target.change.inputDirectory=/opt/wirekite/output/changes
target.change.workDirectory=/opt/wirekite/work
target.change.schemaFile=/opt/wirekite/output/schema/wirekite_schema.skt
target.change.logFile=/var/log/wirekite/change-loader.log
target.change.maxFilesPerBatch=30
For complete Orchestrator documentation, see the Execution Guide.