The GCS Data Mover transfers files extracted by Wirekite Extractors to Google Cloud Storage (GCS) buckets. Use it when your target database loads data from GCS, such as BigQuery, Spanner, or other GCP-integrated targets. The mover watches a local directory for extracted data files and uploads them to the specified GCS bucket. It can optionally gzip-compress each file before upload and remove the local copy after a successful upload.

When to Use

Use the GCS Data Mover when:
  • Your target database loads data from Google Cloud Storage
  • You’re using BigQuery or Spanner as a target
  • Your extracted data needs to be staged in GCS before loading

Features

  • Multi-threaded uploads - Parallel uploads for better throughput
  • Optional gzip compression - Compress files before upload to reduce transfer time and storage costs
  • Crash recovery - Skips files that already exist in GCS, allowing safe restarts
  • Automatic cleanup - Removes local files after successful upload (enabled by default; controlled by removeFiles)
  • Coordinated completion - Waits for the DATA.DONE marker file before finalizing

Configuration

The GCS Data Mover uses a configuration file passed as a command-line argument:
gcs_data_mover <config_file>

Configuration Parameters

  • gcsBucket (string, required) - The name of the GCS bucket where files are uploaded.
  • dataDirectory (string, required) - The full path on the local machine where the Wirekite Data Extractor writes its files.
  • logFile (string, required) - The path to the file where the GCS Data Mover writes its log output.
  • gcsCredentials (string, optional) - Path to a GCP service account JSON credentials file. If not provided, the mover uses Application Default Credentials (ADC).
  • maxThreads (integer, optional) - Maximum number of parallel upload threads. Default: 10.
  • gzipFiles (boolean, optional) - Whether to gzip files before uploading. Compressed files use the .dgz extension. Default: false.
  • removeFiles (boolean, optional) - Whether to remove local files after a successful upload. Default: true.

Example Configuration

gcsBucket = my-wirekite-staging
dataDirectory = /data/wirekite/extract
logFile = /var/log/wirekite/gcs_mover.log
gcsCredentials = /etc/wirekite/gcs-service-account.json
maxThreads = 20
gzipFiles = true
removeFiles = true
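The configuration file uses a simple key = value syntax. The following Python sketch shows one way to read such a file, enforce the three required keys, and apply the documented defaults. It is an illustration, not Wirekite's own parser: MoverConfig and parse_config are hypothetical names, and it assumes that blank lines and lines starting with # are ignored and that booleans are written as true or false, neither of which this reference states.

```python
# Minimal sketch of a GCS Data Mover config reader (hypothetical, not Wirekite's parser).
# Assumptions: lines starting with '#' are comments; booleans are written true/false.
from dataclasses import dataclass
from typing import Optional

REQUIRED_KEYS = ("gcsBucket", "dataDirectory", "logFile")


@dataclass
class MoverConfig:
    gcs_bucket: str
    data_directory: str
    log_file: str
    gcs_credentials: Optional[str] = None  # None means: use Application Default Credentials
    max_threads: int = 10                  # documented default
    gzip_files: bool = False               # documented default
    remove_files: bool = True              # documented default


def _parse_bool(value: str) -> bool:
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"expected true or false, got {value!r}")


def parse_config(text: str) -> MoverConfig:
    raw = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'key = value'")
        raw[key.strip()] = value.strip()

    # gcsBucket, dataDirectory and logFile are required; fail early if any is absent.
    missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))

    max_threads = int(raw.get("maxThreads", "10"))
    if max_threads < 1:
        raise ValueError("maxThreads must be at least 1")

    return MoverConfig(
        gcs_bucket=raw["gcsBucket"],
        data_directory=raw["dataDirectory"],
        log_file=raw["logFile"],
        gcs_credentials=raw.get("gcsCredentials") or None,
        max_threads=max_threads,
        gzip_files=_parse_bool(raw.get("gzipFiles", "false")),
        remove_files=_parse_bool(raw.get("removeFiles", "true")),
    )
```

For the example configuration above, this sketch yields max_threads=20, gzip_files=True and remove_files=True; a file containing only the three required keys falls back to 10, false and true.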

GCS Credentials

The GCS Data Mover supports two authentication methods:
  1. Service Account JSON - Provide the path to a service account key file via gcsCredentials
  2. Application Default Credentials - If no credentials file is specified, uses ADC (useful when running on GCP infrastructure)
The identity the mover runs as (the service account, or the ADC principal) needs the following permissions on the target bucket:
  • storage.objects.create
  • storage.objects.get

Orchestrator Integration

When using the GCS Data Mover with the Wirekite Orchestrator, add the mover configuration to your orchestrator config:
moverType = gcs
moverConfig = /path/to/gcs_mover.cfg

How It Works

  1. The mover monitors dataDirectory for extracted data files (identified by their file extension)
  2. For each file found:
    • Checks whether the file already exists in GCS and skips the upload if it does (crash recovery)
    • Optionally compresses the file with gzip
    • Uploads to the GCS bucket
    • Removes the local file (if configured)
  3. When DATA.DONE appears in the directory, uploads it and exits gracefully
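The steps above can be sketched in Python as follows. This illustrates the described behavior and is not the mover's actual implementation. ObjectStore, object_name_for, process_file and run_mover are hypothetical names; ObjectStore stands in for a GCS client so the logic can run locally. Three details are assumptions not stated in this reference: gzipped files are named by replacing the original extension with .dgz, a file that is skipped because it already exists in GCS is cleaned up like an uploaded one, and files appear in dataDirectory only once the extractor has finished writing them.

```python
# Hypothetical sketch of the mover loop; not Wirekite's implementation.
import gzip
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol

DONE_MARKER = "DATA.DONE"


class ObjectStore(Protocol):
    """Hypothetical minimal interface over a GCS bucket."""
    def exists(self, name: str) -> bool: ...
    def upload(self, name: str, local_path: str) -> None: ...


def object_name_for(filename: str, gzip_files: bool) -> str:
    # Assumption: compression replaces the original extension with .dgz.
    return os.path.splitext(filename)[0] + ".dgz" if gzip_files else filename


def process_file(path: str, store: ObjectStore, gzip_files: bool, remove_files: bool) -> str:
    name = object_name_for(os.path.basename(path), gzip_files)  # bucket root, no prefix
    if store.exists(name):
        status = "skipped"  # uploaded before a crash or restart
    elif gzip_files:
        # Compress into a temp dir so the .dgz file never shows up in dataDirectory.
        with tempfile.TemporaryDirectory() as tmp:
            compressed = os.path.join(tmp, name)
            with open(path, "rb") as src, gzip.open(compressed, "wb") as dst:
                shutil.copyfileobj(src, dst)
            store.upload(name, compressed)
        status = "uploaded"
    else:
        store.upload(name, path)
        status = "uploaded"
    if remove_files:
        os.remove(path)  # assumption: also applied to skipped files
    return status


def run_mover(data_dir: str, store: ObjectStore, max_threads: int = 10,
              gzip_files: bool = False, remove_files: bool = True,
              poll_seconds: float = 1.0) -> int:
    submitted = set()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = []
        while True:
            # Check for the marker BEFORE listing, so this listing includes every
            # data file written before DATA.DONE.
            done_seen = os.path.exists(os.path.join(data_dir, DONE_MARKER))
            for entry in sorted(os.listdir(data_dir)):
                path = os.path.join(data_dir, entry)
                if entry == DONE_MARKER or entry in submitted or not os.path.isfile(path):
                    continue
                submitted.add(entry)
                futures.append(pool.submit(process_file, path, store, gzip_files, remove_files))
            if done_seen:
                break
            time.sleep(poll_seconds)
        for future in futures:
            future.result()  # re-raise any upload error before finalizing
    # Every data file is in GCS; upload the marker last, then exit.
    store.upload(DONE_MARKER, os.path.join(data_dir, DONE_MARKER))
    return len(submitted)
```

Checking for DATA.DONE before listing the directory is the key ordering choice: if the extractor writes the marker only after its last data file, that final listing is guaranteed to contain every file, so the marker reaches GCS last. The existence check is a reasonable recovery signal because a GCS object becomes visible only after its upload completes.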

Notes

  • Ensure proper IAM permissions are configured for the service account
  • Uploaded objects keep their original file names (with the .dgz extension when gzipFiles is enabled) and are written to the bucket root, with no folder prefix
  • For large migrations, increase maxThreads to improve throughput
  • Enable gzipFiles to reduce network transfer time and GCS storage costs
  • When running on GCP VMs, consider using Application Default Credentials for simpler configuration