When to Use
Use the GCS Data Mover when:
- Your target database loads data from Google Cloud Storage
- You’re using BigQuery or Spanner as a target
- Your extracted data needs to be staged in GCS before loading
Features
- Multi-threaded uploads - Parallel uploads for better throughput
- Optional gzip compression - Compress files before upload to reduce transfer time and storage costs
- Crash recovery - Skips files that already exist in GCS, allowing safe restarts
- Automatic cleanup - Removes local files after successful upload
- Coordinated completion - Waits for the DATA.DONE signal before finalizing
Configuration
The GCS Data Mover uses a configuration file passed as a command-line argument.
Configuration Parameters
The name of the GCS bucket where files will be uploaded.
The full path on the local machine where the Wirekite Data Extractor writes its files.
The path to a file where the GCS Data Mover will write its logging output.
Path to a GCP service account JSON credentials file. If not provided, the mover uses Application Default Credentials (ADC).
Maximum number of parallel threads for uploading files. Default is 10.
Whether to gzip files before uploading. Compressed files use the .dgz extension. Default is false.
Whether to remove local files after successful upload. Default is true.
Example Configuration
GCS Credentials
The GCS Data Mover supports two authentication methods:
- Service Account JSON - Provide the path to a service account key file via gcsCredentials
- Application Default Credentials - If no credentials file is specified, the mover uses ADC (useful when running on GCP infrastructure)
The credentials in use need the following IAM permissions:
- storage.objects.create
- storage.objects.get
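The choice between the two authentication methods can be sketched in Python with the google-cloud-storage client library. This is a minimal illustration, not the mover's own code: make_storage_client and its storage parameter are hypothetical names, and the storage argument exists only so the selection logic can be exercised without the library installed.

```python
from typing import Any, Optional


def make_storage_client(gcs_credentials: Optional[str] = None, storage: Any = None) -> Any:
    """Build a GCS client from a key file, or fall back to ADC.

    `gcs_credentials` mirrors the gcsCredentials setting; `storage` is a
    hypothetical injection point for the google.cloud.storage module.
    """
    if storage is None:
        # Imported lazily; requires the google-cloud-storage package.
        from google.cloud import storage

    if gcs_credentials:
        # Service Account JSON: load the key file at the given path.
        return storage.Client.from_service_account_json(gcs_credentials)

    # No key file given: the client resolves Application Default Credentials.
    return storage.Client()
```

On a GCP VM with an attached service account, calling make_storage_client() with no arguments would take the ADC branch.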
Orchestrator Integration
When using the GCS Data Mover with the Wirekite Orchestrator, add the mover configuration to your orchestrator config.
How It Works
- The mover monitors dataDirectory for files with the specified extension
- For each file found:
  - Checks if the file already exists in GCS (crash recovery)
  - Optionally compresses the file with gzip
  - Uploads to the GCS bucket
  - Removes the local file (if configured)
- When DATA.DONE appears in the directory, uploads it and exits gracefully
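The steps above can be sketched as a small Python polling loop. This is an illustration under stated assumptions, not the mover's actual implementation: move_file, run_mover, extension, and poll_seconds are hypothetical names, the exists and upload callables stand in for the real GCS calls, and the sketch assumes a compressed file swaps its original suffix for .dgz and that a skipped file is left in place.

```python
import gzip
import shutil
import time
from pathlib import Path
from typing import Callable

Exists = Callable[[str], bool]        # does this object name exist in the bucket?
Upload = Callable[[Path, str], None]  # upload a local file under an object name


def move_file(path: Path, exists: Exists, upload: Upload,
              gzip_files: bool = False, remove_files: bool = True) -> bool:
    """Handle one file; return True if uploaded, False if skipped."""
    # Assumption: compressed objects swap the original suffix for .dgz.
    source = path.with_suffix(".dgz") if gzip_files else path
    if exists(source.name):
        return False  # crash recovery: already in the bucket, so skip it
    if gzip_files:
        with open(path, "rb") as raw, gzip.open(source, "wb") as packed:
            shutil.copyfileobj(raw, packed)
    upload(source, source.name)
    if remove_files:
        # Remove the original and, if one was made, the compressed copy.
        for local in {path, source}:
            local.unlink()
    return True


def run_mover(data_dir: Path, extension: str, exists: Exists, upload: Upload,
              poll_seconds: float = 1.0, **options) -> None:
    """Poll data_dir until DATA.DONE appears, then upload it and return."""
    done = data_dir / "DATA.DONE"
    while True:
        # Check for the signal before scanning so no late file is missed.
        finished = done.exists()
        for path in sorted(data_dir.glob(f"*{extension}")):
            move_file(path, exists, upload, **options)
        if finished:
            upload(done, done.name)
            return
        time.sleep(poll_seconds)
```

Passing the bucket operations in as callables keeps the control flow readable and lets the loop be tried against an in-memory stand-in before any real bucket is involved.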
Notes
- Ensure proper IAM permissions are configured for the service account
- The mover creates files with their original names in the bucket root
- For large migrations, increase maxThreads to improve throughput
- Enable gzipFiles to reduce network transfer time and GCS storage costs
- When running on GCP VMs, consider using Application Default Credentials for simpler configuration
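To illustrate why raising maxThreads can help, here is a minimal Python sketch of bounded parallel uploads using the standard library thread pool. The names upload_all and upload_one are hypothetical, and this is not a description of how the mover itself schedules work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def upload_all(names: Iterable[str], upload_one: Callable[[str], T],
               max_threads: int = 10) -> List[T]:
    """Run upload_one for each name, using at most max_threads workers."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() yields results in input order and re-raises any upload error.
        return list(pool.map(upload_one, names))
```

Uploads are mostly network-bound, so more threads tend to raise throughput until bandwidth or disk reads become the limit.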
