The AWS Data Mover transfers files extracted by Wirekite Extractors to AWS S3 buckets. Use it when your target database loads data from S3, as Firebolt and other AWS-integrated targets do. The mover watches a local directory for extracted data files, uploads them to the specified S3 bucket, and can optionally compress files before upload and remove local copies after a successful upload.

When to Use

Use the AWS Data Mover when:
  • Your target database loads data from AWS S3
  • You’re using Firebolt as a target
  • Your extracted data needs to be staged in S3 before loading

Features

  • Multi-threaded uploads - Parallel uploads for better throughput
  • Optional gzip compression - Compress files before upload to reduce transfer time and storage costs
  • Crash recovery - Skips files that already exist in S3, allowing safe restarts
  • Automatic cleanup - Removes local files after successful upload
  • Coordinated completion - Waits for DATA.DONE signal before finalizing
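
The multi-threaded upload feature can be sketched in Python. The mover itself is a standalone binary; the function names below are illustrative, not its actual internals:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_one(path):
    # Placeholder standing in for the real S3 upload of a single file.
    return path

def upload_all(paths, max_threads=10):
    """Upload files in parallel; max_threads mirrors the maxThreads setting."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() preserves input order while running uploads concurrently
        return list(pool.map(upload_one, paths))
```

With `maxThreads = 10` (the default), up to ten files are in flight at once, which is what makes higher thread counts pay off on large extracts.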

Configuration

The AWS Data Mover uses a configuration file passed as a command-line argument:
aws_data_mover <config_file>

Configuration Parameters

  • awsRegion (string, required) - The AWS region where your S3 bucket is located (e.g., us-east-1, eu-west-1).
  • awsBucket (string, required) - The name of the S3 bucket where files will be uploaded.
  • dataDirectory (string, required) - The full path on the local machine where the Wirekite Data Extractor writes its files.
  • logFile (string, required) - The path to a file where the AWS Data Mover will write its logging output.
  • awsCredentials (string, optional) - Path to a file containing AWS credentials in the format access_key:secret_key. If not provided, the mover uses the default AWS credential chain (environment variables, IAM role, etc.).
  • maxThreads (integer, optional) - Maximum number of parallel threads for uploading files. Default is 10.
  • gzipFiles (boolean, optional) - Whether to gzip files before uploading. Compressed files use the .dgz extension. Default is false.
  • removeFiles (boolean, optional) - Whether to remove local files after successful upload. Default is true.

Example Configuration

awsRegion = us-east-1
awsBucket = my-wirekite-staging
dataDirectory = /data/wirekite/extract
logFile = /var/log/wirekite/aws_mover.log
awsCredentials = /etc/wirekite/aws_creds.txt
maxThreads = 20
gzipFiles = true
removeFiles = true
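
The configuration uses a simple `key = value` format. A minimal parser for it might look like the sketch below; this is inferred from the example above, and the binary's actual parsing rules (comments, quoting) may differ:

```python
def parse_config(text):
    """Parse `key = value` lines into a dict of strings.

    Values are left as strings; callers convert types as needed
    (e.g., int(cfg["maxThreads"]), cfg["gzipFiles"] == "true").
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and (assumed) comment lines
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config
```

For example, `parse_config("maxThreads = 20")["maxThreads"]` yields the string `"20"`.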

AWS Credentials File Format

If using the awsCredentials option, the file should contain:
ACCESS_KEY_ID:SECRET_ACCESS_KEY
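
Reading that format amounts to splitting on the first colon. The helper below is a sketch mirroring the documented format, not the binary's exact parser:

```python
def parse_aws_credentials(line):
    """Split an ACCESS_KEY_ID:SECRET_ACCESS_KEY line into its two parts.

    Splits on the first ':' only, since AWS secret keys never contain
    a colon but may contain other punctuation.
    """
    access_key, _, secret_key = line.strip().partition(":")
    if not access_key or not secret_key:
        raise ValueError("expected ACCESS_KEY_ID:SECRET_ACCESS_KEY")
    return access_key, secret_key
```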

Orchestrator Integration

When using the AWS Data Mover with the Wirekite Orchestrator, add the mover configuration to your orchestrator config:
moverType = aws
moverConfig = /path/to/aws_mover.cfg

How It Works

  1. The mover monitors dataDirectory for newly written extracted data files
  2. For each file found:
    • Checks if the file already exists in S3 (crash recovery)
    • Optionally compresses the file with gzip
    • Uploads to the S3 bucket
    • Removes the local file (if configured)
  3. When DATA.DONE appears in the directory, uploads it and exits gracefully
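
The per-file steps above can be sketched as follows. This is not the mover's code: `s3` is any object exposing `head_object`/`upload_file` (a boto3 S3 client fits that shape), and the assumption that the .dgz extension is appended to the original filename is mine, based on the gzipFiles description:

```python
import gzip
import os
import shutil

def s3_key_for(filename, gzip_files):
    """Bucket-root key for a local file; .dgz appended when compressing."""
    return filename + ".dgz" if gzip_files else filename

def move_file(s3, bucket, directory, filename, gzip_files=False, remove_files=True):
    """One pass of the loop: existence check, optional gzip, upload, cleanup."""
    key = s3_key_for(filename, gzip_files)
    local_path = os.path.join(directory, filename)

    # Crash recovery: a file already present in the bucket is skipped.
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return "skipped"
    except Exception:
        pass  # not found in S3; proceed with the upload

    upload_path = local_path
    if gzip_files:
        upload_path = local_path + ".dgz"
        with open(local_path, "rb") as src, gzip.open(upload_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

    s3.upload_file(upload_path, bucket, key)

    if remove_files:
        os.remove(local_path)
        if upload_path != local_path:
            os.remove(upload_path)  # also drop the temporary .dgz
    return "uploaded"
```

Because the existence check runs before every upload, re-running the mover after a crash resumes where it left off instead of re-uploading completed files.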

Notes

  • Ensure the AWS credentials have s3:PutObject and s3:GetObject permissions on the target bucket
  • The mover creates files with their original names in the bucket root
  • For large migrations, increase maxThreads to improve throughput
  • Enable gzipFiles to reduce network transfer time and S3 storage costs
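
A minimal IAM policy along these lines grants the permissions noted above; the bucket name is taken from the example configuration and should be replaced with your own:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::my-wirekite-staging/*"
    }
  ]
}
```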