Overview
Google Pub/Sub is supported as a queue target alongside Kafka and
Redpanda. The Wirekite extractor publishes one message per .ckt
change record (or one per batch when queueBatchSize > 1) using the
official cloud.google.com/go/pubsub SDK on the Go-side extractors and
google-cloud-cpp on the SQL Server C++ extractor.
For cross-vendor concepts (message format, ordering, retries, charts),
see Queues Overview.
When to choose Pub/Sub
Pick Pub/Sub when:
- The deployment already runs in Google Cloud and Pub/Sub fits the
existing pipeline (Dataflow, BigQuery streaming inserts, downstream
services that already subscribe to Pub/Sub topics).
- You don’t want to operate a Kafka/Redpanda cluster.
- Per-table delivery order is required (Pub/Sub preserves it via
ordering keys; see “Ordering and exactly-once” below).
Pick Kafka or Redpanda when:
- Latency is dominant and you control the broker hardware.
- Per-message cost is unacceptable (Pub/Sub Standard is metered; Kafka
is hardware-cost only — see “Cost” below).
- The downstream consumer is a Kafka-native system.
One-time GCP setup
1. Create the topic
gcloud pubsub topics create <topic-id> --project=<project-id>
Pub/Sub topic names are flat (no folders). One topic per migration is
the natural unit. Suggested name shape:
<env>-wirekite-<source>-to-<target>.
2. Create the subscription
The subscription is what the downstream consumer reads from. Two
flags must be set at creation time — neither can be flipped later
on an existing subscription:
gcloud pubsub subscriptions create <sub-id> \
--topic=<topic-id> \
--enable-message-ordering \
--enable-exactly-once-delivery \
--project=<project-id>
| Flag | Why |
|---|
--enable-message-ordering | Wirekite uses schema.table as the ordering key on every published message. Without this flag, the broker still accepts the ordering keys but does not enforce per-key delivery order — a downstream consumer sees rows interleaved out of commit order. |
--enable-exactly-once-delivery | Default Pub/Sub behavior is at-least-once: the broker may redeliver a message even after the subscriber acked it. Wirekite’s reference consumer and the production change-loader pipeline both assume one-shot delivery — duplicates cause double-applied changes downstream. Exactly-once shifts the broker to deduplicating delivery within an ack window. |
Both flags are unflippable once the subscription is created. If you
need to change either, delete and recreate the subscription. The
extractor checks both flags at startup (when running the reference
consumer) and refuses to start unless they are both true.
Recommended subscription settings:
ack-deadline = 60s (default 10s — increase if your consumer
takes longer than 10s to process a single batched message).
message-retention-duration = 1d (sufficient for catching up after
a short consumer outage; longer retention raises storage cost).
3. Service-account credentials
Create a service account dedicated to the Wirekite migration:
gcloud iam service-accounts create wirekite-pubsub --project=<project-id>
gcloud projects add-iam-policy-binding <project-id> \
--member="serviceAccount:wirekite-pubsub@<project-id>.iam.gserviceaccount.com" \
--role="roles/pubsub.publisher"
gcloud projects add-iam-policy-binding <project-id> \
--member="serviceAccount:wirekite-pubsub@<project-id>.iam.gserviceaccount.com" \
--role="roles/pubsub.viewer"
Both roles are required. pubsub.publisher alone does not include
pubsub.topics.get, which Wirekite calls at startup to confirm the
topic exists before any publishes are accepted. Without
pubsub.viewer, the extractor exits at startup with pubsub topic … exists check: permission denied.
If you also intend to run Wirekite’s reference consumer
(pubsub_reader) or its drain helper (pubsub_admin -seek-now)
against the same subscription, also add:
--role="roles/pubsub.subscriber" # for pubsub_reader pulls
--role="roles/pubsub.editor" # for pubsub_admin SeekToTime
Download a JSON key for the service account and store it where the
Wirekite extractor host can read it. The path you supply to Wirekite is
encrypted in place at queue-target creation time, so the plaintext key
does not have to remain on disk afterward.
Wirekite configuration
Queue-target definition
Create the queue target through the UX wizard or the API:
| Field | Required | Description |
|---|
type | yes | pubsub |
project | yes | GCP project id holding the topic |
topic | yes | Pub/Sub topic id (NOT the full projects/<id>/topics/<name> form) |
credentials_file | yes | Absolute path to the service-account JSON; encrypted in place at target-creation time |
The kafka-only fields (brokers, compression) are rejected on
Pub/Sub queue targets — gRPC handles transport compression
transparently, and Pub/Sub has no “broker list” concept. Setting
either field surfaces the misconfiguration rather than silently
ignoring it.
When wiring the extractor directly, the orchestrator emits:
queueType=pubsub
queueTopic=<topic-id>
queueProject=<project-id>
queueCredentialsFile=<absolute-path-to-encrypted-json>
queueBatchSize=100 # optional
The same key shape applies to every source database (mysql / postgres /
oracle Go extractors and the SQL Server change_extractor_pubsub C++
binary).
Ordering and exactly-once
Ordering key. Wirekite sets OrderingKey = "<schema>.<table>" on
every published message. Per-table change order matches the source’s
commit order end-to-end as long as the subscription has
enableMessageOrdering=true.
Per-table batching. When queueBatchSize > 1, lines are accumulated
in independent per-key buffers — one buffer per schema.table. A
single Pub/Sub message therefore never contains rows from two
different tables (each message carries exactly one ordering key).
Exactly-once delivery. Wirekite’s reference consumer relies on
enableExactlyOnceDelivery=true. Without it, transient ack failures
cause the broker to redeliver messages already processed; the change
loader will reapply changes and validation runs will report spurious
diffs.
Per-key throughput ceiling. Pub/Sub bounds throughput per ordering
key at roughly 1 MB/s. For a single high-write table whose change
volume exceeds that, split it at the source level (separate migrations
or separate topics) — a single Pub/Sub topic with one ordering key per
table is not the right shape for >1 MB/s on one table.
Failure semantics
The Wirekite Pub/Sub publisher is fail-fast, with two distinct
detection paths:
-
Explicit error. When the SDK’s
PublishResult.Get() returns an
error (after gax exponential backoff is exhausted on
retryable codes — Unavailable, DeadlineExceeded,
ResourceExhausted), Wirekite increments errors, logs
FATAL: QUEUE delivery failed, and exits.
-
Stall watchdog. A 5-second-cadence watchdog tracks whether the
delivered counter is advancing. If it has been frozen for 30
seconds while there are unresolved publishes (inflight > 0),
Wirekite logs FATAL: QUEUE stall detected — inflight=N, delivered held at M for Ks with no progress and exits. This catches the
pathological case where the SDK’s per-message future never resolves
— an observed-once failure mode that the SDK does not surface as an
error.
In both cases the orchestrator detects the non-zero exit, flips the
migration to Replication Failed, and surfaces the failure in the
UX. Recovery is Resume from Failed, which rewinds to the last
commit boundary persisted to the progress DB and re-emits from there.
The Pub/Sub publisher uses a fixed pool of 8 completion-worker
goroutines reading off a buffered channel — this bounds goroutine
count regardless of publish rate and provides natural back-pressure
when broker acks lag. The completion workers do not retry; they
trust the SDK’s internal retry layer.
Cost
Pub/Sub Standard pricing (US, as of 2026-05): $40 / TB of message
throughput, plus $0.27/GB egress for cross-region consumers, plus
storage if subscription retention exceeds 7 days.
Concrete sizing:
| Throughput | Avg msg size | $/day | $/month |
|---|
| 1k changes/s | 256 B | <$1 | ~$10 |
| 50k changes/s | 256 B | ~$15 | ~$450 |
| 334k changes/s (benchmark) | 256 B | ~$450 | ~$13,500 |
Notes:
- “Changes/s” is per source, not per topic. A single topic carries
every change from one migration.
- Application-level batching (
queueBatchSize=100) does NOT reduce
message-throughput billing — Pub/Sub bills on byte volume, not RPC
count. Batching reduces RPC count and SDK overhead, not data volume.
- For benchmark-scale runs, prefer Kafka/Redpanda on customer-managed
hardware. Pub/Sub Standard is cost-appropriate for production-rate
workloads, not synthetic load tests.
Pub/Sub Lite (capacity-based, cheaper) is not supported. Google
announced Pub/Sub Lite deprecation in 2024 with end-of-life on
2026-06-30; Wirekite does not target a sunsetting product.
Operations
Drain a subscription
The reference helper qa/queue_reader/pubsub_admin -seek-now moves the
subscription’s read cursor to the current time so any messages
published before that point are treated as already-acked. Used by
qa/fulltest between iterations; usable ad-hoc:
pubsub_admin -project <project-id> -subscription <sub-id> \
-credentials <path-to-json> -seek-now
Reference consumer
pubsub_reader -project <project-id> -subscription <sub-id> \
-credentials <path-to-json> -outdir <demux-dir>
Subscribes to the subscription, demultiplexes incoming messages by the
OP\tschema\ttable\t... prefix into per-table .ckt files in
<demux-dir>. Refuses to start unless the subscription has both
enableMessageOrdering=true and enableExactlyOnceDelivery=true.
Validate publish counts
The extractor and pubsub_reader print final counters in the same
shape:
QUEUE: shutdown (pubsub, submitted=N, delivered=N, errors=0)
submitted and delivered should match exactly when the run completes
cleanly. Any non-zero errors indicates a publish-side failure that
will already have been logged earlier as a FATAL line and triggered
extractor exit.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|
pubsub topic <T> does not exist in project <P> at startup | Topic id misspelled or in a different project | Verify with gcloud pubsub topics describe <topic> --project=<project> |
pubsub topic … exists check: permission denied at startup | Service account missing roles/pubsub.viewer | Add the role per “Service-account credentials” above |
pubsub_reader receives messages out of publish order within a table | Subscription created without --enable-message-ordering | Recreate the subscription — the flag cannot be toggled |
pubsub_reader receives the same message multiple times | Subscription created without --enable-exactly-once-delivery | Recreate the subscription — the flag cannot be toggled |
Validator diff shows extra rows in queue dump that aren’t in .ckt files | Same as above (duplicates) | Recreate the subscription with exactly-once enabled |
FATAL: QUEUE stall detected during a run | Pub/Sub publisher future never resolved | Resume from Failed; check Pub/Sub quotas, IAM, and service health |
| Sustained billing spike | Benchmark/load-test workload running against Pub/Sub Standard | Move benchmark workloads to Kafka/Redpanda; reserve Pub/Sub for production-rate traffic |
For Kafka/Redpanda-specific setup, see Kafka and Redpanda.