> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wirekite.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Pub/Sub

> How to configure a Google Pub/Sub queue target — GCP setup, IAM roles, ordering, exactly-once delivery, and cost guidance.

## Overview

Google Pub/Sub is supported as a queue target alongside Kafka and
Redpanda. The Wirekite extractor publishes one message per `.ckt`
change record (or one per batch when `queueBatchSize > 1`) using the
official `cloud.google.com/go/pubsub` SDK on the Go-side extractors and
`google-cloud-cpp` on the SQL Server C++ extractor.

For cross-vendor concepts (message format, ordering, retries, charts),
see [Queues Overview](/queues/overview).

## When to choose Pub/Sub

Pick Pub/Sub when:

* The deployment already runs in Google Cloud and Pub/Sub fits the
  existing pipeline (Dataflow, BigQuery streaming inserts, downstream
  services that already subscribe to Pub/Sub topics).
* You don't want to operate a Kafka/Redpanda cluster.
* Per-table delivery order is required (Pub/Sub preserves it via
  ordering keys; see "Ordering and exactly-once" below).

Pick Kafka or Redpanda when:

* Latency is dominant and you control the broker hardware.
* Per-message cost is unacceptable (Pub/Sub Standard is metered; Kafka
  is hardware-cost only — see "Cost" below).
* The downstream consumer is a Kafka-native system.

## One-time GCP setup

### 1. Create the topic

```
gcloud pubsub topics create <topic-id> --project=<project-id>
```

Pub/Sub topic names are flat (no folders). One topic per migration is
the natural unit. Suggested name shape:
`<env>-wirekite-<source>-to-<target>`.

### 2. Create the subscription

The subscription is what the downstream consumer reads from. **Two
flags must be set at creation time** — neither can be flipped later
on an existing subscription:

```
gcloud pubsub subscriptions create <sub-id> \
    --topic=<topic-id> \
    --enable-message-ordering \
    --enable-exactly-once-delivery \
    --project=<project-id>
```

| Flag                             | Why                                                                                                                                                                                                                                                                                                                                                          |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `--enable-message-ordering`      | Wirekite uses `schema.table` as the ordering key on every published message. Without this flag, the broker still accepts the ordering keys but does not enforce per-key delivery order — a downstream consumer sees rows interleaved out of commit order.                                                                                                    |
| `--enable-exactly-once-delivery` | Default Pub/Sub behavior is at-least-once: the broker may redeliver a message even after the subscriber acked it. Wirekite's reference consumer and the production change-loader pipeline both assume one-shot delivery — duplicates cause double-applied changes downstream. Exactly-once shifts the broker to deduplicating delivery within an ack window. |

<Warning>
  Both flags are unflippable once the subscription is created. If you
  need to change either, delete and recreate the subscription. The
  extractor checks both flags at startup (when running the reference
  consumer) and refuses to start unless they are both `true`.
</Warning>

**Recommended subscription settings:**

* `ack-deadline = 60s` (default `10s` — increase if your consumer
  takes longer than 10s to process a single batched message).
* `message-retention-duration = 1d` (sufficient for catching up after
  a short consumer outage; longer retention raises storage cost).

### 3. Service-account credentials

Create a service account dedicated to the Wirekite migration:

```
gcloud iam service-accounts create wirekite-pubsub --project=<project-id>

gcloud projects add-iam-policy-binding <project-id> \
    --member="serviceAccount:wirekite-pubsub@<project-id>.iam.gserviceaccount.com" \
    --role="roles/pubsub.publisher"

gcloud projects add-iam-policy-binding <project-id> \
    --member="serviceAccount:wirekite-pubsub@<project-id>.iam.gserviceaccount.com" \
    --role="roles/pubsub.viewer"
```

<Note>
  Both roles are required. `pubsub.publisher` alone does **not** include
  `pubsub.topics.get`, which Wirekite calls at startup to confirm the
  topic exists before any publishes are accepted. Without
  `pubsub.viewer`, the extractor exits at startup with `pubsub topic …
    exists check: permission denied`.
</Note>

If you also intend to run Wirekite's reference consumer
(`pubsub_reader`) or its drain helper (`pubsub_admin -seek-now`)
against the same subscription, also add:

```
    --role="roles/pubsub.subscriber"   # for pubsub_reader pulls
    --role="roles/pubsub.editor"       # for pubsub_admin SeekToTime
```

Download a JSON key for the service account and store it where the
Wirekite extractor host can read it. The path you supply to Wirekite is
encrypted in place at queue-target creation time, so the plaintext key
does not have to remain on disk afterward.

## Wirekite configuration

### Queue-target definition

Create the queue target through the UX wizard or the API:

| Field              | Required | Description                                                                           |
| ------------------ | -------- | ------------------------------------------------------------------------------------- |
| `type`             | yes      | `pubsub`                                                                              |
| `project`          | yes      | GCP project id holding the topic                                                      |
| `topic`            | yes      | Pub/Sub topic id (NOT the full `projects/<id>/topics/<name>` form)                    |
| `credentials_file` | yes      | Absolute path to the service-account JSON; encrypted in place at target-creation time |

<Note>
  The kafka-only fields (`brokers`, `compression`) are rejected on
  Pub/Sub queue targets — gRPC handles transport compression
  transparently, and Pub/Sub has no "broker list" concept. Setting
  either field surfaces the misconfiguration rather than silently
  ignoring it.
</Note>

### Extractor config keys

When wiring the extractor directly, the orchestrator emits:

```
queueType=pubsub
queueTopic=<topic-id>
queueProject=<project-id>
queueCredentialsFile=<absolute-path-to-encrypted-json>
queueBatchSize=100        # optional
```

The same key shape applies to every source database (mysql / postgres /
oracle Go extractors and the SQL Server `change_extractor_pubsub` C++
binary).

## Ordering and exactly-once

**Ordering key.** Wirekite sets `OrderingKey = "<schema>.<table>"` on
every published message. Per-table change order matches the source's
commit order end-to-end **as long as** the subscription has
`enableMessageOrdering=true`.

**Per-table batching.** When `queueBatchSize > 1`, lines are accumulated
in independent per-key buffers — one buffer per `schema.table`. A
single Pub/Sub message therefore never contains rows from two
different tables (each message carries exactly one ordering key).

**Exactly-once delivery.** Wirekite's reference consumer relies on
`enableExactlyOnceDelivery=true`. Without it, transient ack failures
cause the broker to redeliver messages already processed; the change
loader will reapply changes and validation runs will report spurious
diffs.

**Per-key throughput ceiling.** Pub/Sub bounds throughput per ordering
key at roughly 1 MB/s. For a single high-write table whose change
volume exceeds that, split it at the source level (separate migrations
or separate topics) — a single Pub/Sub topic with one ordering key per
table is not the right shape for >1 MB/s on one table.

## Failure semantics

The Wirekite Pub/Sub publisher is **fail-fast**, with two distinct
detection paths:

1. **Explicit error.** When the SDK's `PublishResult.Get()` returns an
   error (after `gax` exponential backoff is exhausted on
   retryable codes — `Unavailable`, `DeadlineExceeded`,
   `ResourceExhausted`), Wirekite increments `errors`, logs
   `FATAL: QUEUE delivery failed`, and exits.

2. **Stall watchdog.** A 5-second-cadence watchdog tracks whether the
   `delivered` counter is advancing. If it has been frozen for **30
   seconds** while there are unresolved publishes (`inflight > 0`),
   Wirekite logs `FATAL: QUEUE stall detected — inflight=N, delivered
   held at M for Ks with no progress` and exits. This catches the
   pathological case where the SDK's per-message future never resolves
   — an observed-once failure mode that the SDK does not surface as an
   error.

In both cases the orchestrator detects the non-zero exit, flips the
migration to **Replication Failed**, and surfaces the failure in the
UX. Recovery is **Resume from Failed**, which rewinds to the last
commit boundary persisted to the progress DB and re-emits from there.

The Pub/Sub publisher uses a fixed pool of 8 completion-worker
goroutines reading off a buffered channel — this bounds goroutine
count regardless of publish rate and provides natural back-pressure
when broker acks lag. The completion workers do not retry; they
trust the SDK's internal retry layer.

## Cost

Pub/Sub Standard pricing (US, as of 2026-05): `$40 / TB` of message
throughput, plus `$0.27/GB` egress for cross-region consumers, plus
storage if subscription retention exceeds 7 days.

Concrete sizing:

| Throughput                 | Avg msg size | `$/day` | `$/month`  |
| -------------------------- | ------------ | ------- | ---------- |
| 1k changes/s               | 256 B        | `<$1`   | `~$10`     |
| 50k changes/s              | 256 B        | `~$15`  | `~$450`    |
| 334k changes/s (benchmark) | 256 B        | `~$450` | `~$13,500` |

Notes:

* "Changes/s" is per source, **not** per topic. A single topic carries
  every change from one migration.
* Application-level batching (`queueBatchSize=100`) does NOT reduce
  message-throughput billing — Pub/Sub bills on byte volume, not RPC
  count. Batching reduces RPC count and SDK overhead, not data volume.
* For benchmark-scale runs, prefer Kafka/Redpanda on customer-managed
  hardware. Pub/Sub Standard is cost-appropriate for production-rate
  workloads, not synthetic load tests.

<Warning>
  Pub/Sub Lite (capacity-based, cheaper) is **not supported**. Google
  announced Pub/Sub Lite deprecation in 2024 with end-of-life on
  2026-06-30; Wirekite does not target a sunsetting product.
</Warning>

## Operations

### Drain a subscription

The reference helper `qa/queue_reader/pubsub_admin -seek-now` moves the
subscription's read cursor to the current time so any messages
published before that point are treated as already-acked. Used by
`qa/fulltest` between iterations; usable ad-hoc:

```
pubsub_admin -project <project-id> -subscription <sub-id> \
    -credentials <path-to-json> -seek-now
```

### Reference consumer

```
pubsub_reader -project <project-id> -subscription <sub-id> \
    -credentials <path-to-json> -outdir <demux-dir>
```

Subscribes to the subscription, demultiplexes incoming messages by the
`OP\tschema\ttable\t...` prefix into per-table `.ckt` files in
`<demux-dir>`. Refuses to start unless the subscription has both
`enableMessageOrdering=true` and `enableExactlyOnceDelivery=true`.

### Validate publish counts

The extractor and `pubsub_reader` print final counters in the same
shape:

```
QUEUE: shutdown (pubsub, submitted=N, delivered=N, errors=0)
```

`submitted` and `delivered` should match exactly when the run completes
cleanly. Any non-zero `errors` indicates a publish-side failure that
will already have been logged earlier as a `FATAL` line and triggered
extractor exit.

## Troubleshooting

| Symptom                                                                   | Likely cause                                                  | Fix                                                                                     |
| ------------------------------------------------------------------------- | ------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| `pubsub topic <T> does not exist in project <P>` at startup               | Topic id misspelled or in a different project                 | Verify with `gcloud pubsub topics describe <topic> --project=<project>`                 |
| `pubsub topic … exists check: permission denied` at startup               | Service account missing `roles/pubsub.viewer`                 | Add the role per "Service-account credentials" above                                    |
| `pubsub_reader` receives messages out of publish order within a table     | Subscription created without `--enable-message-ordering`      | Recreate the subscription — the flag cannot be toggled                                  |
| `pubsub_reader` receives the same message multiple times                  | Subscription created without `--enable-exactly-once-delivery` | Recreate the subscription — the flag cannot be toggled                                  |
| Validator diff shows extra rows in queue dump that aren't in `.ckt` files | Same as above (duplicates)                                    | Recreate the subscription with exactly-once enabled                                     |
| `FATAL: QUEUE stall detected` during a run                                | Pub/Sub publisher future never resolved                       | Resume from Failed; check Pub/Sub quotas, IAM, and service health                       |
| Sustained billing spike                                                   | Benchmark/load-test workload running against Pub/Sub Standard | Move benchmark workloads to Kafka/Redpanda; reserve Pub/Sub for production-rate traffic |

For Kafka/Redpanda-specific setup, see [Kafka and Redpanda](/queues/kafka).
