Cassandra - Wirekite Docs

Cassandra is supported as a source for Target Sync only — there is no CDC path. See the Cassandra Source Guide for configuration.

Schema Mapping

Wirekite projects a CQL keyspace onto a fundamentally relational target. The structural mapping is:

CQL concept	Target concept
keyspace	schema (renamable via `schemaRename`)
table	table
partition key + clustering columns	composite `PRIMARY KEY` (in declaration order)
regular column	nullable column
static column	regular column (the static-ness is not modeled)
`frozen<…>`	transparent — the inner type is used

Canonical Type Mapping

Wirekite first converts each CQL type to an internal Wirekite type, then each target’s loader maps that to a concrete column type. The source of truth is cassandra/db_utils.CQLToWirekite.

CQL type	Wirekite type	Notes
`text`, `varchar`, `ascii`	`VARCHAR`	length synthesized to 4000 — see Synthesized Bounds
`tinyint`	`INT8`
`smallint`	`INT16`
`int`	`INT32`
`bigint`	`INT64`
`counter`	`INT64`	read as its current value
`varint`	`NUMERIC`	arbitrary-precision integer — precision synthesized to `(38,0)`
`decimal`	`NUMERIC`	precision/scale synthesized to `(38,18)`
`float`	`FLOAT32`
`double`	`FLOAT64`
`boolean`	`BOOL`
`timestamp`	`TIMESTAMP`	millisecond precision, UTC
`date`	`DATE`
`time`	`TIME`	nanosecond-since-midnight rendered `HH:MM:SS.ffffff`
`uuid`, `timeuuid`	`UUID`	rendered lowercase for stable ordering
`inet`	`VARCHAR`	IP rendered as text
`duration`	`VARCHAR`	ISO-8601-ish text
`blob`	`BLOB`
`list<…>`, `set<…>`, `map<…>`, `tuple<…>`	`JSON`	serialized to JSON — see how collections are encoded
user-defined type (UDT), any unknown	`JSON`	serialized to JSON

Mapping to Targets

The following tables show how each CQL type lands on every supported Target Sync target.

Integer Types

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`tinyint`	`smallint`	`TINYINT`	`SMALLINT`	`NUMBER(38)`	`INTEGER`	`tinyint`	`INT64`	`INT64`	`TINYINT`	`TINYINT`
`smallint`	`smallint`	`SMALLINT`	`SMALLINT`	`NUMBER(38)`	`INTEGER`	`number`	`INT64`	`INT64`	`SMALLINT`	`SMALLINT`
`int`	`integer`	`INT`	`INT`	`NUMBER(38)`	`INTEGER`	`int`	`INT64`	`INT64`	`INT`	`INT`
`bigint`	`bigint`	`BIGINT`	`BIGINT`	`NUMBER(38)`	`BIGINT`	`bigint`	`INT64`	`INT64`	`BIGINT`	`BIGINT`
`counter`	`bigint`	`BIGINT`	`BIGINT`	`NUMBER(38)`	`BIGINT`	`bigint`	`INT64`	`INT64`	`BIGINT`	`BIGINT`

Decimal Types

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`varint`	`numeric(38,0)`	`NUMERIC(38,0)`	`NUMERIC(38,0)`	`NUMBER(38,0)`	`NUMERIC(38,0)`	`number(38,0)`	`NUMERIC`	`NUMERIC`	`NUMERIC(38,0)`	`DECIMAL(38,0)`
`decimal`	`numeric(38,18)`	`NUMERIC(38,18)`	`NUMERIC(38,18)`	`NUMBER(38,18)`	`NUMERIC(38,18)`	`number(38,18)`	`NUMERIC`	`NUMERIC`	`NUMERIC(38,18)`	`DECIMAL(38,18)`

Spanner and BigQuery use a fixed-scale NUMERIC. Spanner NUMERIC is precision/scale (38,9) and BigQuery NUMERIC caps at (38,9). A Cassandra decimal carries up to scale 18, so on these two targets the scale is clamped to 9. The validator compares decimal values numerically (not as strings), so padding/clamping on the other targets does not produce false diffs.

Floating Point Types

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`float`	`real`	`FLOAT`	`REAL`	`BINARY_FLOAT`	`DOUBLE`	`float`	`FLOAT32`	`FLOAT64`	`FLOAT`	`FLOAT`
`double`	`double precision`	`DOUBLE`	`FLOAT`	`BINARY_DOUBLE`	`DOUBLE`	`float`	`FLOAT64`	`FLOAT64`	`DOUBLE`	`DOUBLE`

Boolean Type

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`boolean`	`boolean`	`TINYINT(1)`	`BIT`	`CHAR(1)`	`BOOLEAN`	`boolean`	`BOOL`	`BOOL`	`TINYINT(1)`	`BOOLEAN`

Date and Time Types

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`timestamp`	`timestamp without time zone`	`DATETIME(6)`	`DATETIME2`	`TIMESTAMP`	`TIMESTAMP`	`timestamp_ntz`	`TIMESTAMP`	`DATETIME`	`DATETIME(6)`	`TIMESTAMP_NTZ`
`date`	`date`	`DATE`	`DATE`	`DATE`	`DATE`	`date`	`DATE`	`DATE`	`DATE`	`DATE`
`time`	`time without time zone`	`TIME(6)`	`TIME`	`VARCHAR2(64)`	`TEXT`	`time`	`STRING`	`TIME`	`TIME(6)`	`STRING`

Character, Network & UUID Types

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`text` / `varchar` / `ascii`	`character varying(4000)`	`VARCHAR(4000)`	`VARCHAR(4000)`	`VARCHAR2(4000)`	`TEXT`	`varchar(4000)`	`STRING(4000)`	`STRING`	`VARCHAR(4000)`	`STRING`
`inet`	`character varying(4000)`	`VARCHAR(4000)`	`VARCHAR(4000)`	`VARCHAR2(4000)`	`TEXT`	`varchar(4000)`	`STRING(4000)`	`STRING`	`VARCHAR(4000)`	`STRING`
`duration`	`character varying(4000)`	`VARCHAR(4000)`	`VARCHAR(4000)`	`VARCHAR2(4000)`	`TEXT`	`varchar(4000)`	`STRING(4000)`	`STRING`	`VARCHAR(4000)`	`STRING`
`uuid` / `timeuuid`	`uuid`	`VARCHAR(64)`	`UNIQUEIDENTIFIER`	`VARCHAR2(128)`	`TEXT`	`varchar(64)`	`STRING`	`STRING`	`VARCHAR(64)`	`STRING`

Binary Type

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`blob`	`bytea`	`BLOB`	`VARBINARY(MAX)`	`BLOB`	`BYTEA`	`binary`	`BYTES`	`BYTES`	`BLOB`	`BINARY`

Collection & Complex Types

Cassandra	PostgreSQL	MySQL	SQL Server	Oracle	Firebolt	Snowflake	Spanner	BigQuery	SingleStore	Databricks
`list` / `set` / `map` / `tuple` / UDT	`jsonb`	`JSON`	`VARCHAR(MAX)`	`CLOB`	`TEXT`	`varchar`	`JSON`	`JSON`	`JSON`	`STRING`

MariaDB maps identically to MySQL, and TigerData identically to PostgreSQL. MongoDB is a schemaless target — scalar types land as native BSON values and JSON columns become sub-documents, so it is not shown as a column above.

Synthesized Bounds

Cassandra has no column lengths, precisions, or scales — text is unbounded and decimal/varint are arbitrary-precision. Relational targets, however, want bounded, precise column types, and some bulk loaders reject unbounded ones outright. So the schema extractor synthesizes sensible bounds:

varint → NUMERIC(38,0) and decimal → NUMERIC(38,18). Without an explicit precision, Databricks defaults NUMERIC to DECIMAL(10,0) and overflows on a 19-digit varint. 38 is the maximum Databricks/Oracle decimal precision; scale 0 keeps varint an exact integer, and scale 18 leaves room for a fraction. Targets that prefer arbitrary precision (PostgreSQL, Firebolt) simply pad the scale.
text / varchar / ascii / inet / duration → length 4000. A length-less VARCHAR becomes VARCHAR(MAX) on SQL Server (whose BCP loader cannot bind it) and CLOB on Oracle (whose OCI loader cannot bulk-load it). 4000 is Oracle’s VARCHAR2 standard maximum and within SQL Server’s 8000 non-MAX cap.

The fixed 4000-character bound is lossy for genuinely large Cassandra text. It is ample for typical keys, labels, and short documents; tables that need larger text should raise this bound.

How Collections Are Encoded

list, set, map, tuple, and user-defined types are serialized to a single JSON column on the target. Wirekite does not explode collections into child tables, normalize maps, or model UDT fields — the relational skeleton (primary key + scalar columns) stays queryable, and the complex values ride along as JSON. To keep the diff stable across runs, the JSON is canonicalized:

list and tuple → JSON array, order preserved.
set → JSON array, sorted (sets are unordered in Cassandra).
map → JSON object with keys sorted and coerced to strings.
UDT → JSON object, fields in declared order.

Complex columns cannot be part of a primary key, and frozen<…> wrappers are transparent (the inner type is used).

​Schema Mapping

​Canonical Type Mapping

​Mapping to Targets

​Integer Types

​Decimal Types

​Floating Point Types

​Boolean Type

​Date and Time Types

​Character, Network & UUID Types

​Binary Type

​Collection & Complex Types

​Synthesized Bounds

​How Collections Are Encoded

Schema Mapping

Canonical Type Mapping

Mapping to Targets

Integer Types

Decimal Types

Floating Point Types

Boolean Type

Date and Time Types

Character, Network & UUID Types

Binary Type

Collection & Complex Types

Synthesized Bounds

How Collections Are Encoded