Schema Mapping
Wirekite projects a CQL keyspace onto a fundamentally relational target. The structural mapping is:
| CQL concept | Target concept |
|---|
| keyspace | schema (renamable via schemaRename) |
| table | table |
| partition key + clustering columns | composite PRIMARY KEY (in declaration order) |
| regular column | nullable column |
| static column | regular column (the static-ness is not modeled) |
frozen<…> | transparent — the inner type is used |
Canonical Type Mapping
Wirekite first converts each CQL type to an internal Wirekite type, then each target’s loader maps that to a concrete column type. The source of truth is cassandra/db_utils.CQLToWirekite.
| CQL type | Wirekite type | Notes |
|---|
text, varchar, ascii | VARCHAR | length synthesized to 4000 — see Synthesized Bounds |
tinyint | INT8 | |
smallint | INT16 | |
int | INT32 | |
bigint | INT64 | |
counter | INT64 | read as its current value |
varint | NUMERIC | arbitrary-precision integer — precision synthesized to (38,0) |
decimal | NUMERIC | precision/scale synthesized to (38,18) |
float | FLOAT32 | |
double | FLOAT64 | |
boolean | BOOL | |
timestamp | TIMESTAMP | millisecond precision, UTC |
date | DATE | |
time | TIME | nanosecond-since-midnight rendered HH:MM:SS.ffffff |
uuid, timeuuid | UUID | rendered lowercase for stable ordering |
inet | VARCHAR | IP rendered as text |
duration | VARCHAR | ISO-8601-ish text |
blob | BLOB | |
list<…>, set<…>, map<…>, tuple<…> | JSON | serialized to JSON — see how collections are encoded |
| user-defined type (UDT), any unknown | JSON | serialized to JSON |
Mapping to Targets
The following tables show how each CQL type lands on every supported Target Sync target.
Integer Types
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
tinyint | smallint | TINYINT | SMALLINT | NUMBER(38) | INTEGER | tinyint | INT64 | INT64 | TINYINT | TINYINT |
smallint | smallint | SMALLINT | SMALLINT | NUMBER(38) | INTEGER | number | INT64 | INT64 | SMALLINT | SMALLINT |
int | integer | INT | INT | NUMBER(38) | INTEGER | int | INT64 | INT64 | INT | INT |
bigint | bigint | BIGINT | BIGINT | NUMBER(38) | BIGINT | bigint | INT64 | INT64 | BIGINT | BIGINT |
counter | bigint | BIGINT | BIGINT | NUMBER(38) | BIGINT | bigint | INT64 | INT64 | BIGINT | BIGINT |
Decimal Types
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
varint | numeric(38,0) | NUMERIC(38,0) | NUMERIC(38,0) | NUMBER(38,0) | NUMERIC(38,0) | number(38,0) | NUMERIC | NUMERIC | NUMERIC(38,0) | DECIMAL(38,0) |
decimal | numeric(38,18) | NUMERIC(38,18) | NUMERIC(38,18) | NUMBER(38,18) | NUMERIC(38,18) | number(38,18) | NUMERIC | NUMERIC | NUMERIC(38,18) | DECIMAL(38,18) |
Spanner and BigQuery use a fixed-scale NUMERIC. Spanner NUMERIC is precision/scale (38,9) and BigQuery NUMERIC caps at (38,9). A Cassandra decimal carries up to scale 18, so on these two targets the scale is clamped to 9. The validator compares decimal values numerically (not as strings), so padding/clamping on the other targets does not produce false diffs.
Floating Point Types
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
float | real | FLOAT | REAL | BINARY_FLOAT | DOUBLE | float | FLOAT32 | FLOAT64 | FLOAT | FLOAT |
double | double precision | DOUBLE | FLOAT | BINARY_DOUBLE | DOUBLE | float | FLOAT64 | FLOAT64 | DOUBLE | DOUBLE |
Boolean Type
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
boolean | boolean | TINYINT(1) | BIT | CHAR(1) | BOOLEAN | boolean | BOOL | BOOL | TINYINT(1) | BOOLEAN |
Date and Time Types
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
timestamp | timestamp without time zone | DATETIME(6) | DATETIME2 | TIMESTAMP | TIMESTAMP | timestamp_ntz | TIMESTAMP | DATETIME | DATETIME(6) | TIMESTAMP_NTZ |
date | date | DATE | DATE | DATE | DATE | date | DATE | DATE | DATE | DATE |
time | time without time zone | TIME(6) | TIME | VARCHAR2(64) | TEXT | time | STRING | TIME | TIME(6) | STRING |
Character, Network & UUID Types
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
text / varchar / ascii | character varying(4000) | VARCHAR(4000) | VARCHAR(4000) | VARCHAR2(4000) | TEXT | varchar(4000) | STRING(4000) | STRING | VARCHAR(4000) | STRING |
inet | character varying(4000) | VARCHAR(4000) | VARCHAR(4000) | VARCHAR2(4000) | TEXT | varchar(4000) | STRING(4000) | STRING | VARCHAR(4000) | STRING |
duration | character varying(4000) | VARCHAR(4000) | VARCHAR(4000) | VARCHAR2(4000) | TEXT | varchar(4000) | STRING(4000) | STRING | VARCHAR(4000) | STRING |
uuid / timeuuid | uuid | VARCHAR(64) | UNIQUEIDENTIFIER | VARCHAR2(128) | TEXT | varchar(64) | STRING | STRING | VARCHAR(64) | STRING |
Binary Type
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
blob | bytea | BLOB | VARBINARY(MAX) | BLOB | BYTEA | binary | BYTES | BYTES | BLOB | BINARY |
Collection & Complex Types
| Cassandra | PostgreSQL | MySQL | SQL Server | Oracle | Firebolt | Snowflake | Spanner | BigQuery | SingleStore | Databricks |
|---|
list / set / map / tuple / UDT | jsonb | JSON | VARCHAR(MAX) | CLOB | TEXT | varchar | JSON | JSON | JSON | STRING |
MariaDB maps identically to MySQL, and TigerData identically to PostgreSQL. MongoDB is a schemaless target — scalar types land as native BSON values and JSON columns become sub-documents, so it is not shown as a column above.
Synthesized Bounds
Cassandra has no column lengths, precisions, or scales — text is unbounded and decimal/varint are arbitrary-precision. Relational targets, however, want bounded, precise column types, and some bulk loaders reject unbounded ones outright. So the schema extractor synthesizes sensible bounds:
-
varint → NUMERIC(38,0) and decimal → NUMERIC(38,18). Without an explicit precision, Databricks defaults NUMERIC to DECIMAL(10,0) and overflows on a 19-digit varint. 38 is the maximum Databricks/Oracle decimal precision; scale 0 keeps varint an exact integer, and scale 18 leaves room for a fraction. Targets that prefer arbitrary precision (PostgreSQL, Firebolt) simply pad the scale.
-
text / varchar / ascii / inet / duration → length 4000. A length-less VARCHAR becomes VARCHAR(MAX) on SQL Server (whose BCP loader cannot bind it) and CLOB on Oracle (whose OCI loader cannot bulk-load it). 4000 is Oracle’s VARCHAR2 standard maximum and within SQL Server’s 8000 non-MAX cap.
The fixed 4000-character bound is lossy for genuinely large Cassandra text. It is ample for typical keys, labels, and short documents; tables that need larger text should raise this bound.
How Collections Are Encoded
list, set, map, tuple, and user-defined types are serialized to a single JSON column on the target. Wirekite does not explode collections into child tables, normalize maps, or model UDT fields — the relational skeleton (primary key + scalar columns) stays queryable, and the complex values ride along as JSON.
To keep the diff stable across runs, the JSON is canonicalized:
list and tuple → JSON array, order preserved.
set → JSON array, sorted (sets are unordered in Cassandra).
map → JSON object with keys sorted and coerced to strings.
- UDT → JSON object, fields in declared order.
Complex columns cannot be part of a primary key, and frozen<…> wrappers are transparent (the inner type is used).