Skip to main content

Overview

Wirekite supports SQL Server 2016 and above as a source database for:
  • Schema Extraction - Extract table definitions to Wirekite’s intermediate format
  • Data Extraction - Bulk extract table data for initial load
  • Change Extraction (CDC) - Capture ongoing inserts, updates, and deletes via Change Data Capture
SQL Server Change Extraction uses the built-in Change Data Capture (CDC) feature. CDC must be enabled at both the database and table level before Wirekite can capture changes.

Prerequisites

Before configuring SQL Server as a Wirekite source, ensure the following requirements are met:

Database Configuration

  1. Version: SQL Server 2016 or later
  2. SQL Server Agent (CDC only): Must be running for CDC capture jobs
  3. Change Data Capture (CDC only): Must be enabled at both database and table level (see Source Database Setup below)

File System Access

  • Ensure the user running Wirekite has write permissions to the output directory
  • Verify sufficient disk space for extracted data files

Replication Considerations

When performing initial data extraction, stop replication on the source database to ensure a consistent dump. It is highly recommended to run Wirekite against a replica rather than the production primary, as Wirekite can generate significant load on the database server.

Source Database Setup

Wirekite requires a dedicated login/user, a wirekite schema for internal tracking tables, and specific privileges on the application schema and CDC infrastructure. Follow these steps to configure your SQL Server source.

Step 1: Create the Wirekite Login

Connect as sa or a sysadmin and run:
-- Server-level login
CREATE LOGIN wirekite WITH PASSWORD = '<your_password>';

Step 2: Create a Database User

Connect to the application database and create a user mapped to the login:
USE <your_database>;
CREATE USER wirekite FOR LOGIN wirekite;

Step 3: Create the Wirekite Schema

Wirekite uses a dedicated wirekite schema for its internal tracking tables (wirekite_progress and wirekite_action). These tables track extraction progress and enable crash recovery.
CREATE SCHEMA wirekite AUTHORIZATION wirekite;
This gives the wirekite user ownership of the schema, allowing it to create, drop, and modify tables within it.

Step 4: Grant Read Access to Application Tables

For the application schema (e.g., dbo):
GRANT SELECT ON SCHEMA::dbo TO wirekite;
Or grant on specific tables:
GRANT SELECT ON dbo.table1 TO wirekite;
GRANT SELECT ON dbo.table2 TO wirekite;
-- repeat for each table to be extracted
Alternatively, add wirekite to the db_datareader role for read access to all tables:
EXEC sp_addrolemember 'db_datareader', 'wirekite';

Step 5: System Catalog Access

SQL Server grants SELECT on sys.* catalog views and INFORMATION_SCHEMA views to all database users by default. No additional grants are needed for:
  • sys.columns, sys.types, sys.tables, sys.schemas
  • sys.indexes, sys.index_columns, sys.key_constraints
  • sys.default_constraints
  • INFORMATION_SCHEMA.COLUMNS

Step 6 (CDC Only): Enable CDC on the Database

CDC must be enabled at the database level by a sysadmin or db_owner:
USE <your_database>;
EXEC sys.sp_cdc_enable_db;

Step 7 (CDC Only): Enable CDC on Each Table

For each table that needs change tracking:
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'table1',
    @role_name = NULL;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'table2',
    @role_name = NULL;

-- repeat for each table to be tracked
Setting @role_name = NULL means any user with SELECT access can read the CDC change tables. If you want to restrict access, specify a role name and add the wirekite user to that role.

Step 8 (CDC Only): Grant CDC-Specific Privileges

-- Read access to CDC change tables and metadata
GRANT SELECT ON SCHEMA::cdc TO wirekite;

-- CDC system functions
GRANT EXECUTE ON sys.fn_cdc_increment_lsn TO wirekite;
GRANT EXECUTE ON sys.fn_cdc_get_max_lsn TO wirekite;

-- Start/restart the CDC capture job if it stops
GRANT EXECUTE ON sys.sp_cdc_start_job TO wirekite;

-- Server-level: access to DMVs for monitoring CDC health
USE master;
GRANT VIEW SERVER STATE TO wirekite;

Step 9 (CDC Only): Grant Session Cleanup Privilege

Wirekite cleans up orphaned CDC extractor sessions on startup. This requires:
-- Server-level permission for KILL command
USE master;
GRANT ALTER ANY CONNECTION TO wirekite;
If you prefer not to grant this, Wirekite will still function but may log warnings about orphaned sessions that it cannot clean up.

Step 10 (HADR Handover Only): Additional Privileges

If you are extracting from an Always On Availability Group secondary replica and need handover operations:
-- For ALTER DATABASE SET HADR SUSPEND / RESUME
GRANT ALTER ON DATABASE::<your_database> TO wirekite;

-- For reading HADR replica state
GRANT VIEW SERVER STATE TO wirekite;  -- (already granted in Step 8)
The fn_dblog() fallback for LSN retrieval requires sysadmin or db_owner. This is only used when both CDC and HADR position methods fail.

Step 11: Verify SQL Server Agent is Running

CDC requires SQL Server Agent to be running:
SELECT dss.status, dss.servicename
FROM sys.dm_server_services dss
WHERE dss.servicename LIKE '%Agent%';
-- status should be 4 (Running)

Complete Setup Script

Data Extraction Only

-- Run as sysadmin (e.g., sa)

-- 1. Create login
CREATE LOGIN wirekite WITH PASSWORD = '<your_password>';

-- 2. Create database user
USE <your_database>;
CREATE USER wirekite FOR LOGIN wirekite;

-- 3. Create wirekite schema for internal tables
CREATE SCHEMA wirekite AUTHORIZATION wirekite;

-- 4. Grant read access to application schema
GRANT SELECT ON SCHEMA::dbo TO wirekite;

Data Extraction + CDC (Change Data Capture)

-- Run as sysadmin (e.g., sa)

-- 1. Create login
CREATE LOGIN wirekite WITH PASSWORD = '<your_password>';

-- 2. Create database user
USE <your_database>;
CREATE USER wirekite FOR LOGIN wirekite;

-- 3. Create wirekite schema for internal tables
CREATE SCHEMA wirekite AUTHORIZATION wirekite;

-- 4. Grant read access to application schema
GRANT SELECT ON SCHEMA::dbo TO wirekite;

-- 5. Enable CDC on the database
EXEC sys.sp_cdc_enable_db;

-- 6. Enable CDC on each table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'table1',
    @role_name = NULL;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'table2',
    @role_name = NULL;
-- repeat for each table to be tracked

-- 7. Grant CDC access
GRANT SELECT ON SCHEMA::cdc TO wirekite;
GRANT EXECUTE ON sys.fn_cdc_increment_lsn TO wirekite;
GRANT EXECUTE ON sys.fn_cdc_get_max_lsn TO wirekite;
GRANT EXECUTE ON sys.sp_cdc_start_job TO wirekite;

-- 8. Server-level permissions
USE master;
GRANT VIEW SERVER STATE TO wirekite;
GRANT ALTER ANY CONNECTION TO wirekite;

-- 9. Verify SQL Server Agent is running
SELECT dss.status, dss.servicename
FROM sys.dm_server_services dss
WHERE dss.servicename LIKE '%Agent%';

Does Wirekite Need sysadmin?

No. Wirekite does not require sysadmin. However:
  • Enabling CDC (sp_cdc_enable_db, sp_cdc_enable_table) must be done by sysadmin or db_owner as a one-time setup step
  • The fn_dblog() fallback in handover requires sysadmin, but this is only used when both CDC and HADR position methods fail

Verify the Setup

Connect as the wirekite user and verify access:
USE <your_database>;

-- Should succeed
SELECT TOP 1 * FROM dbo.table1;

-- Should succeed
SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'dbo' AND table_name = 'table1';

-- Should succeed (CDC only)
SELECT * FROM cdc.lsn_time_mapping WHERE 1=0;

-- Should succeed (CDC only)
SELECT sys.fn_cdc_get_max_lsn();

What Wirekite Creates

ObjectTypeSchemaPurpose
wirekiteScheman/aHolds internal tracking tables
wirekite_progressTablewirekiteTracks extraction progress, crash recovery
wirekite_actionTablewirekiteRun/pause/stop control
wkp_finish_timeIndexwirekiteIndex on wirekite_progress
wkp_start_timeIndexwirekiteIndex on wirekite_progress
Wirekite does NOT create any triggers, stored procedures, or CDC infrastructure. CDC must be enabled by a DBA before Wirekite can use it. Wirekite only reads from the cdc.*_CT change tables that SQL Server creates automatically.

Privilege Summary

PrivilegeScopeRequired For
SELECT ON SCHEMA::dboDatabaseReading application table data
wirekite schema ownershipDatabaseInternal tracking tables (create/drop/modify)
SELECT ON SCHEMA::cdcDatabaseCDC: reading change tables
EXECUTE ON sys.fn_cdc_increment_lsnDatabaseCDC: LSN arithmetic
EXECUTE ON sys.fn_cdc_get_max_lsnDatabaseCDC: current LSN position
EXECUTE ON sys.sp_cdc_start_jobDatabaseCDC: restart capture job
VIEW SERVER STATEServerCDC: monitoring DMVs
ALTER ANY CONNECTIONServerCDC: cleaning up orphaned sessions
ALTER ON DATABASEDatabaseHADR handover only

Schema Extractor

The Schema Extractor reads table definitions from SQL Server and outputs them to Wirekite’s intermediate schema format (.skt file). This schema file is used by target loaders to create corresponding tables in the destination database.
The Schema Extractor can run standalone or as part of the Orchestrator workflow. The configuration parameters are the same in both cases.

Required Parameters

dsnFile
string
required
Path to a file containing the SQL Server connection string. The file should contain exactly one line with the DSN.
Connection string format:
sqlserver://username:password@host:port?database=databasename&encrypt=disable&connection+timeout=30
Example DSN file contents:
sqlserver://wirekite:secretpass@sqlserver-replica.example.com:1433?database=myapp&encrypt=disable&connection+timeout=30
tablesFile
string
required
Path to a file listing the tables to extract, one per line in schema.table format.
Example tablesFile contents:
dbo.users
dbo.orders
sales.customers
inventory.products
outputDirectory
string
required
Absolute path to the directory where Wirekite will write the schema file (wirekite_schema.skt). The directory must exist and be writable.
logFile
string
required
Absolute path to the log file where the Schema Extractor will write operational logs.

Optional Parameters

schemaRename
string
Rename the source schema in the output. Format: sourceSchema:targetSchema. Useful when the target database uses a different schema name.

Data Extractor

The Data Extractor performs bulk extraction of table data, writing records to Wirekite’s intermediate data format (.dkt files). Large tables are automatically split across multiple files based on the maxRowsPerDump setting.

Required Parameters

dsnFile
string
required
Path to a file containing the SQL Server connection string.
Connection string format:
sqlserver://username:password@host:port?database=databasename&encrypt=disable&connection+timeout=30
tablesFile
string
required
Path to a file listing the tables to extract, one per line in schema.table format.
outputDirectory
string
required
Absolute path to the directory where Wirekite will write data files. Files are named schema.table.N.dkt where N is a sequence number.
logFile
string
required
Absolute path to the log file for Data Extractor operations.

Optional Parameters

maxThreads
integer
default:"5"
Number of parallel extraction threads. Each thread consumes one database connection and approximately one CPU core. Set this based on available CPU cores and acceptable database load.
maxRowsPerDump
integer
default:"200000"
Maximum number of rows written to each output file. Large tables are split into multiple files. Adjust based on available memory and disk space.
hexEncoding
boolean
default:"false"
When true, binary and string data is encoded as hexadecimal instead of base64. Hex encoding produces larger files but may be required for certain target databases (e.g., Snowflake).
sortFiles
boolean
default:"true"
When true, sorts output files by table name for easier management and verification.
useExternalDumper
boolean
default:"true"
When true, uses a native C extractor instead of the default SQL Server extractor. This can provide better performance for large extractions.
schemaRename
string
Rename the source schema in the output. Format: sourceSchema:targetSchema.
For best performance on dedicated extraction hosts, set maxThreads equal to the number of CPU cores available.

Change Extractor

The Change Extractor captures ongoing data changes (INSERT, UPDATE, DELETE) by reading SQL Server CDC change tables. It outputs change events to Wirekite’s change format (.ckt files) for loading into target databases.
The Change Extractor requires Change Data Capture (CDC) to be enabled at both the database and table level. It reads from the CDC change tables (cdc.*_CT) to capture modifications.

Required Parameters

dsnFile
string
required
Path to a file containing the SQL Server connection string.
Connection string format:
sqlserver://username:password@host:port?database=databasename&encrypt=disable&connection+timeout=30
tablesFile
string
required
Path to a file listing the tables to track for changes, one per line in schema.table format.
outputDirectory
string
required
Absolute path to the directory where Wirekite will write change files. Files are named with sequential numbers (e.g., 0.ckt, 1.ckt, etc.).
logFile
string
required
Absolute path to the log file for Change Extractor operations.

Position Parameters

startLsn
string
The Log Sequence Number (LSN) from which to start extracting changes. This is typically the LSN captured when the data extraction was taken. When running via the Orchestrator in data+change mode, this is automatically captured.
In change-only mode (without data extraction), you must specify the correct startLsn to define where change capture should begin. Using incorrect values may result in missing changes or duplicate data.

Optional Parameters

verbose
boolean
default:"false"
When true, enables detailed logging including metadata files that can help troubleshoot issues.
exitWhenIdle
boolean
default:"false"
When true, the extractor exits after processing all available changes. When false (default), it runs continuously, waiting for new changes indefinitely.
lsnBatchSize
integer
default:"1000000"
Number of LSNs to process in each batch. Databases with autocommit or many small transactions may generate many LSNs, in which case this value can be increased for better performance.
schemaRename
string
Rename the source schema in the output. Format: sourceSchema:targetSchema.

Orchestrator Configuration

When using the Wirekite Orchestrator, prefix source parameters with source.schema., source.data., or source.change. depending on the operation mode. Example orchestrator configuration for SQL Server source:
# Main configuration
source=sqlserver
target=snowflake

# Schema extraction
source.schema.dsnFile=/opt/wirekite/config/sqlserver.dsn
source.schema.tablesFile=/opt/wirekite/config/tables.txt
source.schema.outputDirectory=/opt/wirekite/output/schema
source.schema.logFile=/var/log/wirekite/schema-extractor.log

# Data extraction
source.data.dsnFile=/opt/wirekite/config/sqlserver.dsn
source.data.tablesFile=/opt/wirekite/config/tables.txt
source.data.outputDirectory=/opt/wirekite/output/data
source.data.logFile=/var/log/wirekite/data-extractor.log
source.data.maxThreads=8
source.data.maxRowsPerDump=100000
source.data.hexEncoding=false

# Change extraction (CDC)
source.change.dsnFile=/opt/wirekite/config/sqlserver.dsn
source.change.tablesFile=/opt/wirekite/config/tables.txt
source.change.outputDirectory=/opt/wirekite/output/changes
source.change.logFile=/var/log/wirekite/change-extractor.log
source.change.lsnBatchSize=1000000
source.change.exitWhenIdle=false
When running in data+change mode, the Orchestrator automatically captures the LSN after data extraction and passes it to the Change Extractor. You do not need to manually specify startLsn in this mode.
For complete Orchestrator documentation, see the Execution Guide.