Metadata-driven SQL code generation
October 2022's T-SQL Tuesday is hosted by Steve Jones (b|t) – thanks Steve 😊. Steve's invitation this month is to write about using dynamic SQL – that is, T-SQL code that is built up from text strings and then executed as code. `SELECT * FROM MyTable` is a valid, non-dynamic SQL query, while `EXEC('SELECT * FROM MyTable')` is dynamic – the text string `'SELECT * FROM MyTable'` is passed to `EXECUTE` for execution as a SQL query.
The text string isn't validated in any way before it's executed, which – as Steve says – makes many experts decry dynamic SQL as a poor way of solving problems. Sometimes it's slow, sometimes it's ugly, and who can forget little Bobby Tables?
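Parameters go a long way towards keeping Bobby Tables out: `sp_executesql` lets you pass values into a dynamic statement without ever splicing them into the SQL text. A minimal sketch (the `SomeColumn` column is hypothetical):

```sql
DECLARE @sql NVARCHAR(MAX) = N'SELECT * FROM MyTable WHERE SomeColumn = @p0;';
DECLARE @value NVARCHAR(100) = N'Robert''); DROP TABLE Students;--';

-- @p0 is bound as data, not parsed as SQL, so the injection attempt
-- is just an odd-looking literal value to compare against.
EXEC sp_executesql @sql, N'@p0 NVARCHAR(100)', @p0 = @value;
```

Parameters can only stand in for values, though – not for table or column names – which is why the code-generation examples below still build identifiers into the string (guarded with `QUOTENAME`).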
You can see why people get nervous about dynamic SQL in OLTP systems. Happily though, I'm a data engineer – for me, dynamic T-SQL is a key tool for building metadata-driven data platforms in SQL environments.
Ingesting one table
Let's imagine you're building database objects to manage ingestion of data from external sources into a database. This could be the basis for some fact or dimension tables later on, but to begin you just need to get the data. You might start with a loosely-typed table into which a tool like Azure Data Factory 😉 or SQL Server Integration Services can easily import data.
```sql
CREATE TABLE raw.Customer (
    CustomerId NVARCHAR(4000)
  , FamilyName NVARCHAR(4000)
  , Forename NVARCHAR(4000)
  , Address NVARCHAR(4000)
  , DateOfBirth NVARCHAR(4000)
  , UpdatedDateTime NVARCHAR(4000)
);
```
The permissive `NVARCHAR(4000)` column types mean that the table can accept most data, even if formatted badly – this can be useful because, even if you can't take the data any further, you can at least inspect it in a database environment.
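For example, once badly-formatted data has landed, `TRY_CAST` makes it easy to find the rows that won't survive stronger typing – a quick sketch against the `raw.Customer` table above:

```sql
-- Rows whose DateOfBirth won't parse as a DATE:
-- TRY_CAST returns NULL on conversion failure instead of raising an error.
SELECT CustomerId, DateOfBirth
FROM raw.Customer
WHERE DateOfBirth IS NOT NULL
  AND TRY_CAST(DateOfBirth AS DATE) IS NULL;
```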
To take it further you might want to apply some stronger data types, perhaps adding some change tracking features:
```sql
CREATE OR ALTER VIEW stg.Customer AS
SELECT
    CAST(CustomerId AS UNIQUEIDENTIFIER) AS CustomerId
  , CAST(FamilyName AS NVARCHAR(100)) AS FamilyName
  , CAST(Forename AS NVARCHAR(100)) AS Forename
  , CAST(Address AS NVARCHAR(255)) AS Address
  , CAST(DateOfBirth AS DATE) AS DateOfBirth
  , CAST(UpdatedDateTime AS DATETIME) AS _ValidFromDateTime
  , CAST(HASHBYTES('md5',
        CAST(COALESCE(FamilyName, N'') AS NVARCHAR(MAX))
      + '||' + CAST(COALESCE(Forename, N'') AS NVARCHAR(MAX))
      + '||' + CAST(COALESCE(Address, N'') AS NVARCHAR(MAX))
      + '||' + CAST(COALESCE(DateOfBirth, N'') AS NVARCHAR(MAX))
    ) AS BINARY(16)) AS _AttributeDigest
FROM raw.Customer;
```
This view could then be the source for an upsert into a history tracking table:
```sql
CREATE TABLE stg.CustomerHistory (
    CustomerHistoryId INT PRIMARY KEY IDENTITY
  , CustomerId UNIQUEIDENTIFIER NOT NULL
  , VersionNumber INT NOT NULL DEFAULT 1
  , FamilyName NVARCHAR(100) NOT NULL
  , Forename NVARCHAR(100) NOT NULL
  , Address NVARCHAR(255) NOT NULL
  , DateOfBirth DATE NOT NULL
  , _AttributeDigest BINARY(16) NOT NULL
  , _ValidFromDateTime DATETIME NOT NULL
  , _ValidToDateTime DATETIME NOT NULL DEFAULT '9000-12-31'
);
```
Maybe using a stored procedure like this:
```sql
CREATE OR ALTER PROCEDURE stg.Update_CustomerHistory AS

WITH latest AS (
    -- the most recent history record for each customer
    SELECT TOP 1 WITH TIES *
    FROM stg.CustomerHistory
    ORDER BY ROW_NUMBER() OVER (
        PARTITION BY CustomerId
        ORDER BY _ValidFromDateTime DESC
    )
)
SELECT
    latest.CustomerHistoryId
  , COALESCE(latest.VersionNumber, 0) + 1 AS VersionNumber
  , new.*
INTO #changes
FROM stg.Customer new
LEFT JOIN latest ON latest.CustomerId = new.CustomerId
WHERE latest.CustomerId IS NULL
   OR (
            latest._AttributeDigest <> new._AttributeDigest
        AND latest._ValidFromDateTime < new._ValidFromDateTime
      );

-- end-date the superseded history records
UPDATE hx
SET _ValidToDateTime = DATEADD(MILLISECOND, -3, chx._ValidFromDateTime)
FROM stg.CustomerHistory hx
INNER JOIN #changes chx ON chx.CustomerHistoryId = hx.CustomerHistoryId;

-- insert the new versions
INSERT INTO stg.CustomerHistory (
    CustomerId
  , VersionNumber
  , FamilyName
  , Forename
  , Address
  , DateOfBirth
  , _AttributeDigest
  , _ValidFromDateTime
)
SELECT
    CustomerId
  , VersionNumber
  , FamilyName
  , Forename
  , Address
  , DateOfBirth
  , _AttributeDigest
  , _ValidFromDateTime
FROM #changes;
```
Ingesting another table
There's a lot going on here!
- A permissive staging table
- A view that enforces stronger types and calculates an attribute digest. The digest is a convenient way of combining into a single value all the attributes we're interested in tracking for change.
- A history table that includes multiple, numbered versions of a customer's record, with validity start and end datetimes.
- A stored procedure that compares the view – containing new, incoming records – against the latest history record, end-dating old customer records and inserting new ones.
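Once the history table is populated, current and point-in-time queries fall out naturally – a sketch against the tables above:

```sql
-- Current version of every customer: the open-ended validity row.
SELECT *
FROM stg.CustomerHistory
WHERE _ValidToDateTime = '9000-12-31';

-- Point-in-time query: each customer's record as it stood at a given moment.
DECLARE @asAt DATETIME = '2022-01-31';
SELECT *
FROM stg.CustomerHistory
WHERE @asAt BETWEEN _ValidFromDateTime AND _ValidToDateTime;
```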
Now you come to ingest your second table – and you have to do this all again!
It's a lot of code, and worse, it's all basically the same – only the names and types are any different. You can easily make a mistake, and the boredom of repetition only makes that more likely. Later on, when you decide you want to add something to the process – a logging statement at the end of the stored proc, for example – you'll have to add it separately to every stored procedure you've created.
The solution? Generate your code automatically, from metadata, using dynamic SQL! 😀
Dynamic SQL with metadata
For this simple example we need one table of metadata, containing information about columns in our staging table:
| TableName | ColumnName | ColumnOrder | DataType | IsPrimaryKey | IsAuditTimestamp |
|---|---|---|---|---|---|
| Customer | CustomerId | 1 | UNIQUEIDENTIFIER | 1 | 0 |
| Customer | FamilyName | 2 | NVARCHAR(100) | 0 | 0 |
| Customer | Forename | 3 | NVARCHAR(100) | 0 | 0 |
| Customer | Address | 4 | NVARCHAR(255) | 0 | 0 |
| Customer | DateOfBirth | 5 | DATE | 0 | 0 |
| Customer | UpdatedDateTime | 6 | DATETIME | 0 | 1 |
The metadata table lists the source table columns we're interested in, along with their data types and other information. The table is stored in the database, like any other – let's call it `metadata.TableColumn`. We can select from it and use the query results to build more SQL code: writing SQL code that writes SQL code, using dynamic T-SQL.
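A minimal definition of `metadata.TableColumn`, seeded with the rows from the table above (column sizes here are illustrative choices):

```sql
CREATE TABLE metadata.TableColumn (
    TableName NVARCHAR(128) NOT NULL
  , ColumnName NVARCHAR(128) NOT NULL
  , ColumnOrder INT NOT NULL
  , DataType NVARCHAR(128) NOT NULL
  , IsPrimaryKey BIT NOT NULL DEFAULT 0
  , IsAuditTimestamp BIT NOT NULL DEFAULT 0
  , PRIMARY KEY (TableName, ColumnName)
);

INSERT INTO metadata.TableColumn
    (TableName, ColumnName, ColumnOrder, DataType, IsPrimaryKey, IsAuditTimestamp)
VALUES
    ('Customer', 'CustomerId',      1, 'UNIQUEIDENTIFIER', 1, 0)
  , ('Customer', 'FamilyName',      2, 'NVARCHAR(100)',    0, 0)
  , ('Customer', 'Forename',        3, 'NVARCHAR(100)',    0, 0)
  , ('Customer', 'Address',         4, 'NVARCHAR(255)',    0, 0)
  , ('Customer', 'DateOfBirth',     5, 'DATE',             0, 0)
  , ('Customer', 'UpdatedDateTime', 6, 'DATETIME',         0, 1);
```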
Build the staging table
This piece of SQL builds another piece of SQL, which we will then be able to run to create the staging table:
```sql
DECLARE @tableName NVARCHAR(128) = 'Customer';
DECLARE @columns NVARCHAR(MAX) = '';

-- build one line per column; note that string-building with SELECT @var +=
-- is a common pattern, though its ordering isn't guaranteed by the docs
SELECT @columns += CHAR(10) + ' , ' + QUOTENAME(ColumnName) + ' NVARCHAR(4000) NULL'
FROM metadata.TableColumn
WHERE TableName = @tableName
ORDER BY ColumnOrder;

DECLARE @sql NVARCHAR(MAX) =
    'CREATE TABLE [raw].' + QUOTENAME(@tableName) + ' ( '
  + SUBSTRING(@columns, 5, LEN(@columns))  -- strip leading CHAR(10) + ' , '
  + ' );';

PRINT @sql;
--EXEC(@sql);
```
Here's the output:
```sql
CREATE TABLE [raw].[Customer] ( [CustomerId] NVARCHAR(4000) NULL
 , [FamilyName] NVARCHAR(4000) NULL
 , [Forename] NVARCHAR(4000) NULL
 , [Address] NVARCHAR(4000) NULL
 , [DateOfBirth] NVARCHAR(4000) NULL
 , [UpdatedDateTime] NVARCHAR(4000) NULL );
```
To create the staging table, uncomment `EXEC(@sql);` at the end of the previous snippet.
Build the history table
This is a bit trickier, because we need to exclude some source columns and add some extra ones:
```sql
DECLARE @tableName NVARCHAR(128) = 'Customer';
DECLARE @columns NVARCHAR(MAX) = '';

SELECT @columns += CHAR(10) + ' , ' + QUOTENAME(ColumnName) + ' ' + DataType + ' NOT NULL'
FROM metadata.TableColumn
WHERE TableName = @tableName
  AND IsAuditTimestamp = 0
ORDER BY ColumnOrder;

DECLARE @sql NVARCHAR(MAX) =
    'CREATE TABLE [stg].' + QUOTENAME(@tableName + 'History') + ' ( '
  + QUOTENAME(@tableName + 'HistoryId') + ' INT PRIMARY KEY IDENTITY'
  + CHAR(10) + ' , [VersionNumber] INT NOT NULL DEFAULT 1'
  + @columns
  + CHAR(10) + ' , [_AttributeDigest] BINARY(16) NOT NULL'
  + CHAR(10) + ' , [_ValidFromDateTime] DATETIME NOT NULL'
  + CHAR(10) + ' , [_ValidToDateTime] DATETIME NOT NULL DEFAULT ''9000-12-31'''
  + ' );';

PRINT @sql;
--EXEC(@sql);
```
And here's the output again:
```sql
CREATE TABLE [stg].[CustomerHistory] ( [CustomerHistoryId] INT PRIMARY KEY IDENTITY
 , [VersionNumber] INT NOT NULL DEFAULT 1
 , [CustomerId] UNIQUEIDENTIFIER NOT NULL
 , [FamilyName] NVARCHAR(100) NOT NULL
 , [Forename] NVARCHAR(100) NOT NULL
 , [Address] NVARCHAR(255) NOT NULL
 , [DateOfBirth] DATE NOT NULL
 , [_AttributeDigest] BINARY(16) NOT NULL
 , [_ValidFromDateTime] DATETIME NOT NULL
 , [_ValidToDateTime] DATETIME NOT NULL DEFAULT '9000-12-31' );
```
Build the staging view
This adds a bit more complication – we need to cast the source columns appropriately and build the attribute digest:
```sql
DECLARE @tableName NVARCHAR(128) = 'Customer';
DECLARE @viewColumns NVARCHAR(MAX) = '';
DECLARE @digestColumns NVARCHAR(MAX) = '';

SELECT
    @viewColumns += CHAR(10) + ' , CAST(' + QUOTENAME(ColumnName) + ' AS ' + DataType
                  + ') AS ' + QUOTENAME(ColumnName)
  , @digestColumns += IIF(IsPrimaryKey = 1 OR IsAuditTimestamp = 1, ''
      , CHAR(10) + ' + ''||'' + CAST(COALESCE(' + QUOTENAME(ColumnName)
      + ', N'''') AS NVARCHAR(MAX))')
FROM metadata.TableColumn
WHERE TableName = @tableName
ORDER BY ColumnOrder;

DECLARE @sql NVARCHAR(MAX) =
    'CREATE OR ALTER VIEW [stg].' + QUOTENAME(@tableName) + ' AS SELECT '
  + SUBSTRING(@viewColumns, 5, LEN(@viewColumns))     -- strip CHAR(10) + ' , '
  + CHAR(10) + ' , CAST(HASHBYTES(''md5'', '
  + SUBSTRING(@digestColumns, 12, LEN(@digestColumns)) -- strip the 11-character
                                                       -- CHAR(10) + ' + ''||'' + '
  + ' ) AS BINARY(16)) AS _AttributeDigest FROM [raw].' + QUOTENAME(@tableName);

PRINT @sql;
--EXEC(@sql);
```
And again, the output:
```sql
CREATE OR ALTER VIEW [stg].[Customer] AS SELECT CAST([CustomerId] AS UNIQUEIDENTIFIER) AS [CustomerId]
 , CAST([FamilyName] AS NVARCHAR(100)) AS [FamilyName]
 , CAST([Forename] AS NVARCHAR(100)) AS [Forename]
 , CAST([Address] AS NVARCHAR(255)) AS [Address]
 , CAST([DateOfBirth] AS DATE) AS [DateOfBirth]
 , CAST([UpdatedDateTime] AS DATETIME) AS [UpdatedDateTime]
 , CAST(HASHBYTES('md5', CAST(COALESCE([FamilyName], N'') AS NVARCHAR(MAX))
 + '||' + CAST(COALESCE([Forename], N'') AS NVARCHAR(MAX))
 + '||' + CAST(COALESCE([Address], N'') AS NVARCHAR(MAX))
 + '||' + CAST(COALESCE([DateOfBirth], N'') AS NVARCHAR(MAX))
 ) AS BINARY(16)) AS _AttributeDigest FROM [raw].[Customer]
```
Build the stored procedure
Why don't you have a go at this one? 😀
Summary
Dynamic SQL can be a scary thing in OLTP environments, but for large-scale generation of boilerplate T-SQL code it can be very powerful. You can generate – and re-generate – literally hundreds of database objects in moments, so for large implementations of repetitive code you can save a lot of time and effort.
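As a sketch of what "hundreds of objects in moments" looks like, a driver loop over the metadata can generate the DDL for every table in one pass, reusing the staging-table string-building from earlier (uncomment the `EXEC` to run it for real):

```sql
DECLARE @tableName NVARCHAR(128), @columns NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- one iteration per distinct table in the metadata
DECLARE table_cursor CURSOR FAST_FORWARD FOR
    SELECT DISTINCT TableName FROM metadata.TableColumn;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @tableName;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @columns = '';
    SELECT @columns += CHAR(10) + ' , ' + QUOTENAME(ColumnName) + ' NVARCHAR(4000) NULL'
    FROM metadata.TableColumn
    WHERE TableName = @tableName
    ORDER BY ColumnOrder;

    SET @sql = 'CREATE TABLE [raw].' + QUOTENAME(@tableName) + ' ( '
             + SUBSTRING(@columns, 5, LEN(@columns))  -- strip leading separator
             + ' );';
    PRINT @sql;
    --EXEC(@sql);

    FETCH NEXT FROM table_cursor INTO @tableName;
END;
CLOSE table_cursor;
DEALLOCATE table_cursor;
```

The same loop body can generate the history table, view, and stored procedure for each table before fetching the next one.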
The examples I've given above have some problems, but these can all be resolved by improving the metadata model, e.g.:
- It assumes that source primary keys are always single-attribute – you could replace `IsPrimaryKey` with a `PrimaryKeyColumnNumber`.
- It assumes every history table attribute is non-nullable – you could add an `IsNullable` attribute to the metadata table.
- It adds the `VersionNumber` column before `CustomerId`, although it appeared afterwards in the hand-coded version – you could collect primary key columns before `VersionNumber` and add other columns afterwards.
- The generated view keeps the audit timestamp's original name (`UpdatedDateTime`), while the hand-coded view renamed it to `_ValidFromDateTime` – you could use `IsAuditTimestamp` to apply the rename.
Share! If you found this post useful, please share it – and if you're stuck on building the stored procedure, tweet me and I'll be happy to help 😊. Thanks for reading, and thanks for hosting, Steve!