[ Back to Sprockit home page ]
ETL process lineage tracking with Sprockit
This feature requires Sprockit v1.4 or higher.
Sprockit supports ETL process lineage tracking by allowing a process execution ID value to be passed to a process at runtime.1) The execution ID is then available for you to use within your stored procedure or SSIS package as you wish – e.g. writing it into data tables to indicate which process execution created or last updated each record.
To enable Sprockit to pass its execution ID to your process, you need to provide a SprockitExecutionId parameter:
- In a stored procedure, define parameter
@sprockitExecutionId INT
; - in an SSIS package, define a package parameter
SprockitExecutionId
of typeInt32
.
Sprockit automatically provides a value for this parameter (if defined) at runtime, populated with the process's [ExecutionId] from sprockit.Execution – you can then record this in data tables as required.
If you choose to persist process lineage information using this feature, you should also disable automatic trimming of Sprockit's log tables by setting the value of property LogRetentionPeriod
to -1
.
Example
Setup
In this very simple example, we have a stored procedure being used to append some records to a table:
CREATE PROCEDURE dbo.usp_AppendRecords AS BEGIN INSERT INTO dbo.AllRecordsEver ( Id ) SELECT Id FROM dbo.NewRecords END
Sprockit has been configured to run this stored procedure, something like this:
EXEC sprockit.usp__ConfigureProcess ' <Process name="MyDb.dbo.usp_AppendRecords"> <Inputs> <Resource name="MyDb.dbo.NewRecords" /> </Inputs> <Outputs> <Resource name="MyDb.dbo.AllRecordsEver" /> </Outputs> </Process>'
Recording lineage information
The stored procedure is pretty straightforward, but looking at dbo.AllRecordsEver
we can't say anything about the origin of its records. This improved version adds a bit more information:
ALTER PROCEDURE dbo.usp_AppendRecords AS BEGIN INSERT INTO dbo.AllRecordsEver ( Id , DateAdded ) SELECT Id , GETDATE() FROM dbo.NewRecords END
This is better because we can at least see which records arrived when, but it's still hard to tie this back to actual ETL activity.
Sprockit can help us here – if we define a @sprockitExecutionId
parameter for the SP, Sprockit will automatically populate it with the process's runtime execution ID (identifying a record in sprockit.Execution). This allows us to include that value in new records in the table:
ALTER PROCEDURE dbo.usp_AppendRecords ( @sprockitExecutionId INT ) AS BEGIN INSERT INTO dbo.AllRecordsEver ( Id , ExecutionId ) SELECT Id , @sprockitExecutionId FROM dbo.NewRecords END
If we imagine that records added originally to dbo.NewRecords
are labelled the same way, we can strengthen this further:
ALTER PROCEDURE dbo.usp_AppendRecords ( @sprockitExecutionId INT ) AS BEGIN INSERT INTO dbo.AllRecordsEver ( Id , ExecutionId , SourceExecutionId ) SELECT Id , @sprockitExecutionId , ExecutionId FROM dbo.NewRecords END