[ Back to Sprockit home page ]

ETL process lineage tracking with Sprockit

This feature requires Sprockit v1.4 or higher.

Sprockit supports ETL process lineage tracking by allowing a process execution ID value to be passed to a process at runtime.1) The execution ID is then available for you to use within your stored procedure or SSIS package as you wish – e.g. writing it into data tables to indicate which process execution created or last updated each record.

To enable Sprockit to pass its execution ID to your process, you need to provide a SprockitExecutionId parameter:

  • In a stored procedure, define parameter @sprockitExecutionId INT;
  • in an SSIS package, define a package parameter SprockitExecutionId of type Int32.

Sprockit automatically provides a value for this parameter (if defined) at runtime, populated with the process's [ExecutionId] from sprockit.Execution – you can then record this in data tables as required.

If you choose to persist process lineage information using this feature, you should also disable automatic trimming of Sprockit's log tables by setting the value of property LogRetentionPeriod to -1.

In this very simple example, we have a stored procedure being used to append some records to a table:

CREATE PROCEDURE dbo.usp_AppendRecords
AS
BEGIN
 
  INSERT INTO dbo.AllRecordsEver (
    Id
  )
  SELECT 
    Id
  FROM dbo.NewRecords
 
END

Sprockit has been configured to run this stored procedure, something like this:

EXEC sprockit.usp__ConfigureProcess '
<Process name="MyDb.dbo.usp_AppendRecords">
  <Inputs>
    <Resource name="MyDb.dbo.NewRecords" />
  </Inputs>
  <Outputs>
    <Resource name="MyDb.dbo.AllRecordsEver" />
  </Outputs>
</Process>'

The stored procedure is pretty straightforward, but looking at dbo.AllRecordsEver we can't say anything about the origin of its records. This improved version adds a bit more information:

ALTER PROCEDURE dbo.usp_AppendRecords
AS
BEGIN
 
  INSERT INTO dbo.AllRecordsEver (
    Id
  , DateAdded
  )
  SELECT 
    Id
  , GETDATE()
  FROM dbo.NewRecords
 
END

This is better because we can at least see which records arrived when, but it's still hard to tie this back to actual ETL activity.

Sprockit can help us here – if we define a @sprockitExecutionId parameter for the SP, Sprockit will automatically populate it with the process's runtime execution ID (identifying a record in sprockit.Execution). This allows us to include that value in new records in the table:

ALTER PROCEDURE dbo.usp_AppendRecords (
  @sprockitExecutionId INT
) AS
BEGIN
 
  INSERT INTO dbo.AllRecordsEver (
    Id
  , ExecutionId
  )
  SELECT 
    Id
  , @sprockitExecutionId
  FROM dbo.NewRecords
 
END

If we imagine that records added originally to dbo.NewRecords are labelled the same way, we can strengthen this further:

ALTER PROCEDURE dbo.usp_AppendRecords (
  @sprockitExecutionId INT
) AS
BEGIN
 
  INSERT INTO dbo.AllRecordsEver (
    Id
  , ExecutionId
  , SourceExecutionId
  )
  SELECT 
    Id
  , @sprockitExecutionId
  , ExecutionId
  FROM dbo.NewRecords
 
END

1)
This can also be used for custom logging.