Sprockit tutorial
In this tutorial, you'll learn:
- how processes and their dependencies are represented in Sprockit;
- what normal (error-free) execution looks like;
- how to configure a new process;
- what execution errors look like, and how to handle them.
If you haven't done so already, you should make sure you understand the basic concepts by reading the Introduction to Sprockit.
Setup
You will need:
- an empty Sprockit installation;
- the tutorial demo setup script, available here:
Download it, open the script in SSMS, connect to your Sprockit database, then run it.
The demo configuration contains ten simple stored procedures, named demo.usp_ProcessA through demo.usp_ProcessJ. The demo setup script creates these SPs, registers them in sprockit.Process, and records their inputs and outputs in sprockit.Resource.
The processes need to be run in an order that respects these dependencies:
[Process dependency graph for demo.usp_ProcessA through demo.usp_ProcessJ]
Lesson 1: Explore
In this lesson we'll look at some of the objects in the Sprockit framework.
Have a look at the tables and stored procedures in the database. The SPs and tables in the demo schema are just for this tutorial; everything in the sprockit schema is part of Sprockit. Notice particularly the tables:
- sprockit.Process – this contains processes for execution by Sprockit;
- sprockit.Resource – this contains input and output resources for those processes, used to infer process dependencies.
You'll notice that process demo.usp_ProcessF hasn't been configured – that's OK, we'll do it later.
Have a look at what's in sprockit.Process. In here you'll see an entry for each stored procedure that you found in the demo schema. (From Sprockit v1.5 you'll also see a process called 'SprockitIndexMaintenance' – this is a process used by Sprockit to maintain its own table indexes.)
Look in sprockit.Resource and you'll see the input and output resources used by each process.
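To see each process's inputs and outputs side by side, you can join the two tables – a minimal sketch, using only columns that appear in scripts later in this tutorial:
SELECT
  p.ProcessName
, r.IsInput
, r.ResourceName
FROM sprockit.Process p
  INNER JOIN sprockit.Resource r ON r.ProcessId = p.ProcessId
ORDER BY p.ProcessName, r.IsInput DESC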
Finally, have a look at the list of SQL Server Agent jobs. You'll find a new job called RunSprockit (SprockitDb) – this runs Sprockit's process manager.
(Instead of SprockitDb, you'll see the name of the database where you installed Sprockit).
Lesson 2: Run
In this lesson we'll run an ETL batch, watch it in execution, then look at the results.
Start the 'Run Sprockit' SQL Agent job at step 1.
In your query window, run this monitoring query:
SELECT * FROM sprockit.Process ORDER BY ProcessId
Keep re-running it every few seconds and you'll notice the status of processes changing as they become Ready, start Running, and finally are Done.
You will use the query again – keep it open in SSMS.
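If you'd rather watch a one-line summary while the batch runs, you could group the same table by status – a sketch that assumes the status column is simply named [Status] (check the SELECT * output above if yours differs):
SELECT
  Status
, COUNT(*) AS Processes
FROM sprockit.Process
GROUP BY Status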
When everything's Done, take a look at what happened:
In table sprockit.Execution you will find a record for each process executed in the batch. Each one has a unique [ExecutionId], a [ProcessId] (from sprockit.Process) indicating which process was executed, and a [HandlerId] indicating which handler ran the process – this refers to a record in sprockit.Handler.
The table sprockit.Handler contains one record for each process handler that was used in the batch. You'll find a few records in here, each with a unique [HandlerId] and with a [BatchId] referring to the entry in sprockit.Batch.
sprockit.Batch records a unique identifier for each batch, along with its start time and the values of various system properties when the batch started.
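To see this chain for yourself, you can walk from batch to handler to execution using the key columns just described – a minimal sketch:
SELECT
  b.BatchId
, h.HandlerId
, e.ExecutionId
, e.ProcessId
FROM sprockit.Batch b
  INNER JOIN sprockit.Handler h ON h.BatchId = b.BatchId
  INNER JOIN sprockit.Execution e ON e.HandlerId = h.HandlerId
ORDER BY e.ExecutionId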
Start the 'Run Sprockit' job again, then rerun the monitoring query – now you'll notice that many of the values in fields [AvgDuration] and [BranchWeight] are non-zero. These values are updated at batch start using data from sprockit.Execution, and are used to make process selection decisions.
(If you watch carefully, you'll notice that when multiple processes are Ready at the same time, the process that runs next is the one with the highest [BranchWeight] value).
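You can inspect these statistics directly. This query (a sketch using the fields named above) lists processes in the order the process manager would prefer to start them:
SELECT
  ProcessId
, ProcessName
, AvgDuration
, BranchWeight
FROM sprockit.Process
ORDER BY BranchWeight DESC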
In sprockit.Handler, notice the field [SqlAgentJobName]. This stores the name of the SQL agent job that ran each handler, but if you refresh the list of agent jobs after all processes are Done, you won't find any handler jobs – each one is deleted automatically when the handler exits.
Start the 'Run Sprockit' job again, then refresh the list of agent jobs – you'll notice immediately that a job called something like SprockitDb_SprockitHandler_20180306211345 has been created. Keep refreshing the list of jobs while the process manager runs and you will see more, similar jobs created as the manager spawns new handlers. Each of these gradually disappears as the handlers finish and the jobs are deleted.
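The handler jobs disappear, but their names don't – sprockit.Handler keeps a record of each one. For example, to list the jobs used by the most recent batch:
SELECT
  HandlerId
, SqlAgentJobName
FROM sprockit.Handler
WHERE BatchId = (SELECT MAX(BatchId) FROM sprockit.Batch)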
- The Run Sprockit job is Sprockit's process manager. Whenever it finds processes ready, it starts a new handler. Add a schedule to this job to run your ETL processing automatically at your preferred time.
- Handlers run in separate SQL Agent jobs with names like SprockitDb_SprockitHandler_20180306211345. They are deleted automatically when they finish.
Lesson 3: Configure
In this lesson we'll add a new process – demo.usp_ProcessF
– to the ETL routine.
To run demo.usp_ProcessF, we need to add it to the table sprockit.Process. Run this script to do that:
INSERT INTO sprockit.Process (
  DbName
, SchemaName
, ProcessName
)
VALUES (
  DB_NAME()
, 'demo'
, 'usp_ProcessF'
)
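To check that the insert worked, query the table for the new row:
SELECT ProcessId, DbName, SchemaName, ProcessName
FROM sprockit.Process
WHERE ProcessName = 'usp_ProcessF'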
To run the new process in the right order, we need to add its resources to sprockit.Resource.
Look at the definition of demo.usp_ProcessF and you will see that the procedure selects data from tables demo.Table_03 and demo.Table_04, and uses it to populate demo.Table_08. Sprockit refers to Table_03 and Table_04 as inputs for usp_ProcessF, and to Table_08 as its output.
Populate the process's inputs and outputs by running this script:
INSERT INTO [sprockit].[Resource] (
  [ProcessId]
, [IsInput]
, [DbName]
, [SchemaName]
, [ResourceName]
)
SELECT
  p.ProcessId
, t.IsInput
, DB_NAME()
, 'demo'
, t.TableName
FROM sprockit.Process p
  CROSS JOIN (VALUES
    ('Table_03', 1)
  , ('Table_04', 1)
  , ('Table_08', 0)
  ) t (TableName, IsInput)
WHERE p.ProcessName = 'usp_ProcessF'
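To verify, query the new process's resources – you should see three rows, two inputs and one output:
SELECT r.IsInput, r.ResourceName
FROM sprockit.Resource r
  INNER JOIN sprockit.Process p ON p.ProcessId = r.ProcessId
WHERE p.ProcessName = 'usp_ProcessF'
ORDER BY r.IsInput DESC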
We're configuring usp_ProcessF in this way so that you can see exactly what's going on, but other options are available!
Start the 'Run Sprockit' job again, then rerun the monitoring query. Keep refreshing the results to watch processes being run – you will notice that usp_ProcessF runs after usp_ProcessB and before usp_ProcessI (as indicated by the process dependency graph at the top of the page).
Notice also that when you configured usp_ProcessF, you didn't specify any information directly about usp_ProcessB or usp_ProcessI – you just provided the list of resources used and produced by usp_ProcessF.
- For a process to run, it must appear in sprockit.Process. For processes to run in the right order, their resource dependencies must be recorded in sprockit.Resource.
- Configuring resource dependencies means you don't need to know anything about other processes that are already set up.
Lesson 4: Resume
In this lesson we'll encounter an ETL error, fix it, then resume processing where we left off.
Run this SQL script to alter the definition of demo.usp_ProcessE:
ALTER PROCEDURE [demo].[usp_ProcessE]
AS

DECLARE @i INT = CAST('abc' AS INT)
WAITFOR DELAY '00:00:05'
This is going to cause an error when it attempts to cast the string 'abc' to an INT.
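If you like, you can see the failure for yourself before Sprockit does, by executing the procedure directly – it fails straight away with conversion error 245:
EXEC [demo].[usp_ProcessE]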
Start the 'Run Sprockit' job, then use the monitoring query from lesson 2 to monitor progress. After a while you'll notice a few things:
- demo.usp_ProcessE gets the status Errored;
- demo.usp_ProcessI gets the status Blocked because it depends on demo.usp_ProcessE (as shown in the dependency graph at the top of this page);
- other processes not dependent on demo.usp_ProcessE continue to run normally, until everything reachable (not Errored or Blocked) is Done.
When the process manager finishes, you will see in the agent job history that the 'Run Sprockit' job is reported to have failed – this indicates that not all processes were Done when it finished.
To see what exactly went wrong, take a look in the table sprockit.Error:
SELECT * FROM sprockit.Error
In here you will find a record indicating that demo.usp_ProcessE failed, reporting the error “Conversion failed when converting the varchar value 'abc' to data type int” on line 4.
Fix the error by altering the procedure, commenting out the offending statement:
ALTER PROCEDURE [demo].[usp_ProcessE]
AS

--DECLARE @i INT = CAST('abc' AS INT)
WAITFOR DELAY '00:00:05'
We're ready to carry on processing now, but we don't want to have to start again from the beginning – we just want to pick up from where we left off.
To do this, start the 'Run Sprockit' SQL Agent job at step 2, then run the progress monitoring query. You'll notice that everything that was already Done is still Done – but that demo.usp_ProcessE is now Ready (or already Running again) and demo.usp_ProcessI is Not ready. When demo.usp_ProcessE now finishes successfully, demo.usp_ProcessI will become Ready, start Running, and finally be Done in the usual way.
Finally, if you take another look in the sprockit.Execution table, you'll find two entries for demo.usp_ProcessE in the latest batch – one which failed with error number 245 and one which succeeded.
SELECT e.*
FROM sprockit.Process p
  INNER JOIN sprockit.Execution e ON e.ProcessId = p.ProcessId
  INNER JOIN sprockit.Handler h ON h.HandlerId = e.HandlerId
WHERE p.ProcessName = 'usp_ProcessE'
  AND h.BatchId = (
    SELECT MAX(BatchId) FROM sprockit.Batch
  )
- The Run Sprockit job is Sprockit's process manager. Starting it at step 2 causes processing to pick up where it left off (usually after an error).
- This is the only time you would start it at step 2 – normally it runs on a schedule and starts at step 1.
Next steps
Now you're ready to start using Sprockit!
- Create a new Sprockit install (don't use the demo database you've been using – make a fresh one).
- Schedule the Run Sprockit job to run when you want your ETL processing to start.
To clean up after the tutorial, you can just drop the tutorial database and the 'Run Sprockit' agent job.
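For example – a sketch that assumes your tutorial database is named SprockitDb, so that the manager job carries the name you saw in lesson 1:
USE master
DROP DATABASE SprockitDb
EXEC msdb.dbo.sp_delete_job @job_name = N'RunSprockit (SprockitDb)'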
Ready to level up? Time to consider:
- getting up to speed with Sprockit's utility stored procedures;
- making Sprockit perform index maintenance for you;
- tracking ETL process lineage to provide assurance and aid pipeline diagnostics;
- using custom logging to record your own process features.
Any problems? Have a look at some frequently asked questions.