[ Back to Sprockit home page ]
Sprockit ETL pipeline visualisation
Sprockit's process execution is controlled using resource dependency information stored in tables. This information can be used for other purposes, including the automation of ETL pipeline documentation.
sprockitviz
is a Windows command-line application that uses Sprockit dependency metadata and Graphviz1) to generate browseable, click-to-explore pipeline diagrams. sprockitviz
is free, open-source and can be downloaded from GitHub as C# source code or a compiled .exe
application.
Features
- Inter-diagram hyperlinks for easy pipeline exploration
- Process/resource-specific drawing styles for ease of recognition
- Process/resource tooltips that provide fuller information if required
The diagram on the right shows a screenshot of sprockitviz
output for the subgraph around Process I in the Sprockit tutorial. Process I is the centre of this subgraph so appears in gold – clicking on another node would take you to the subgraph centred on that one. Processes E and I have a heavier outline, indicating that they are are both executable processes. Hovering over process I has caused a tooltip to display, showing its fully-qualified process name and ID, its type and its average duration.
To see this example in action, have a look at the set of output files generated from the Sprockit tutorial's dependency metadata.
Output .svg
files must be viewed in a web browser to enable embedded hyperlink and tooltip functionality to be used.
Getting started
- Download and install Graphviz
Update
sprockitviz.exe.config
:Make sure that the value of
graphvizAppFolder
matches the location of your Graphviz executables (the folder is the Graphviz subdirectory where you find thedot.exe
application):<FireFive.PipelineVisualiser.SprockitViz graphvizAppFolder="C:\Program Files (x86)\Graphviz2.38\bin"
Add the details of your Sprockit instance in a new
<Instance>
node inside<SprockitInstances>
:<Instance alias="MySprockitInstance" connectionString="Server=MyDbServer;Database=SprockitDb;Trusted_Connection=yes;" outputFolder="C:\tmp\sprockitviz" maxWidth="4" maxHeight="6" subgraphRadius="1" > </Instance>
Other configuration options are available but this is enough to get you started.
outputFolder
is the name of the folder where you wantsprockitviz
to write its output, and it must exist.Open a command prompt and run
sprockitviz
, specifying the value ofalias
as a parameter (case-sensitive):sprockitviz.exe MySprockitInstance
Understanding the output
When sprockitviz
finishes executing, you will find a collection of .svg
files in the output folder you configured in the outputFolder
option. This includes:
Pipeline.svg
– a file containing your entire ETL pipeline graph. This is often of limited use because the graph is simply too big to make sense of (and may occasionally be too big even for Graphviz to handle).- one file for each process and resource object in your ETL pipeline.
Each object file shows the central object, filled in gold, surrounded by objects in its immediate neighbourhood. sprockitviz
attempts to determine a useful neighbourhood size by trying to make sure that the resulting graph is not too big, although you can choose to influence this behaviour yourself if you wish.
Nodes
Each node in the graph is displayed using the following conventions:
- Executable processes (stored procedures and SSIS packages) are displayed with a heavier outline. SSIS package names are given in red text.
- Stored queries (views and functions) are displayed with a dashed outline. Function names are suffixed with '()' (open/close parenthesis).
- Objects that do not exist (either by accident or design) are displayed with a dotted outline.
You can configure database-specific text colours for SQL objects if you wish.
Edges
- A pair of objects connected by a solid arrow indicates that the object at the end of the arrow is directly dependent on that at the arrow's start.
- An object at the end of a dashed arrow is indirectly dependent on that at the arrow's start; i.e. the dependency occurs via one or more intermediate objects not present in the subgraph.
Interaction
- Clicking on a node in the diagram opens the file containing the diagram with that node at its centre. Clicking on a series of nodes enables you quickly to pass through a series of diagrams, easily traversing ETL pipelines from end to end.
- Hovering over a node displays a tooltip indicating its full three-part name, its type and ID, and where applicable its average execution duration.
Neighbourhood
The notion of an object's neighbourhood is underpinned by the concept of a subgraph's radius:
- A subgraph of radius 1 around a process/resource contains the object itself and anything which is directly dependent on it, or on which it depends directly. The smallest possible neighbourhood for a process or resource has radius = 1.
sprockitviz
tests greater subgraph radii to find the largest neighbourhood for an object that can be contained withinmaxWidth
andmaxHeight
(as specified insprockitviz.exe.config
).- You can fix neighbourhood size by specifying a greater-than-zero value of
subgraphRadius
insprockitviz.exe.config
. This has the effect of acceleratingsprockitviz
execution, because other radii are no longer tested when building object neighbourhoods.
Note that the only complete set of dependency information present in a subgraph is that relating to the central (gold-filled) node. Other nodes in a subgraph may show multiple dependencies but these are not necessarily complete. To gain an accurate complete view of dependencies around an object you must consult the subgraph with that object at its centre.
Next steps
You're pretty much there, but you could:
- experiment with the values of
minWidth
,minHeight
andsubgraphRadius
in sprockit.exe.config to see what gives you the best set of outputs; - add a
<dbColors>
node to your Sprockit<Instance>
to show different databases' objects in different text colours.