[ Back to Sprockit home page ]

Sprockit ETL pipeline visualisation

Sprockit's process execution is controlled using resource dependency information stored in tables. This information can be used for other purposes, including the automation of ETL pipeline documentation.

sprockitviz is a Windows command-line application that uses Sprockit dependency metadata and Graphviz1) to generate browseable, click-to-explore pipeline diagrams. sprockitviz is free, open-source and can be downloaded from GitHub as C# source code or a compiled .exe application.

  • Automatic decomposition of large ETL pipeline graphs into sets of process/resource-centric subgraphs;
  • Inter-diagram hyperlinks for easy pipeline exploration
  • Process/resource-specific drawing styles for ease of recognition
  • Process/resource tooltips that provide fuller information if required

The diagram on the right shows a screenshot of sprockitviz output for the subgraph around Process I in the Sprockit tutorial. Process I is the centre of this subgraph so appears in gold – clicking on another node would take you to the subgraph centred on that one. Processes E and I have a heavier outline, indicating that they are are both executable processes. Hovering over process I has caused a tooltip to display, showing its fully-qualified process name and ID, its type and its average duration.

To see this example in action, have a look at the set of output files generated from the Sprockit tutorial's dependency metadata.

Output .svg files must be viewed in a web browser to enable embedded hyperlink and tooltip functionality to be used.

  1. Download and install Graphviz
  2. Download sprockitviz.exe and sprockitviz.exe.config from GitHub
  3. Update sprockitviz.exe.config:

    1. Make sure that the value of graphvizAppFolder matches the location of your Graphviz executables (the folder is the Graphviz subdirectory where you find the dot.exe application):

        graphvizAppFolder="C:\Program Files (x86)\Graphviz2.38\bin"
    2. Add the details of your Sprockit instance in a new <Instance> node inside <SprockitInstances>:

        subgraphRadius="1" >

    Other configuration options are available but this is enough to get you started. outputFolder is the name of the folder where you want sprockitviz to write its output, and it must exist.

  4. Open a command prompt and run sprockitviz, specifying the value of alias as a parameter (case-sensitive):

    sprockitviz.exe MySprockitInstance

When sprockitviz finishes executing, you will find a collection of .svg files in the output folder you configured in the outputFolder option. This includes:

  • Pipeline.svg – a file containing your entire ETL pipeline graph. This is often of limited use because the graph is simply too big to make sense of (and may occasionally be too big even for Graphviz to handle).
  • one file for each process and resource object in your ETL pipeline.

Each object file shows the central object, filled in gold, surrounded by objects in its immediate neighbourhood. sprockitviz attempts to determine a useful neighbourhood size by trying to make sure that the resulting graph is not too big, although you can choose to influence this behaviour yourself if you wish.

Each node in the graph is displayed using the following conventions:

  • Executable processes (stored procedures and SSIS packages) are displayed with a heavier outline. SSIS package names are given in red text.
  • Stored queries (views and functions) are displayed with a dashed outline. Function names are suffixed with '()' (open/close parenthesis).
  • Objects that do not exist (either by accident or design) are displayed with a dotted outline.

You can configure database-specific text colours for SQL objects if you wish.

  • A pair of objects connected by a solid arrow indicates that the object at the end of the arrow is directly dependent on that at the arrow's start.
  • An object at the end of a dashed arrow is indirectly dependent on that at the arrow's start; i.e. the dependency occurs via one or more intermediate objects not present in the subgraph.
  • Clicking on a node in the diagram opens the file containing the diagram with that node at its centre. Clicking on a series of nodes enables you quickly to pass through a series of diagrams, easily traversing ETL pipelines from end to end.
  • Hovering over a node displays a tooltip indicating its full three-part name, its type and ID, and where applicable its average execution duration.

The notion of an object's neighbourhood is underpinned by the concept of a subgraph's radius:

  • A subgraph of radius 1 around a process/resource contains the object itself and anything which is directly dependent on it, or on which it depends directly. The smallest possible neighbourhood for a process or resource has radius = 1.
  • sprockitviz tests greater subgraph radii to find the largest neighbourhood for an object that can be contained within maxWidth and maxHeight (as specified in sprockitviz.exe.config).
  • You can fix neighbourhood size by specifying a greater-than-zero value of subgraphRadius in sprockitviz.exe.config. This has the effect of accelerating sprockitviz execution, because other radii are no longer tested when building object neighbourhoods.

Note that the only complete set of dependency information present in a subgraph is that relating to the central (gold-filled) node. Other nodes in a subgraph may show multiple dependencies but these are not necessarily complete. To gain an accurate complete view of dependencies around an object you must consult the subgraph with that object at its centre.

You're pretty much there, but you could:

  • experiment with the values of minWidth, minHeight and subgraphRadius in sprockit.exe.config to see what gives you the best set of outputs;
  • add a <dbColors> node to your Sprockit <Instance> to show different databases' objects in different text colours.

Graphviz is free, open-source graph visualisation software.