sprockit.Process

[ v1.3 ] [ v1.4 ] [ v1.5 ] [ Back to Sprockit home page ]

Each row in this table represents a process managed by Sprockit. A process is either:

  • something that is executed (a stored procedure or an SSIS package), or;
  • a stored query (a view or a user-defined function).

To have Sprockit run a stored procedure or package, you must make an entry for it in this table; if its input resources include views or user-defined functions, you should also make an entry for each such stored query.

Configuring views and functions in this table ensures that dependent processes don't attempt to use them until their underlying base tables are ready.

Approaches to process configuration are described in Configuring processes in Sprockit.

Column name1) Column type Description Notes
PK ProcessId INT Unique identifier for the process. IDENTITY
DbName NVARCHAR(128) The name of the database in which the process object is found. For SSIS packages, this is the name of the catalog folder.
SchemaName NVARCHAR(128) The name of the database schema in which the process object is found. For SSIS packages, this is the name of the SSIS project.
ProcessName NVARCHAR(128) The schema-scoped name of the process object. For SSIS packages, this is the name of the SSIS package, including the .dtsx extension.
FqProcessName NVARCHAR(392) The fully-qualified (three-part) name of the process. Computed. Combines [DbName], [SchemaName], [ProcessName]
RL1 LatestExecutionId INT Identifies the most recent execution of this process. Updated by Sprockit.
Status VARCHAR(20) The current status of the process. Updated by Sprockit. A process cycles through a series of status values during an ETL run. Initially it is Not ready, then when all its inputs are available it becomes Ready. When selected by a handler it is briefly Reserved before it starts Running. In execution it may complete normally (Done) or fail (Errored); in the latter case any downstream processes not yet ready become Blocked.
LastStatusUpdate DATETIME The date and time at which the process's status was last changed. Updated by Sprockit.
AvgDuration INT The average duration of the process over recent successful executions. Updated by Sprockit. The number of executions included in the rolling average is controlled by Sprockit's RollingAverageBatchCount and RollingAverageBatchDays properties.
BranchWeight INT A process's branch weight is the average time elapsed between when the process starts and the last of its downstream processes is complete. 2) Updated by Sprockit. [BranchWeight] is used by a process handler when choosing between multiple ready processes.
IsEnabled BIT Indicates whether this process is to be executed or not. When a handler selects a disabled process for execution (a process having [IsEnabled] = 0), it immediately updates the process's status to 'Done'. Set the field's value to 1 if the process is to be executed, otherwise 0.
Priority TINYINT The process's priority. [Priority] is used by a process handler when choosing between multiple Ready processes. By default a process has a priority of 100 but this can optionally be increased (value < 100) or decreased (value > 100). Note that allocating different priority values to processes overrides [BranchWeight] selection and may cause batch end-to-end processing time to increase.
LogLevel TINYINT The process's logging level. Since Sprockit v1.5. Processes can record execution events in sprockit.Event using utility SP usp_LogExecutionEvent. The utility SP only makes entries in the event log for processes having [LogLevel] >= 1. To disable custom event logging actions defined in a stored procedure or SSIS package, set its [LogLevel] to zero in this table.

Relationships with integrity enforced by a foreign key constraint have identifiers prefixed with FK; others are prefixed RL.

Click here to view the data model diagram.

References to other tables

Attributes
(in this table)
Referenced table Referenced attributes
(in referenced table)
RL1 LatestExecutionId sprockit.Execution ExecutionId

References to this table from others

Relationship Attributes
(in referencing table)
Referenced attributes
(in this table)
sprockit.Execution.RL1 ProcessId ProcessId
sprockit.Reservation.RL1 ProcessId ProcessId
sprockit.Resource.FK1 ProcessId ProcessId
sprockit.uvw_Process.RL1 ProcessId ProcessId
sprockit.uvw_ProcessDependency.RL1 PredecessorId ProcessId
sprockit.uvw_ProcessDependency.RL2 SuccessorId ProcessId

Selecting processes for execution

Process handlers select processes for execution using SP sprockit.usp_ReserveProcess – this SP identifies a Ready process from this table for the handler to execute.

When multiple Ready processes are available, the reservation SP chooses between them as follows:

  • Processes with lower [Priority] values are selected in preference to those with higher [Priority] values. This allows you to override [BranchWeight] if you wish to do so, but may cause end-to-end ETL processing time to increase.
  • Processes with higher [BranchWeight] values are selected in preference to those with lower [BranchWeight] values.

A process's branch weight is the average time elapsed between when the process starts and the last of its downstream processes is complete.3) The dependency graph below is annotated with [AvgDuration] (grey) and [BranchWeight] (black) values – you can see that (for example) process C's [AvgDuration] is 7 seconds, but its branch weight is 27 = 7(C) + 20 (H):

Branch weight provides an indication of how much work is behind each process, and preferring processes with high [BranchWeight] values allows handlers to get started on that work as quickly as possible. Overall, this has the effect of reducing the end-to-end processing time for an ETL batch.

[BranchWeight] is recalculated at the start of each batch using fresh [AvgDuration] values. Because [AvgDuration] is a rolling average, process selection adapts to changes in the ETL landscape as processes are added, removed, or their execution time changes. [AvgDuration] is automatically set to 0 for processes having [IsEnabled] = 0, ensuring that the influence of a process's duration on other [BranchWeight] values is immediately removed when the process is disabled.

Branch weight defined

A more precise definition of branch weight is as follows:

  • A path is a sequence of processes in which each member is directly dependent on its predecessor. A path starts at any process but ends only at a process with no dependants.
  • A path's weight is the sum of the average duration of each process in the path having [IsEnabled] = 1.
  • A process's branch weight is the highest path weight over all paths starting at that process.

In the example above, two paths originate at C:

  • {C → G → J}, path weight = 7 + 4 + 4 = 15 (assuming all processes are enabled);
  • {C → H}, path weight 7 + 20 = 27.

The highest path weight over all paths starting at C is that of {C → H} = 27. This is C's branch weight.


Page generated from database metadata by DbScout.


1)
Columns with names in bold are non-nullable; columns with underlined names are participants in the table's primary key.
2) , 3)
A more precise definition is available below.