Adding row numbers to SSIS data flows

Sometimes it's useful to add a row number to rows passing along an SSIS data flow e.g.

  • if the source data has no unique identifier and you want to be able to identify rows that fail
  • if you want to take action periodically as the data flow executes (e.g. write a progress message to the event log after every 1000 rows)

This article describes a script component pattern for doing so.

  1. Set up the source transformation for your data flow (e.g. an OLD DB source, a flat file etc)
  2. Drag a script component onto the data flow surface, setting its type to 'Transformation'.
  3. Connect the output of the source transformation to the input of the script component. (It makes sense to add the row number to the data flow immediately, because then it is available throughout all subsequent transformations).
  4. Double-click on the script component to open its editor, then revise it as follows:
    • On the Inputs and Outputs page, drill down into the properties of the components output ('Output 0') and click on 'Output Columns'
      • Click the 'Add column' button and add a column with a name like RowNumber and a DataType of DT_I4 (this is the default)
    • On the Script page, click 'Edit Script'.
  5. The default script that opens in VSTA includes the line public class ScriptMain : UserComponent. This is the declaration of the ScriptMain class.
    • Select the code for this pattern (below – you can select the whole block by double-clicking on it somewhere) and copy it to the clipboard
    • Paste it into the script, replacing the existing class declaration and everything after it in the script file.
  6. Save, close, etc. Now you can continue your data flow from the output of the script component.

The pattern uses an internal rowNumber variable to count how many rows have passed through the transformation.

  • The value is initialised at 0 in the transformation's PreExecute() method.
  • The value is incremented whenever a row passes through the transformation (when Input0_ProcessInputRow is called). The new value represents the number of rows that have now passed through the transformation – i.e. the number of the current row. This value is written into the row's RowNumber column.
public class ScriptMain : UserComponent
{
  private int rowNumber;
 
  public override void PreExecute()
  {
    base.PreExecute();
    rowNumber = 0;
  }
 
  public override void Input0_ProcessInputRow(Input0Buffer Row)
  {
    rowNumber++;
    Row.RowNumber = rowNumber;
  }
}