Differences

This shows you the differences between two versions of the page.

speaking [2019/04/07 16:14]
admin ↷ Links adapted because of a move operation
speaking [2020/02/23 15:58] (current)
admin
Line 1: Line 1:
 ====== Speaking ====== ====== Speaking ======
 +
 +===== Automated testing for ADF pipelines =====
 +
 +Test coverage for data engineering developments often isn't high, and Azure Data Factory (ADF) pipelines are no exception. In this talk I'll apply well-established testing approaches to an ADF pipeline using C# and NUnit, integrating the test suite into an Azure DevOps pipeline for regular automatic execution. The result will be an ADF instance that is re-tested in full whenever any change is made -- quickly flushing out errors and breaking changes, and giving you the opportunity to fix bugs as they occur (instead of at 2am in three months' time!).
 +
 +===== Test suite development for ADF pipelines =====
 +
 +Test coverage for data engineering developments often isn't high, and Azure Data Factory (ADF) pipelines are no exception. In this talk I begin by presenting a basic C#/NUnit test setup for a simple ADF pipeline. I'll move on to talk about patterns for flexible test setup, isolation using dependency injection and faked external dependencies, unit vs functional tests in ADF and calculation of test coverage. My aim here is to demonstrate that test construction needn't be hard or onerous, and that developing automatable tests alongside your ADF pipelines provides real benefits in terms of prompt bug discovery and regression prevention.
 +
  
 ===== Design patterns for metadata-first ETL process control ===== ===== Design patterns for metadata-first ETL process control =====
Line 9: Line 18:
   * [[https://www.meetup.com/SQL-South-West/|SQL South West]] -- January 28th 2019   * [[https://www.meetup.com/SQL-South-West/|SQL South West]] -- January 28th 2019
  
-===== My mother said I never should: Unpacking SQL code smells =====
-
-In this talk, I argue that SQL code smells are shorthand for coding practices that may be bad a lot of the time -- but perhaps not always. We'll look at some commonly-cited code smells to try to work out what underlying behaviour we're really trying to avoid, then look for situations where we might actually want that behaviour anyway. Understanding the rules is the first step to knowing when it's safe to break them, and the aim of this session is to ask enough questions to make us confident -- but prudent! -- rulebreakers.
-
-===== SSIS script tasks and components for beginners =====
-
-SSIS script tasks and components allow you to add functionality to your packages that isn't supported out-of-the-box. In this session I will cover the basics of task and component implementation, including a basic pattern for error handling to make debugging easier. I'll walk through some simple implementations, and talk about when you might want to write your own tasks, when you might consider third-party implementations, and when you might want to go all-out and write your own custom task or component. Some understanding of C# (or other strongly-typed object-oriented language) will go a long way, but even if you're an absolute beginner there should be something here for you.
+===== Understand your ETL pipeline with graph data visualisations =====
+Documentation has never been this much fun! In this session I'll be introducing Graphviz – free, open-source, graph visualisation software with relevance that extends beyond traditional graph applications. I will show how we can use it to build informative visualisations of common data management artefacts, specifically SQL Server database diagrams and ETL data pipelines. Combining the approach with sources of metadata we'll see how we can quickly and automatically generate suites of interlinked diagrams to describe large and complex database and ETL systems in an easy-to-navigate way.
  
 ===== Supercharge your ETL development with Dynamic T-SQL ===== ===== Supercharge your ETL development with Dynamic T-SQL =====
 ETL development can be packed with variety or as repetitive as WHILE 1 = 1 – and when it's the latter it's time-consuming, boring and error-prone. In this session I'll get the ball rolling with some basic dynamic T-SQL before supercharging it with metadata to generate (and re-generate) a variety of ETL components in T-SQL. We'll wrap up with some thoughts about how to tackle this in the real world with a heady mixture of good practice and metadata abstraction. ETL development can be packed with variety or as repetitive as WHILE 1 = 1 – and when it's the latter it's time-consuming, boring and error-prone. In this session I'll get the ball rolling with some basic dynamic T-SQL before supercharging it with metadata to generate (and re-generate) a variety of ETL components in T-SQL. We'll wrap up with some thoughts about how to tackle this in the real world with a heady mixture of good practice and metadata abstraction.
- 
-===== Understand your ETL pipeline with graph data visualisations ===== 
-Documentation has never been this much fun! In this session I'll be introducing Graphviz – free, open-source, graph visualisation software with relevance that extends beyond traditional graph applications. I will show how we can use it to build informative visualisations of common data management artefacts, specifically SQL Server database diagrams and ETL pipelines. Combining the approach with sources of metadata we'll see how we can quickly and automatically generate suites of interlinked diagrams to describe large and complex database and ETL systems in an easy-to-navigate way.
  
 ===== ETL process management with TSQL ===== ===== ETL process management with TSQL =====