Speaking

Download slides

As data engineers we're great at putting together and managing complex process flows, but what happens if we stop trying to control the flow and start thinking about the metadata it needs instead? In this session we'll look at a variety of ETL metadata, how we can use it to drive process execution, and see benefits quickly emerge. I'll talk about design principles for metadata-first process control and show how using this approach reduces complexity, enhances resilience and allows a suite of ETL processes adaptively to reorganise itself.

In this talk, I argue that SQL code smells are shorthand for coding practices that may be bad a lot of the time – but perhaps not always. We'll look at some commonly-cited code smells to try to work out what underlying behaviour we're really trying to avoid, then look for situations where we might actually want that behaviour anyway. Understanding the rules is the first step to knowing when it's safe to break them, and the aim of this session is to ask enough questions to make us confident – but prudent! – rulebreakers.

SSIS script tasks and components allow you to add functionality to your packages that isn't supported out-of-the-box. In this session I will cover the basics of task and component implementation, including a basic pattern for error handling to make debugging easier. I'll walk through some simple implementations, and talk about when you might want to write your own tasks, when you might consider third-party implementations, and when you might want to go all-out and write your own custom task or component. Some understanding of C# (or other strongly-typed object-oriented language) will go a long way, but even if you're an absolute beginner there should be something here for you.

ETL development can be packed with variety or as repetitive as WHILE 1 = 1 – and when it's the latter it's time-consuming, boring and error-prone. In this session I'll get the ball rolling with some basic dynamic T-SQL before supercharging it with metadata to generate (and re-generate) a variety of ETL components in T-SQL. We'll wrap up with some thoughts about how to tackle this in the real world with a heady mixture of good practice and metadata abstraction.

Documentation has never been this much fun! In this session I'll be introducing Graphviz – free, open-source, graph visualisation software with relevance that extends beyond traditional graph applications. I will show how we can use it to build informative visualisations of common data management artefacts, specifically SQL Server database diagrams and ETL pipelines. Combining the approach with sources of metadata we'll see how we can quickly and automatically generate suites of interlinked diagrams to describe large and complex database and ETL systems in an easy-to-navigate way.

Download slides

There are many techniques for orchestrating ETL processes, but the difference between good ones and great ones is how they perform when things go wrong. Desirable behaviours – like fault tolerance, quick fault finding and easy resume after error – often aren't available and sometimes seem hard to achieve. In my session I'll present an approach to doing this using only TSQL and the SQL Server Agent, and which also enables parallel processing, adapts to evolving workloads and provides a wide variety of monitoring and diagnostic information.