Working with multiple Power BI dataset environments

This is Part 5 of a series about creating a professional developer experience for working with Power BI. If you haven't already seen the first post, you may prefer to start there.

In earlier posts in this series, I talked about developing and deploying standalone Power BI datasets and about automating report deployment into different environments. In this post I'll bring those approaches together to enable deployment of shared datasets into multiple environments. This has consequences for automated report deployment, and I'll take a look at that too.

The workflow I presented for Power BI reports uses three environments:

  • Test – where reports are deployed for review by developer colleagues
  • UAT – where reports are deployed for business stakeholders to review, after inclusion in a release candidate
  • Production – where developed reports are in day-to-day business use.

Testing a dataset is a technical process, requiring a good understanding of tabular model design and DAX implementation. For that reason, I don't expect business stakeholders to review a standalone dataset directly – I'm going to implement a simpler workflow that has no UAT environment.

As in the case of report development, changes to datasets will be made in a feature branch. The workflow is similar:

  1. Open a feature branch.
  2. Make changes.
  3. Open a pull request to merge the feature branch into the current release candidate (rc/ branch) – at this point the dataset will be deployed to Test.
  4. Merge the feature branch into the release candidate, then wait for the scheduled production release.
  5. On the agreed schedule, merge the release candidate into main, causing the changes to be deployed into Production.
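In git terms, a pass through that loop looks roughly like this (branch names are purely illustrative):

  # Illustrative only – branch names are hypothetical
  git checkout -b feature/new-sales-measure        # 1. open a feature branch
  # ...edit the dataset and commit your changes... # 2. make changes
  git push -u origin feature/new-sales-measure
  # 3. open a pull request targeting the current rc/ branch – the PR build deploys the dataset to Test
  # 4. complete the merge and wait for the scheduled release
  # 5. merging the rc/ branch into main deploys the dataset to Production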

More detail about the report development workflow is available in the article Publish automatically to Power BI environments with Azure DevOps pipelines from earlier in the series.

This specific workflow might not work for you! Workflow design choices must be made in the context of your own business and technical environment – you need to find a process that delivers the developer experience, governance controls and business support that you need.

Here's the new deployment pipeline for the AdventureWorksLT dataset:

  1. trigger:
  2.   branches:
  3.     include:
  4.     - main
  5.   paths:
  6.     include:
  7.     - /powerbi-pro-devex-series/05-DatasetEnvironments/datasets/AdventureWorksLT
  8. pr:
  9.   branches:
  10.     include:
  11.     - rc/*
  12.   paths:
  13.     include:
  14.     - /powerbi-pro-devex-series/05-DatasetEnvironments/datasets/AdventureWorksLT
  15.
  16. stages:
  17. - template: ../../tools/dataset-pipeline-stages.yaml
  18.   parameters:
  19.     datasetFolder: /powerbi-pro-devex-series/05-DatasetEnvironments/datasets/AdventureWorksLT

You'll notice two things:

  • Unlike a report deployment pipeline, the trigger section only specifies the main branch (line 4) – this is because no UAT deployment is required when merging a PR into a release candidate.

  • The bulk of the pipeline's activity is defined separately, in the dataset-pipeline-stages.yaml template (referenced on line 17). This makes it easier to support multiple datasets, each with its own deployment pipeline 😃.

You won't be surprised to learn that – as I'm suggesting multiple datasets, each with its own pipeline – I'm going to propose a folder structure similar to the one I used to organise reports:

Each dataset has a folder of its own under a common “datasets” folder, containing three files:

  • Model.bim, the dataset TMSL definition file
  • pipeline.yaml, the model's deployment pipeline definition (shown above)
  • metadata.yaml, metadata used to configure model deployment features.
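For reference, the layout implied by the paths used in this post looks something like this (additional datasets would sit alongside AdventureWorksLT, each with the same three files):

  powerbi-pro-devex-series/
  └── 05-DatasetEnvironments/
      ├── datasets/
      │   └── AdventureWorksLT/
      │       ├── Model.bim
      │       ├── metadata.yaml
      │       └── pipeline.yaml
      └── tools/
          ├── Deploy-PbiDataset.ps1
          └── dataset-pipeline-stages.yaml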

I'll look at TMDL model deployments in a later post.

Dataset deployment metadata is stored in the metadata.yaml file. The file for the AdventureWorksLT model looks like this:

variables:
  datasetName: AdventureWorksLT
  description: |
    Sales data for the AdventureWorks company.

It's similar to the corresponding file I used to assist report deployment – it's a YAML variable template that can be included by the deployment pipeline at runtime. It specifies the name to be used by the dataset when deployed, but no target workspace – in my implementation, every shared dataset is deployed to the same “Datasets” workspace.

Finally, here's the pipeline template, dataset-pipeline-stages.yaml. You'll recognise the pipeline parameter (lines 1-3), the use of shared secrets and dataset metadata (lines 7-9), and the parameterised script call (lines 22-36) from the equivalent report template. The targetWorkspace variable definition (lines 16-20) is simpler because the pipeline deploys to test and production workspaces only.

  1. parameters:
  2. - name: datasetFolder
  3.   type: string
  4.
  5. stages:
  6. - stage: Jobs
  7.   variables:
  8.   - group: DeploymentSecrets
  9.   - template: ${{ parameters.datasetFolder }}/metadata.yaml
  10.   jobs:
  11.   - job: Job
  12.     displayName: Publish Power BI dataset
  13.     pool:
  14.       vmImage: windows-latest
  15.     variables:
  16.     - name: targetWorkspace
  17.       ${{ if eq(variables['Build.SourceBranch'], 'refs/heads/main') }}:
  18.         value: 'Datasets'
  19.       ${{ elseif eq(variables['Build.Reason'], 'PullRequest') }}:
  20.         value: 'Datasets [Test]'
  21.     steps:
  22.     - task: PowerShell@2
  23.       displayName: Publish dataset
  24.       inputs:
  25.         targetType: filePath
  26.         filePath: "$(System.DefaultWorkingDirectory)/${{ parameters.datasetFolder }}/../../tools/Deploy-PbiDataset.ps1"
  27.         arguments: >
  28.           -DatasetName "$(datasetName)"
  29.           -WorkspaceName "$(targetWorkspace)"
  30.           -BimFilePath "$(System.DefaultWorkingDirectory)/${{ parameters.datasetFolder }}/Model.bim"
  31.           -RefreshDataset $true
  32.         failOnStderr: true
  33.       env:
  34.         AZURE_TENANT_ID: $(AzureTenantId)
  35.         AZURE_CLIENT_ID: $(AzureClientId)
  36.         AZURE_CLIENT_SECRET: $(AzureClientSecret)
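To make the conditional variable concrete: on a build of main, the PowerShell task resolves to roughly the call below, while a pull request build substitutes the “Datasets [Test]” workspace. Here “<working-dir>” stands for $(System.DefaultWorkingDirectory) on the build agent – the exact prefix depends on your checkout configuration:

  # Roughly what the task executes on a main-branch build of the AdventureWorksLT pipeline
  & "<working-dir>/powerbi-pro-devex-series/05-DatasetEnvironments/tools/Deploy-PbiDataset.ps1" `
      -DatasetName "AdventureWorksLT" `
      -WorkspaceName "Datasets" `
      -BimFilePath "<working-dir>/powerbi-pro-devex-series/05-DatasetEnvironments/datasets/AdventureWorksLT/Model.bim" `
      -RefreshDataset $true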

In earlier posts in this series, I considered only one dataset environment, used by all reports. I implemented deployment through three report environments, but every report was bound to the same dataset, in the production “Datasets” workspace:

With the pipeline introduced in this post, we now have an additional dataset environment, “Datasets [Test]”, which poses a question: Do we ever want to be able to inspect reports linked to a test dataset? For example, it might be useful for developers to be able to see the effect of a revised measure visually, perhaps in the test reports workspace:

Linking – or binding – reports to different datasets when deployed in different workspaces requires a change to the report deployment pipeline.

I've been handling the detail of report deployment in a PowerShell script, called from the report deployment pipeline – modifying report bindings will take place in the script, immediately after a report is published.

This is an excerpt of the modified PowerShell script (included in the code files that accompany this post):

  1. # publish the report
  2. Write-Host "Deploying '$ReportName' to workspace '$WorkspaceName'"
  3. Write-Host $PbixFilePath
  4. Write-Host "IsDeleted = $IsDeleted"
  5.
  6. if($IsDeleted) {
  7.     Unpublish-PbiReport `
  8.         -Name $ReportName `
  9.         -WorkspaceName $WorkspaceName
  10. } else {
  11.     Publish-PbiReport `
  12.         -Path $PbixFilePath `
  13.         -Name $ReportName `
  14.         -WorkspaceName $WorkspaceName
  15.
  16.     Set-PbiReportDataset `
  17.         -ReportName $ReportName `
  18.         -ReportWorkspaceName $WorkspaceName `
  19.         -DatasetName $DatasetName `
  20.         -DatasetWorkspaceName $DatasetWorkspaceName
  21. }

Lines 16-20 of the excerpt call a new function, Set-PbiReportDataset, identifying the name and workspace of the dataset to which the report is to be bound. The dataset workspace is determined by the pipeline: test reports are bound to test datasets, UAT and production reports to production datasets. The dataset name is a new piece of report metadata which must be stored in each report's metadata.yaml file, e.g.:

variables:
  reportName: 'Executive Summary'
  description: |
    A summary of company sales performance for executives.
  workspaceName: 'AdventureWorks Reports'
  sourceDataset: AdventureWorksLT
  isDeleted: false

The definition of Set-PbiReportDataset is added to the PbiDeployment.psm1 PowerShell module. The function identifies the named dataset in the indicated workspace, then calls the Rebind Report in Group API endpoint to rebind the report.

  # Binds a Power BI report to a dataset
  function Set-PbiReportDataset([string]$ReportName, [string]$ReportWorkspaceName, [string]$DatasetName, [string]$DatasetWorkspaceName) {
      Write-Host "Binding '$ReportName' in workspace '$ReportWorkspaceName'"
      Write-Host "to dataset '$DatasetName' in workspace '$DatasetWorkspaceName'"

      # resolve the dataset and report IDs from their names and workspaces
      $workspaceId = Get-PbiWorkspaceId -Name $DatasetWorkspaceName
      $datasetId = Get-PbiDatasetId -Name $DatasetName -WorkspaceId $workspaceId
      $workspaceId = Get-PbiWorkspaceId -Name $ReportWorkspaceName
      $reportId = Get-PbiReportId -Name $ReportName -WorkspaceId $workspaceId

      # call the Rebind Report In Group endpoint to bind the report to the dataset
      $body = @{datasetId = $datasetId} | ConvertTo-Json -Compress
      $url = "https://api.powerbi.com/v1.0/myorg/groups/$workspaceId/reports/$reportId/Rebind"
      Invoke-PowerBIRestMethod -Url $url -Method POST -Body $body | Out-Null
  }
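For illustration, a test deployment of the “Executive Summary” report might end up calling the function like this (the “[Test]” report workspace name is hypothetical – it depends on how your report pipeline names its workspaces):

  # Hypothetical example: bind the test copy of a report to the test dataset
  Set-PbiReportDataset `
      -ReportName 'Executive Summary' `
      -ReportWorkspaceName 'AdventureWorks Reports [Test]' `
      -DatasetName 'AdventureWorksLT' `
      -DatasetWorkspaceName 'Datasets [Test]'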

This video shows the extended report pipeline in use, demonstrating that the report is bound to different datasets depending on the workspace into which it is deployed:


I've chosen to bind test reports to test datasets because it's convenient, but you might choose differently. The original purpose of reports' [Test] workspaces was to allow reports to be reviewed internally before UAT, but that might be less meaningful if performed using a different dataset. You could choose to add another report environment simply to allow test datasets to be evaluated in the context of reports.

There is no single right answer here 😊. You need to consider:

  • What environments do you need for reports? (Production, UAT, test, dataset evaluation, others?)
  • What environments do you need for datasets? (The same as for reports? Just production & test? Something else?)
  • How do report and dataset environments interact? (If you use the same set of environments for both, do you want reports in each environment to use the corresponding dataset environment, or something different? If you use a different set of environments, what then?)

The right answer for your organisation is closely linked to the business processes and development workflow you want to be able to support.

A question that sometimes arises when managing datasets separately from reports is how to manage the dependency between the two.

If a dataset is upgraded to support a new report feature, you might argue that we need to deploy dataset changes first, ahead of report changes. But when a dataset upgrade includes a breaking change, no deployment order exists that allows the report to remain available – the changed dataset will break the existing report, but the updated report won't work without the changed dataset. Ultimately, this means that some report downtime is inevitable.

The simplest approach is to agree with your customers a window in which scheduled deployments can take place. At that time – using the development workflow I've described in this series of posts – the current release candidate branch is merged into main, and the deployment pipelines of every updated report and dataset are triggered. During pipeline execution, the environment is in a state of flux and may not be usable. After all pipelines have completed, and any required dataset refreshes have been performed, the production environment will once again be in a self-consistent state and ready for use.
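If you ever need to trigger one of those refreshes outside a pipeline run, a minimal sketch using the module's helper functions and the Refresh Dataset In Group REST endpoint might look like this (assuming you've already signed in, for example with Connect-PowerBIServiceAccount):

  # Minimal sketch: manually trigger a refresh of the production AdventureWorksLT dataset
  $workspaceId = Get-PbiWorkspaceId -Name 'Datasets'
  $datasetId = Get-PbiDatasetId -Name 'AdventureWorksLT' -WorkspaceId $workspaceId
  $url = "https://api.powerbi.com/v1.0/myorg/groups/$workspaceId/datasets/$datasetId/refreshes"
  Invoke-PowerBIRestMethod -Url $url -Method POST | Out-Null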

In this article, I presented a workflow for managing Power BI datasets in multiple environments, and refactored an earlier deployment pipeline to support that. Maintaining multiple dataset environments provides a choice of datasets for published reports, implemented by rebinding a report during its deployment.

The workflow I've described here may not be what you're looking for, and that's fine! With this approach you can support arbitrary workflows in both dataset and report development. 😎

  • Next up: In the previous post, I described a pattern for deploying reports based on reusable pipeline components. In the next post, I show how to script that process, making it easy to set up new reports and to create their deployment pipelines automatically 🚀.

  • Code: The code for the series is available on GitHub. The files specific to this article are in the powerbi-pro-devex-series/05-DatasetEnvironments folder.

  • Share: If you found this article useful, please share it!