Access Google Analytics with Azure Data Factory

At the time of writing, Azure Data Factory has no connector to enable data extraction from Google Analytics, but it seems to be a common requirement – it has 594 votes on ADF's suggestions page, making it the sixth most popular idea there.

With a bit of help (e.g. from an Azure Function), it is possible to implement Google Analytics extracts using ADF's current feature set. To achieve this we need to do three things:

  1. Get authorised – Google APIs use OAuth 2.0 for authorisation, so a pipeline needs to be able to obtain an OAuth token

  2. Call the reports:batchGet API and ingest the response – you can do this using ADF's Copy data activity, but I'll give you some JSON to speed this up

  3. Handle pagination – the API may not return all the data for your extract in one go, so the pipeline needs to be able to make repeated API calls to get all the pieces.

In this post I'm looking at the first of these requirements. Let's get authorised!

You will need:

  • an Azure Key Vault
  • an Azure Function App
  • an instance of Azure Data Factory.

If you don't already have one, you'll need a Google API service account – this provides an identity you can use to connect to Google APIs. You can create one in the Google Developer Console.

  1. A Google APIs project is a container for credentials and other API-related resources. Choose a project from the Select a project dropdown in the top left. If you don't already have one, you can create one from the drop down or using the button on the right of the dashboard.

  2. Ensure that the Google Analytics Reporting API is enabled for the project. Search for it in the API library (available from the Library link in the sidebar) – select it, then click ENABLE to enable it.

  3. Create a service account using the Credentials link in the sidebar. On the Credentials page, click + CREATE CREDENTIALS, choose Service account and follow the instructions.

  4. The new service account appears in the list of project credentials – select it. The Service account details page includes a Keys section – add a new key of type JSON. This automatically downloads a private key JSON file to your computer.

  5. Create a new secret in your Azure Key Vault – mine is called “GoogleServiceAccountKeyJson”. Set it to the contents of the downloaded JSON key file – open the file in a text editor, select all, copy and paste it into the secret's Value field.

Make a note of the service account's email address – you'll need it in a minute.

You'll need to grant the service account access to the Google Analytics view(s) you want to extract data from.

  1. On the analytics admin menu, choose the view you're interested in.

  2. Use View user management to add a new user – use the service account's email address and give it Read & Analyse permission.

  3. While you're here, take a look at View settings and make a note of the view's View ID value – you'll need it later.

  4. Repeat for any other view(s) you're interested in.

Google API calls are authorized using OAuth 2.0. OAuth access tokens can be requested directly from the Google OAuth 2.0 REST endpoint, but this isn't convenient in Azure Data Factory for a number of reasons:

  • Google's OAuth 2.0 API uses a URL-encoded form request body, but ADF's Web activity only supports JSON.
  • The access token request must be encapsulated in a JSON Web Token (JWT), but the signing functionality required to produce one is not available in ADF.

I've implemented OAuth token acquisition using a C# Azure Function (below). This avoids the problems above, and allows me to use Google's .NET client API – this is recommended practice (and is a lot simpler than trying to roll your own JWTs).

  1. using Azure.Identity;
  2. using Azure.Security.KeyVault.Secrets;
  3. using Google.Apis.Auth.OAuth2;
  4. using Microsoft.AspNetCore.Http;
  5. using Microsoft.AspNetCore.Mvc;
  6. using Microsoft.Azure.WebJobs;
  7. using Microsoft.Azure.WebJobs.Extensions.Http;
  8. using System;
  9. using System.Threading.Tasks;
  11. namespace GoogleAnalytics
  12. {
  13. public class GetOAuthToken
  14. {
  15. [FunctionName("GetOAuthToken")]
  16. public async Task<IActionResult> Run(
  17. [HttpTrigger(AuthorizationLevel.Function, "get", Route = null)] HttpRequest req)
  18. {
  19. var kvClient = new SecretClient(new Uri(Environment.GetEnvironmentVariable("KEY_VAULT_URL")), new ManagedIdentityCredential());
  20. string keyJson = kvClient.GetSecret("KEY_VAULT_URL").Value.Value;
  22. var cred = GoogleCredential.FromJson(keyJson).CreateScoped(new string[] { "" });
  23. var token = await cred.UnderlyingCredential.GetAccessTokenForRequestAsync();
  25. return new OkObjectResult("{\"token\":\"" + token + "\"}");
  26. }
  27. }
  28. }

A call to this function:

  • obtains a key vault connection using the function app's managed system identity (MSI) [line 19]

    • my implementation stores the key vault URL in a function app application setting called KEY_VAULT_URL
    • for MSI access to work, you must enable your function app's MSI and grant it Get Secret access in your key vault

  • obtains the value of the “GoogleServiceAccountKeyJson” secret (the contents of the Google service account's private key JSON file) [line 20]

  • creates a credential using the service account's private key, scoped to (this scope is required to access the analytics API endpoint) [line 22]

  • requests an access token from Google's OAuth 2.0 service [line 23] then returns it in a JSON response that looks like this:

    { "token": "<returned_access_token>" }

This is something we can use in ADF =).

You can call a deployed Azure function from an ADF pipeline using the Azure Function activity. The screenshot shows a pipeline containing the activity, configured to make a GET request to the GetOAuthToken function – the function name and HTTP method match those on lines 15 and 17 of the C# code.

The pipeline's linked service – “MyAzureFunctionApp” – is a connection to the function app hosting the GetOAuthToken function.

When I run this pipeline, I can see the returned token value in the activity's output:

Tokens are fairly short-lived, but as they are security credentials it's good practice to prevent them from appearing in the activity's output. I'll do this by ticking the Secure output checkbox on the activity's General configuration tab – I haven't done it yet so that you can see the JSON path to the token value.

I'll be able to reference the OAuth token in downstream activities using the ADF expression @activity('Get OAuth token').output.token.

In this post I showed you how to create a Google API service account, grant it access to Google Analytics, and use its private key to obtain an OAuth token in ADF. In the next post I use the OAuth token to connect to the API and start extracting data.

  • Code: The code for this post is available on Github – the function app project is in folder vs, and the adf folder contains the definition files for the pipeline above.

  • Share: If you found this article useful, please share it!

F K N​ R W