Migrating from the GUA API to the Google Analytics 4 (GA4) API for data extraction

Google released a comprehensive update to Google Analytics, called Google Analytics 4 (GA4), in October 2020. Not since 2012, when Google Analytics first allowed cross-platform tracking and custom metrics, has there been a change of this scope. At a high level, the change centers on user events (page views, scrolls, outbound link clicks, etc.) rather than on page views and time spent alone. It also adds restrictions on user data, so that fewer data points about user behavior can be gathered.

Another important update is on the API front. If you use the Google Analytics APIs to extract data into your data lake, you will have to make significant changes to the query payload when you switch over to the GA4 API. Let’s have a quick look at the changes needed to convert a typical Google Universal Analytics (GUA) API call to GA4 using Python.

The good news is that your old authentication JSON, generated from your Google developer account, should work as it is. The main differences come from the following changes:

  1. A new Python library has been introduced for the new API, so you will need to install it before extracting Google Analytics data. Please find details of installing the new library here.
  2. You no longer need a view in your Google Analytics property to extract data. In the GUA APIs, the View ID was a mandatory parameter. That is no longer the case: GA4 APIs run against a Property ID, which can be found in Admin -> Property Settings.
  3. The names of the attributes and measures to be extracted have changed. The new measures and attributes have simpler names without the ‘ga:’ prefix. Check out the full list here.
  4. You can set an environment variable, which the default constructor will use to get credentials. In the GUA APIs, by contrast, the path of the credential file had to be specified in the code in some way.
  5. The tags used for conditions and measures in the payload are different. Specifically, dateRanges has become date_ranges, and startDate and endDate have become start_date and end_date. Also, measure names must be specified with the “name” tag, whereas in GUA they are specified with the “expression” tag.
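
The payload changes listed above can be sketched as a small conversion function. This is only an illustration under stated assumptions: the function name `gua_to_ga4_request` and the `GUA_TO_GA4_NAMES` table are hypothetical, and the name pairs in the table are examples that should be checked against the official lists, since GA4 names are not always the GUA name with the ‘ga:’ prefix removed.

```python
# Illustrative GUA -> GA4 name pairs (an assumption, not an official mapping).
# Extend or correct this table from the published dimension/metric lists.
GUA_TO_GA4_NAMES = {
    "ga:sessions": "sessions",
    "ga:newUsers": "newUsers",
    "ga:users": "totalUsers",
    "ga:pageviews": "screenPageViews",
    "ga:sessionDuration": "userEngagementDuration",
    "ga:transactions": "transactions",
    "ga:year": "year",
    "ga:month": "month",
    "ga:source": "sessionSource",
}


def gua_to_ga4_request(gua_request, property_id, name_map=GUA_TO_GA4_NAMES):
    """Build a GA4 run_report payload (dict) from a GUA report request (dict)."""

    def translate(name):
        # Fail loudly for fields with no configured GA4 counterpart.
        if name not in name_map:
            raise KeyError(f"No GA4 equivalent configured for {name!r}")
        return name_map[name]

    return {
        # The View ID is dropped; GA4 is addressed by Property ID.
        "property": f"properties/{property_id}",
        # dateRanges/startDate/endDate become snake_case keys.
        "date_ranges": [
            {"start_date": r["startDate"], "end_date": r["endDate"]}
            for r in gua_request.get("dateRanges", [])
        ],
        # GUA metrics use "expression"; GA4 metrics use "name".
        "metrics": [
            {"name": translate(m["expression"])}
            for m in gua_request.get("metrics", [])
        ],
        "dimensions": [
            {"name": translate(d["name"])}
            for d in gua_request.get("dimensions", [])
        ],
    }
```

Raising an error for unmapped names is a deliberate choice here: a GUA field with no entry in the table is reported at conversion time rather than being sent to the API under a guessed name.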

And that’s pretty much it. Let’s look at the step-by-step process:

  1. As stated earlier, first install the required GA4 library:
    pip install google-analytics-data
  2. Set the environment variable to point at the GA credential file:
    export GOOGLE_APPLICATION_CREDENTIALS=<yourpath>
  3. Change your GUA payload: I typically prefer to keep the measures and attributes I need in a separate payload file, with one JSON payload for GUA and one for GA4.
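
The original payload files are not reproduced here. As a minimal sketch of the setup steps above, assuming the GA4 payload lives in a JSON file with the same keys as the request shown later in this post, the helpers below check the credential variable and load the payload. The function names `credentials_file_from_env` and `load_payload` and the file name `ga4_payload.json` are hypothetical.

```python
import json
import os
from pathlib import Path


def credentials_file_from_env():
    """Return the credential path from the environment, failing early if unusable."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not Path(path).is_file():
        raise FileNotFoundError(f"Credential file not found: {path}")
    return path


def load_payload(payload_path, property_id):
    """Read a GA4 payload JSON file and attach the property to query."""
    payload = json.loads(Path(payload_path).read_text(encoding="utf-8"))
    # Keeping the Property ID out of the file lets one payload serve several properties.
    payload["property"] = f"properties/{property_id}"
    return payload
```

A call such as `load_payload("ga4_payload.json", "<property_id>")` would then return a dict that can be passed to the client as the report request.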

Comparing the GUA and GA4 payloads, you may notice that some fields like ‘bounces’ are gone and that ‘sessionDuration’ is replaced by ‘userEngagementDuration’. This is part of the new design of GA4, where the measurement of some metrics has fundamentally changed: userEngagementDuration measures how long the window or app screen was active on the visitor’s device, instead of the session time spent on the page as in GUA. In my opinion, this gives more realistic insights.

  4. The payload can also be put directly in the API call. The following code will give you the report response:
from google.analytics.data_v1beta import BetaAnalyticsDataClient


def sample_run_report():
    # The default constructor reads credentials from GOOGLE_APPLICATION_CREDENTIALS.
    client = BetaAnalyticsDataClient()
    # Replace <property_id> with your GA4 Property ID (not a GUA View ID).
    response = client.run_report(request={
        "property": "properties/<property_id>",
        "date_ranges": [{
            "start_date": "2021-04-01",
            "end_date": "2021-04-07"
        }],
        "metrics": [
            {"name": "sessions"},
            {"name": "newUsers"},
            {"name": "totalUsers"},
            {"name": "screenPageViews"},
            {"name": "userEngagementDuration"},
            {"name": "transactions"}
        ],
        "dimensions": [
            {"name": "year"},
            {"name": "month"},
            {"name": "sessionSource"}
        ]
    })
    return response


resp = sample_run_report()
print("Report result:")
for row in resp.rows:
    # Print the first dimension (year) and the first metric (sessions) of each row.
    print(row.dimension_values[0].value, row.metric_values[0].value)

With this code you should be able to extract data from your GA4 property and store it in your data lake.
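
Before storing the result, it usually helps to flatten the response into plain records. The sketch below is one possible way to do that, not a required step: it relies on the response exposing `dimension_headers`, `metric_headers` and `rows`, and the function name `rows_to_records` is hypothetical. The stand-in response is built by hand with made-up values so the example runs without calling the API.

```python
from types import SimpleNamespace


def rows_to_records(response):
    """Turn a run_report response into a list of {column name: value} dicts."""
    dim_names = [h.name for h in response.dimension_headers]
    metric_names = [h.name for h in response.metric_headers]
    records = []
    for row in response.rows:
        record = {n: v.value for n, v in zip(dim_names, row.dimension_values)}
        record.update({n: v.value for n, v in zip(metric_names, row.metric_values)})
        records.append(record)
    return records


def _value(text):
    # Stand-in for an API value object; the API returns values as strings.
    return SimpleNamespace(value=text)


# Hand-built stand-in for a response (made-up data, for illustration only).
fake_response = SimpleNamespace(
    dimension_headers=[SimpleNamespace(name="year"), SimpleNamespace(name="month")],
    metric_headers=[SimpleNamespace(name="sessions")],
    rows=[
        SimpleNamespace(
            dimension_values=[_value("2021"), _value("04")],
            metric_values=[_value("42")],
        )
    ],
)

records = rows_to_records(fake_response)
print(records)
```

Each record can then be written out as a JSON line or a CSV row, depending on what your data lake expects; numeric metrics need an explicit conversion because the values arrive as strings.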
