Raw data reporting: Preparing GA4 data with BigQuery and Dataform

GA4 reporting based on raw data makes marketing reporting much more accurate and flexible to model. In this tutorial, we show you how to prepare GA4 data with Dataform in Google BigQuery.

Advantages of GA4 raw data reporting with BigQuery

One of the biggest advantages of GA4 is the ability to mirror all data daily in Google BigQuery. This gives you access to raw data that can be linked to other marketing and sales data. It also avoids the annoying quota problem, as Google limits access to GA4 even with expensive GA360 accounts, so that with many data retrievals only a quota error message is displayed instead of the report. Google therefore recommends working directly with raw data.

Advantages at a glance:

  • Data sovereignty: You own the GA4 tracking data and are therefore more independent of the tracking tool provider
  • Enable data linking: GA4 raw data can be linked with other data (e.g. sales data from Salesforce, Oracle, Hubspot)
  • Model data yourself: Those who need their own KPIs, metrics and dimensions can define their own data
  • Customize data: You can change data retrospectively via SQL, e.g. to fill in tracking failures with mean values or from other systems and thus keep reporting accurate
  • Solve the quota problem: Raw data reporting avoids quota warnings
  • AI usage: raw data can be analyzed directly with AI models In the Google Cloud Platform, VertexAI provides numerous models for forecasting, clustering, condition models and high-end LLMs such as Gemini, Claude and others.

Result of this tutorial: Raw data report

This is what the finished raw data report with GA4 data from BigQuery will look like.

Ergebnis: Looker-Studio Report mit transformierten GA4-Rohdaten aus BigQuery
Result: Looker Studio report with transformed GA4 raw data from BigQuery

 

Problem: GA4 data in BigQuery must be prepared for reporting

If you have exported your GA4 data in BigQuery, you face the challenge that you cannot simply report the metrics and dimensions as they are. Many dimensions and metrics are available in raw data format and must first be defined and created, e.g. sessions, users, channels and many more.

The best free AI tools

The best free AI tools
View free AI Tools

Google’s raw data format is beautifully compact and efficiently modeled (tabular data format with its own data table per tag, “nested” data fields, i.e. fields within a data field). This has the advantage that massive amounts of data can be queried quickly in BigQuery. There is also a Looker Studio connector for this. However, every retrieval incurs costs in BigQuery. Especially large company websites with many calls have to plan for additional costs for reporting, because in BigQuery you don’t just pay for retrieved data fields (e.g. SELECT pageviews FROM events) but for all data to be evaluated, i.e. all data fields of the tables.

GA4-Daten in BigQuery im Rohformat
GA4 data in BigQuery in raw format: Efficient but difficult to use for many SQL users

What is BigQuery Dataform?

Dataform is a framework for structured data transformations integrated into Google BigQuery. It allows you to transform your data with SQL, even if you need to implement numerous, complex and nested transformations, such as creating metrics and dimensions.

Advantages:

  • Transform, merge and model data in a structured way
  • free of charge

Google Dataform Documentation

Call Dataform in BigQuery

What is GA4 Dataform?

We particularly recommend using another framework called GA4Dataform from the team at Superform Labs (THANK YOU!!!). This is what the analytics and BI scene has been waiting for! The open source solution uses Google Dataform to implement typical GA4 transformations. The installer makes setup child’s play. Instead of weeks of development work in SQLs with many possible sources of error, you simply use the latest version of GA4Dataform. After installation, GA4Dataform delivers a fully defined, reportable GA session and GA4 event table with all the desired metrics and dimensions, which updates itself every day.

Advantages:

  • Save effort and costs during data transformation
  • Save ongoing reporting costs (incremental data retrieval, easy expandability)
  • free of charge, open source

GA4Dataform

Tutorial: Prepare GA4 raw data with Dataform in BigQuery and report in Looker Studio

Installation of GA4Dataform

As a prerequisite, you need a BigQuery export set up in GA4 and rights in BigQuery. After entering the email address you used for the BigQuery account, you will receive a free license with key. In the installer, log in to your Google account and start the installation.

Step 1: Start installation of GA4Dataform

Installation von GA4Dataform, Schritt 1
Installation of GA4Dataform, step 1

Step 2: Process historical data and set up a daily Dataform run

Installation von GA4Dataform, Schritt 2
Installation of GA4Dataform, step 2

Step 3: Done: All Dataform transformations are set up and the tables are created

Installation von GA4Dataform, Schritt 3
Installation of GA4Dataform, step 3

Result: Dataform pipeline set up for GA4 data

Thanks to GA4Dataform, the perfectly modeled and structured Dataform pipeline makes it easy to work with the data and add your own modeling, e.g. for your own metrics or data corrections (e.g. compensate for tracking failures via SQL, extrapolate data, add net margins from merchandise management systems and more).

Dataform uses a number of structuring options for this:

  • Github integration: All transformations are stored in a Github repo.
  • Definitions: You create your own SQL snippets that can be used in other SQL calls. The format used here is an extended sqlx format so that you can store config and SQL definition in the same file and use includes
  • Assertions: You can create data checks that check whether data is plausible, e.g. start/end date of a session, defective values (e.g. bounce rates over 100% etc.)
  • Incremental transformations: The data can be supplemented incrementally, i.e. only required data is supplemented, which saves costs and computing power.
BigQuery Dataform-SQL-Code für GA4-Event-Daten
BigQuery Dataform SQL code for GA4 event data

The pipeline can be clearly displayed as a graph so that you can see which transformations are based on which definitions

Die Dataform-Transformationen können als Graph angezeigt werden
The dataform transformations can be displayed as a graph

Result: Prepared raw data in BigQuery

Dataform now stores the transformed data in BigQuery on a daily basis. This includes

  • Session table
  • Event table
  • Transactions table

This data is available “flat” instead of nested as in the original Google format in BigQuery. This is much easier to use in reporting.

Die GA4-Sessions-Tabelle nach der Transformation durch Dataform
The GA4 sessions table after transformation by Dataform

Data of the events table as a schema

Practical: All event parameters are stored flat in individual fields. If you have maintained good tracking documentation, you can easily look up which events have which fields and use them to create a report.

Schema der neuen GA4-Events-Tabelle, nach Dataform-Transformation
Schema of the new GA4 events table, after dataform transformation

Data of the sessions table as schema

The sessions can finally be viewed easily again. Each session has all the fields required for the analysis, such as start/end timestamp, session duration, user ID, source by attribution model (first/last), device and many other fields.

Schema der neuen GA4-Sessions-Tabelle, nach Dataform-Transformation
Schema of the new GA4 sessions table, after dataform transformation

Result: GA4 raw data reporting in Looker Studio

The prepared GA4 data can be reported with Looker Studio, Power BI, QlikSense or other BI solutions.

We can now quickly create a report directly from BigQuery. To do this, simply select the corresponding menu item for “Looker Studio”.

Looker-Studio-Report per Klick direkt aus BigQuery erstellen
Create a Looker Studio report directly from BigQuery with one click

Result: Finished GA4 raw data report in Looker Studio

In Looker Studio, you can set up your report a little further and have now quickly created useful, filterable charts and tables based on raw data.

Ergebnis: Looker-Studio Report mit transformierten GA4-Rohdaten aus BigQuery
Result: Looker Studio report with transformed GA4 raw data from BigQuery

Ads

Legal Notice: This website ai-rockstars.com participates in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com.

Conclusion: Professional GA4 raw data reporting with Dataform

Efficient GA4 raw data reporting is much easier than ever before thanks to the great open source solution GA4Dataform. Instead of weeks of SQL creation with numerous error possibilities in the metric definition, a report can now be created in a short time.

GA4 raw data can be used for advanced marketing analyses such as 360-degree reporting from marketing to sales, so that you can, for example, link your marketing campaigns with offline conversions (e.g. after sales calls) and thus understand which campaign really brought in conversions.