Are you leveraging the robust capabilities of BigQuery to query your Google Analytics 4 event data?
Understanding this integration can significantly enhance how you visualize and interpret user interactions, opening new doors to data-driven decision-making. In the digital analytics landscape, understanding how to query GA4 data in BigQuery has become essential.
Professionals in various roles, from digital analysts to BI experts, need precise skills to extract actionable insights from this data. Mastering these techniques allows you to delve deeper into user behavior and optimize strategies effectively.
This article will guide you through the process of querying GA4 event data in BigQuery. You'll learn how to navigate and extract specific event data, empowering you to make informed decisions.
This article is Part 3 in our series, "[GA4] BigQuery Export."
If you're following along, check out the previous articles in the series:
Integrating GA4 with BigQuery offers a powerful solution for professionals who need to analyze their website data. The combination of GA4’s event-based model, instead of the session-based model known before, and BigQuery’s robust querying capabilities allows for detailed analysis of user interactions.
The benefits of using GA4 with BigQuery:
1. Access to Raw Event Data: Unlike the aggregated data in the standard GA4 interface, BigQuery provides access to detailed, raw event data, allowing for more granular analysis.
2. Scalable Data Processing: BigQuery’s powerful querying engine can handle large datasets quickly, making it possible to run complex queries.
3. Custom Reporting: When storing GA4 data in BigQuery, you can create tailored reports and dashboards in Looker Studio that meet precise business needs, and exceeds what you can get inside GA4 interface.
4. Enhanced Insights: The ability to join GA4 conversions data with data from other platforms (like cost data from facebook or google ads) is a step towards end-to-end full-funnel analytics.
5. Better Marketing Strategies: By analyzing detailed event data, marketing managers can gain valuable insights to refine their strategies and improve campaign performance.
Querying GA4 event data in BigQuery is the essential skill for gaining detailed insights into user behavior and website interactions.
Learning the essential steps, including basic queries, dynamic date range selection, and traffic sources analysis, will be helpful for any data professional or marketing expert.
If you’ve already set up GA4 to export data to BigQuery, your next step might be to build reports to see what events are being collected and analyze those events using SQL queries.
When analyzing GA4 data in BigQuery, it's important to start with the basics, focusing on regular, non-nested fields. Those fields in GA4 data are just data points that do not require additional processing to extract or analyze - just query.
These fields are stored in a flat structure within the dataset, making them simpler to query. Understanding these fields is crucial for performing basic analysis in BigQuery.
Some fields are:
Here is the list of a few repeated fields, they are more complex than the simple once, but it’s still quite easy to query them.
Working with non-nested fields in BigQuery is relatively straightforward.
You can perform simple SQL queries to group, filter, and aggregate data based on these fields, providing quick insights into your website's performance.
These fields are easy to access and manipulate, making them ideal for initial data exploration.
Understanding the frequency of different events on your website is crucial for analyzing user behavior and tracking key interactions.
The Events Count query allows you to see how many times each event has occurred, providing a clear overview of your site's most common user actions.
Example:
The following SQL query counts the occurrences of each event in your GA4 dataset and orders them by frequency:
SELECT
event_name,
COUNT(*) AS event_count
FROM `project_id.your_dataset_id.events_20240804`
GROUP BY event_name
ORDER BY event_count DESC
LIMIT 10
Here:
This query provides a quick snapshot of event frequencies, helping you understand which interactions are most common on your website.
A Traffic Report helps you understand your website's traffic sources and how users are arriving at your site.
You can get valuable insights about the usefulness of different marketing channels and campaigns by scanning traffic sources, mediums, and campaign names.
Example:
This query breaks down traffic by source, medium, and campaign name, giving you a clear view of how users are reaching your site.
SELECT
traffic_source.source,
traffic_source.medium,
traffic_source.name,
COUNT(DISTINCT user_pseudo_id) AS users
FROM `project_id.your_dataset_id.events_20240801`
GROUP BY traffic_source.source, traffic_source.medium, traffic_source.name
ORDER BY users DESC
Here:
This query provides a detailed breakdown of your website's traffic sources, helping you understand which channels are driving the most users.
A Traffic Report by Date allows you to analyze how your website's traffic changes over time.
By breaking down traffic data by date, source, medium, and campaign name, you can track the effectiveness of your marketing efforts on a daily basis.
Example:
Let’s explore an example query that provides a detailed view of your website’s traffic by date, source, medium, and campaign name.
SELECT
event_date AS date,
traffic_source.source,
traffic_source.medium,
traffic_source.name,
COUNT(DISTINCT user_pseudo_id) AS users
FROM `project_id.your_dataset_id.events_202408*`
GROUP BY
date,
traffic_source.source,
traffic_source.medium,
traffic_source.name
ORDER BY
date DESC,
users DESC
LIMIT
10
Here:
This query provides a comprehensive view of your website's traffic over time, helping you identify trends and assess the daily impact of different marketing channels.
If you're interested in exploring more SQL queries for analyzing GA4 data in BigQuery, OWOX's guide offers a comprehensive library of queries. This resource offers detailed examples and explanations to help you enhance your data analysis skills and make the most of your GA4 exports. Check it out here
Leveraging AI-powered tools can significantly enhance your ability to analyze complex datasets and generate crucial insights. An AI-powered SQL assistant can automate query generation, making it easier to work with large datasets like those from GA4 in BigQuery.
OWOX SQL Copilot is an AI-powered tool designed to assist with generating SQL queries for BigQuery. It simplifies the process of querying complex datasets, offering automated suggestions and query templates.
With OWOX SQL Copilot, even those with limited SQL knowledge can efficiently extract insights from GA4 data and create custom reports.
Date ranges in BigQuery can be categorized into two types: dynamic and static. Dynamic date ranges adjust automatically based on the current date, allowing for real-time analysis and continuous monitoring of data trends. This is particularly useful for dashboards and reports that need to reflect the latest data without manual updates.
Static date ranges, on the other hand, are fixed and do not change over time. These are ideal for analyzing specific periods, such as comparing performance across different months or evaluating the impact of a particular marketing campaign.
When working with large datasets in BigQuery, it's important to have flexible methods for selecting date ranges to suit your analytical needs. One efficient way to do this is by using wildcards in your SQL queries.
Wildcards allow you to dynamically select tables based on specific patterns in their names, making it easier to query data from particular time periods without manually listing each table.
Wildcards can be used in the FROM clause to select tables based on their naming conventions, which typically include date components. Here are some examples of how to use wildcards for selecting different date ranges:
All Tables in the Dataset:
FROM `project_id.dataset_id.events_*`
This will select data from all available tables in the dataset, regardless of the date.
Full Year (2023):
FROM `project_id.dataset_id.events_2023*`
This selects data from all tables corresponding to the year 2023.
Specific Month (January 2024):
FROM `project_id.dataset_id.events_202401*`
This selects data from all tables corresponding to January 2024.
First 9 Days of January 2024:
FROM `project_id.dataset_id.events_2024010*`
This selects data from the first nine days of January 2024.
When analyzing data over a specific, unchanging period, static date range queries are the go-to method. Static date range queries are particularly useful for evaluating performance across a set period, such as comparing data from the same month across different years or analyzing the impact of a campaign that ran for a specific duration.
Example Query: Static Date Range – 30 Days
This query retrieves data from a fixed date range, such as the entire month of July 2024.
SELECT
*
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND '20240731'
Here:
Dynamic date range queries, on the other hand, adjust automatically based on the current date. These queries are ideal for ongoing analysis, such as monitoring the last 7 or 30 days of data without needing to manually update the date range.
Example Query: Dynamic Date Range – Last 30 Days
This query dynamically retrieves data from the last 30 days, updating every day to include the most recent data.
SELECT
*
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 day))
AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
Example Query: Dynamic Date Range – Last 7 Days
For a shorter, more focused analysis, this query retrieves data from the last 7 days.
SELECT
*
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 day))
AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
Example Query: Mixed Static and Dynamic Date Range
You can also combine static and dynamic dates in your queries to capture data from a fixed starting point up to a dynamically determined end date. This approach is useful when you want to analyze data from a specific start date until the most recent available data.
This query retrieves data from July 1st, 2024, up to yesterday's date:
SELECT
*
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
In GA4 and BigQuery, date and timestamp dimensions are essential for analyzing time-based data.
These dimensions allow you to perform temporal analysis, helping you understand when specific actions occur and how they correlate with other data points over time.
Let’s explore some examples of how to query these default date and timestamp dimensions using GA4 data in BigQuery. These examples will demonstrate how to extract and analyze data based on specific time intervals.
1. Querying by event_date
This query retrieves data based on the event_date, which represents the date on which the event was logged in the format YYYYMMDD.
SELECT
event_date,
COUNT(*) AS event_count
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
GROUP BY
event_date
ORDER BY
event_date DESC
2. Querying by event_timestamp
This query extracts the precise time (in microseconds, UTC) at which an event was logged on the client, using event_timestamp.
SELECT
TIMESTAMP_MICROS(event_timestamp) AS event_time,
COUNT(*) AS event_count
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
GROUP BY
event_time
ORDER BY
event_time DESC
3. Querying by event_previous_timestamp
This query analyzes the difference in time between consecutive events by looking at the event_previous_timestamp.
SELECT
TIMESTAMP_MICROS(event_timestamp) AS current_event_time,
TIMESTAMP_MICROS(event_previous_timestamp) AS previous_event_time,
TIMESTAMP_MICROS(event_timestamp) - TIMESTAMP_MICROS(event_previous_timestamp) AS time_difference
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
ORDER BY
current_event_time DESC
4. Querying by event_server_timestamp_offset
This query examines the offset between the event's collection time and the upload time, using event_server_timestamp_offset.
SELECT
event_date,
event_server_timestamp_offset,
COUNT(*) AS event_count
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
GROUP BY
event_date, event_server_timestamp_offset
ORDER BY
event_server_timestamp_offset DESC
5. Querying by user_first_touch_timestamp
This query retrieves the first time (in microseconds) a user opened the app or visited the site, using user_first_touch_timestamp.
SELECT
user_first_touch_timestamp,
COUNT(DISTINCT user_pseudo_id) AS user_count
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
GROUP BY
user_first_touch_timestamp
ORDER BY
user_first_touch_timestamp DESC
6. Querying by set_timestamp_micros
This query retrieves the timestamp when a specific user property was last set, using set_timestamp_micros.
SELECT
(SELECT value.set_timestamp_micros FROM UNNEST(user_properties) WHERE key = '') AS user_set_timestamp_micros,
COUNT(*) AS event_count
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
GROUP BY
user_set_timestamp_micros
ORDER BY
user_set_timestamp_micros DESC
These examples demonstrate how to use different date and timestamp dimensions to extract detailed insights from your GA4 data in BigQuery.
Each dimension provides a unique perspective on user behavior and event timing, enabling more precise and informed analysis.
When analyzing GA4 data in BigQuery, you might often want to focus on specific events, such as purchase, page_view, or other custom events relevant to your business. This targeted approach allows for a more detailed analysis of user interactions and behaviors related to specific actions on your website or app.
In GA4, the event_name parameter identifies the particular actions that users take on your website or app. These events can be classified into two types: automatic and custom.
Automatic Events: These are built into GA4 and are automatically tracked without any additional setup. (page_view, scroll, click).
Custom Events: These are events that you define based on specific actions that are important to your business (purchase, form_submit). Custom events are implemented through additional setups, such as using Google Tag Manager (GTM) or the dataLayer.
A pageview event in GA4 represents each time a user loads a page or URL on your website. This event is automatically tracked by GA4 and is crucial for understanding user behavior, such as which pages are most visited, how users navigate through your site, and how often they return to specific pages.
Example:
To focus specifically on pageview events, you can use the following query. This will filter your dataset to include only the events where the event_name is page_view.
SELECT
*
FROM `project_id.your_dataset_id.events_*`
WHERE
event_name = 'page_view'
LIMIT
10
Here:
Querying for pageview events is an essential step in understanding how users interact with your website. By isolating these events, you can gain insights into which pages are getting the most traffic, how users navigate through your site.
When analyzing GA4 data in BigQuery, you might want to narrow down your focus to specific attributes like the user's country. This is useful if you’re interested in understanding user behavior from a particular geographic region.
Example:
To filter your data by country, in this case, the United States, you can modify your query to include a WHERE clause that specifies both the event name and the country.
SELECT
*
FROM `project_id.your_dataset_id.events_*`
WHERE
event_name = 'page_view'
AND geo.country = 'United States'
LIMIT
10
Here:
Filtering your GA4 data by attributes such as country allows for a more targeted analysis of user behavior. By focusing on users from specific regions like the United States, you can better understand their interactions with your site, which pages they visit, and how they navigate through your content.
As you become more familiar with querying GA4 data in BigQuery, you'll likely want to dive deeper into the data to uncover more complex insights. Advanced querying techniques enable you to handle nested and repeated fields, calculate custom metrics, and perform detailed segmentation of your data.
In GA4 and BigQuery, nested and repeated fields are used to store more complex data structures. These fields allow for the representation of hierarchical data within a single row of a table, which is essential for capturing detailed user interactions and events.
Nested fields in GA4 represent complex data structures within a single event, allowing multiple related data points to be stored together. However, when you need to export this data to a traditional relational database, the nested structure can become cumbersome because relational databases typically expect flat, normalized tables.
To work with nested and repeated fields in BigQuery, you often need to use the UNNEST() function. This function flattens the nested data, allowing you to query it as if it were in a traditional relational table format.
Below are two examples of queries that utilize nested or repeated fields in GA4 data exported to BigQuery. These examples demonstrate how to handle complex data structures such as event parameters and items associated with e-commerce events.
Example 1: Accessing the ga_session_number from Event Parameters
This query extracts the session number from the event_params nested field and counts the number of page views per session.
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') AS sessionNumber,
COUNTIF(event_name = 'page_view') AS pageviews
FROM
`project_id.your_dataset_id.events_*`
GROUP BY
user_pseudo_id, sessionNumber
ORDER BY
pageviews DESC
Here:
This query is useful for analyzing session-based metrics, allowing you to track how many pages users view during each session.
Example 2: Analyzing Items in E-Commerce Events
This query retrieves the most popular products by quantity sold, filtering only for purchase events.
SELECT
item.item_name,
SUM(item.quantity) AS total_quantity,
SUM(item.price) AS total_revenue
FROM
`project_id.your_dataset_id.events_*`,
UNNEST(items) AS item
WHERE
event_name = 'purchase'
GROUP BY
item.item_name
ORDER BY
total_quantity DESC
Here:
This query is useful for e-commerce analysis, helping you identify which products are the most popular and generate the most revenue.
In GA4 and BigQuery, understanding user behavior is crucial for analyzing website performance. Key metrics such as total users, new users, and active users help you gauge the engagement and reach of your site. These metrics are often used to assess the effectiveness of your marketing efforts, user retention, and overall site growth.
Example:
The following SQL query in BigQuery calculates the total number of users, active users, the percentage of active users, new users, and the percentage of new users from GA4 event data.
COUNT(DISTINCT user_pseudo_id) AS users,
-- active users (the number of engaged users)
COUNT(DISTINCT
CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') > 0
OR (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1'
THEN user_pseudo_id
ELSE NULL
END
) AS active_users,
-- % active users (the percentage of engaged users from the total users)
COUNT(DISTINCT
CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') > 0
OR (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1'
THEN user_pseudo_id
ELSE NULL
END
) / COUNT(DISTINCT user_pseudo_id) AS percent_active_users,
-- new users (the number of users who interacted with your site for the first time)
COUNT(DISTINCT
CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
THEN user_pseudo_id
ELSE NULL
END
) AS new_users,
-- % new users (the percentage of new users from the total users)
COUNT(DISTINCT
CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
THEN user_pseudo_id
ELSE NULL
END
) / COUNT(DISTINCT user_pseudo_id) AS percent_new_users
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20210101'
AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
LIMIT
10
Here:
This query provides a comprehensive overview of user engagement by calculating key metrics such as total users, active users, new users, and their respective percentages.
Understanding user engagement and bounce rates is crucial for analyzing website performance. Engagement metrics help you determine how users interact with your site, while bounce rates indicate how many users leave your site without interacting.
Example:
Imagine you want to analyze the engagement details and bounce rates for your e-commerce site over a week in August 2024. The following SQL query can help you extract this information:
SELECT
-- Query date
event_date AS date,
-- Query sessions count
COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) AS sessions,
-- Query engaged sessions count
COUNT(DISTINCT CASE WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END) AS engaged_sessions,
-- Query engagement rate
SAFE_DIVIDE(
COUNT(DISTINCT CASE WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END),
COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')))
) AS engagement_rate,
-- Query bounced sessions count
COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) -
COUNT(DISTINCT CASE WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END) AS bounces,
-- Query bounce rate
SAFE_DIVIDE(
COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) -
COUNT(DISTINCT CASE WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END),
COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')))
) AS bounce_rate,
-- Query event count per session
SAFE_DIVIDE(
COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END),
COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')))
) AS event_count_per_session
FROM `project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240801' AND '20240807'
GROUP BY date
Here:
By focusing on specific dates, you can track how well your site is retaining users and how effectively they are engaging with your content. This kind of analysis helps in identifying trends and making data-driven decisions to improve user experience and reduce bounce rates.
If you're interested in delving deeper into SQL queries for analyzing GA4 data in BigQuery, OWOX's guide offers a comprehensive collection of examples. This resource is filled with detailed queries and explanations, helping you enhance your data analysis skills and maximize the value of your GA4 exports.
OWOX SQL Copilot is an AI-powered assistant designed to help you generate SQL queries for BigQuery. It streamlines the process of querying large datasets by providing automated suggestions and ready-to-use templates. With OWOX SQL Copilot, even users with limited SQL expertise can efficiently extract insights from GA4 data and create custom reports with ease.
Knowing where your website traffic comes from is essential for enhancing your marketing strategies and improving overall site performance. Traffic source analysis in GA4 allows you to identify the origins of your site visitors, whether they arrive through organic search, paid campaigns, social media, or direct traffic.
Traffic sources help you understand where your visitors are coming from, whether it's through organic search, paid campaigns, social media, or direct visits. By querying traffic source details, you can gain insights into the performance of your marketing efforts and optimize strategies for better results.
It’s important to note that traffic source data can be accessed at different levels in GA4:
1. User Attraction Level (traffic_source.source): This level provides information about the initial source that attracted a user to your site. It captures the broader view of how users are initially engaging with your brand.
Example: Querying Traffic Source at the User Level
In this example, we will focus on querying traffic source details at the user level using the traffic_source.source parameter. This will help us understand which platforms or networks are the initial points of entry for users visiting the site.
SELECT
traffic_source.source AS traffic_source
FROM
`project_id.your_dataset_id.events_*`
WHERE
-- Define the date range (July 1, 2024, to the day before the current date)
_table_suffix BETWEEN '20240701' AND format_date('%Y%m%d', date_sub(current_date(), INTERVAL 1 day))
GROUP BY
traffic_source
Here:
2. Session Level: The source (based on session) refers to the specific platform, website, or network that directed a user to your site during a particular session. Traffic sources at the session level are tracked using event_params. This level captures the source of traffic for each individual session.
Example: Querying Session-Based Traffic Source in GA4
In this example, we will query session-based traffic source details in GA4 using BigQuery.
WITH prep AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
ARRAY_AGG((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') IGNORE NULLS ORDER BY event_timestamp)[SAFE_OFFSET(0)] AS source
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240701' AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
GROUP BY
user_pseudo_id,
session_id
)
SELECT
COALESCE(source, '(direct)') AS source_session,
COUNT(DISTINCT CONCAT(user_pseudo_id, session_id)) AS sessions
FROM
prep
GROUP BY
source_session
ORDER BY
sessions DESC
Here:
Grouping and ordering traffic metrics in BigQuery allows you to organize and analyze data based on specific criteria, such as the traffic source, medium, and campaign.
By doing so, you can gain insights into the performance of different channels and understand how users interact with your site based on their point of entry.
Example: Grouping and Ordering Traffic Metrics
In this example, we’ll query traffic source data, grouping by channel and ordering by the number of sessions.
The query will calculate various metrics, including
These metrics will be grouped into channel groupings like Organic Search, Direct, Paid, Social, and Referral, allowing you to see how each channel performs.
SELECT
CASE
WHEN REGEXP_CONTAINS(medium, r'(?mi)(organic)') THEN 'Organic Search'
WHEN REGEXP_CONTAINS(medium, r'(?mi)(none)') THEN 'Direct'
WHEN REGEXP_CONTAINS(medium, r'(?mi)(display)') THEN 'Paid'
WHEN REGEXP_CONTAINS(medium, r'(?mi)(not set)') THEN 'Not set'
WHEN REGEXP_CONTAINS(medium, r'(?mi)(^offline$)') THEN 'Offline'
WHEN REGEXP_CONTAINS(medium, r'(?mi)(cpc)|(ppc)|(cpm)|(adv)|(_ad)|(^ad)|(paid)|(remark)|(target)') THEN 'Paid'
WHEN REGEXP_CONTAINS(medium, r'(?mi)(insta)|(twitter)|(telegram)|(social)|(smm)|(facebook)|(youtube)') THEN 'Social'
ELSE
'Referral'
END AS channelGrouping,
*,
pageviews/sessions AS page_per_session,
pageviews/users AS page_per_user
FROM (
SELECT
event_date,
traffic_source.source AS source,
traffic_source.medium AS medium,
SUM(IF(event_name = 'session_start', 1, NULL)) AS sessions,
SUM(IF(event_name = 'first_visit', 1, NULL)) AS first_sessions,
COUNT(DISTINCT user_pseudo_id) AS users,
SUM(IF(event_name = 'page_view', 1, NULL)) AS pageviews,
SUM(IF(event_name = 'purchase', 1, NULL)) AS purchases,
SUM(ecommerce.purchase_revenue) AS revenue,
SAFE_DIVIDE(
COUNT(DISTINCT CONCAT(user_pseudo_id, (
SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'
))) - COUNT(DISTINCT CASE
WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id, (
SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'
))
END),
COUNT(DISTINCT CONCAT(user_pseudo_id, (
SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'
)))
) AS bounce_rate
FROM
`project_id.your_dataset_id.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240801' AND '20240810'
GROUP BY
event_date,
traffic_source.source,
traffic_source.medium
ORDER BY
sessions DESC
)
Here:
This query provides a comprehensive analysis of traffic metrics, grouped by channel and ordered by session count. By categorizing traffic into broader channel groupings and calculating key metrics, this query helps you understand how different channels contribute to your website’s performance.
Querying user count by source, medium, and date allows you to analyze how different traffic channels contribute to user acquisition over time. This type of query is useful for understanding the effectiveness of your marketing efforts, as it shows which sources and mediums are driving the most users to your site on specific dates.
Key Elements of the Query:
By combining these elements in a query, you can track user acquisition trends, identify peak traffic days, and optimize your marketing strategies based on the performance of different channels.
Example: Querying User Count and Engagement Metrics by Source, Medium, and Date
This example demonstrates how to query detailed user engagement metrics in GA4 using BigQuery. The query captures various metrics, all grouped by source, medium, and date.
SELECT
event_date,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
COUNT(DISTINCT user_pseudo_id) AS users,
-- Number of active users (users with engagement)
COUNT(DISTINCT CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') > 0
OR (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1'
THEN user_pseudo_id
ELSE NULL
END) AS active_users,
-- Number of new users
COUNT(DISTINCT CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
THEN user_pseudo_id
ELSE NULL
END) AS new_users,
-- Percentage of new users out of total users
COUNT(DISTINCT CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
THEN user_pseudo_id
ELSE NULL
END) * 1.0 / COUNT(DISTINCT user_pseudo_id) AS percentage_new_users,
-- Count of new sessions by new users
COUNT(DISTINCT CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
THEN CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))
ELSE NULL
END) AS new_sessions,
-- Percentage of new sessions out of total sessions
COUNT(DISTINCT CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
THEN CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))
ELSE NULL
END) * 1.0 / COUNT(DISTINCT CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) AS percentage_new_sessions,
-- Average number of sessions per user
COUNT(DISTINCT CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) * 1.0 / COUNT(DISTINCT user_pseudo_id) AS number_of_sessions_per_user,
-- Average number of page view events per user
COUNTIF(event_name = 'page_view') * 1.0 / COUNT(DISTINCT user_pseudo_id) AS event_count_per_user,
-- Average number of engaged sessions per user
COUNT(DISTINCT CASE
WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1'
THEN CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))
END) * 1.0 / COUNT(DISTINCT user_pseudo_id) AS engaged_sessions_per_user
FROM
`project_id.your_dataset_id.events_*`
WHERE
-- Define start and end date
_TABLE_SUFFIX BETWEEN '20240801' AND '20240810'
GROUP BY
event_date, source, medium
Here:
This query is a comprehensive tool for analyzing user engagement metrics by traffic source, medium, and date.
If you're looking to dive deeper into SQL queries to analyze GA4 data in BigQuery, OWOX’s guide provides a rich library of examples. This resource is packed with detailed queries and explanations, making it easier to sharpen your data analysis skills and get the most out of your GA4 exports. You can explore it here.
The OWOX Reports Extension for Google Sheets is an invaluable tool for streamlining report creation and maintenance. It automates report update, ensuring that data remains current without manual intervention.
This enables you to focus more on analyzing insights rather than managing data refreshes, making your reporting more efficient and effective.
By utilizing this extension, you can fully harness the power of your GA4 data within BigQuery. Following the outlined steps in this article, you'll be able to extract meaningful insights, optimize your marketing strategies, and make informed business decisions based on accurate, up-to-date information.
Are you leveraging the robust capabilities of BigQuery to query your Google Analytics 4 event data?
Understanding this integration can significantly enhance how you visualize and interpret user interactions, opening new doors to data-driven decision-making. In the digital analytics landscape, understanding how to query GA4 data in BigQuery has become essential.
GA4 event parameters are additional pieces of information that can be attached to events in Google Analytics 4 (GA4) to provide more context about user interactions. These parameters can be predefined (such as page_title or session_id) or custom (defined by you) to track specific details relevant to your business. They help you gain deeper insights by capturing the specifics of user actions, allowing for more detailed analysis in reports.
To link GA4 to BigQuery, go to your GA4 property, navigate to 'Admin,' select 'BigQuery Linking,' and follow the prompts to link your BigQuery project. Choose your data streams and export frequency, then submit to start exporting GA4 data for advanced analysis.
Yes, by enabling streaming export in your GA4 property, you can analyze real-time data in BigQuery. This setup allows data to flow into BigQuery within minutes of user interactions, providing timely insights.
Create custom reports by writing SQL queries in BigQuery. Use SELECT, UNNEST, and WHERE clauses to filter and analyze GA4 event parameters. Customize your reports by grouping data, calculating metrics, and scheduling queries for automated updates.