If you haven't set up your GA4 BigQuery Export yet, on the right side, you can find a link to our detailed guide on how you can set it all up in minutes.
However, you don’t need to have your own GA4 export data available in BigQuery to practice querying GA4 data. Google has created the BigQuery Public Dataset for GA4 export data so you can play with it.
This ga4_obfuscated_sample_ecommerce dataset contains a sample GA4 event export data from the Google Merchandise Store. The data covers three months from November 2020 to January 2021 and includes information on user behavior, product views, and transactions.
To access this dataset, you will need to have a Google Cloud project with the BigQuery API enabled.
Before we start querying GA4 data in BigQuery let's figure out how to work with dates. Every day of Google Analytics 4 data in BigQuery is stored in a separate table.
If you only need data from one day (let's say December 31st, 2023) the FROM clause in your query will look something like this:
1FROM
2`project_id.dataset_id.events_20231231`
However, in most cases you will need to query a slightly longer period of time. You have one of 2 options here: WILDCARD or _TABLE_SUFFIX.
When specifying the tables you want to retrieve the data FROM, you can use WILDCARD to select the date range.
1FROM
2`project_id.dataset_id.events_*` -- All tables in the dataset
1FROM
2`project_id.dataset_id.events_2021*` -- 2021 full year
1FROM
2`project_id.dataset_id.events_202101*` -- January 2021
1FROM
2`project_id.dataset_id.events_2021010*` -- First 9 days of January 2021
Right after specifying the tables you want to retrieve the data FROM, you can use the WHERE clause to specify the date range.
1SELECT
2*
3FROM
4`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
5WHERE
6_TABLE_SUFFIX BETWEEN `20210101` AND `20210131`
1WHERE
2_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 day))
3AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
1WHERE
2_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 7 day))
3AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
You can also use the WHERE statement to mix both dynamic and static dates, for example you can query the data from a specific date till yesterday (In our example - From January 1st, 2021 till Yesterday)
1WHERE
2_TABLE_SUFFIX BETWEEN '20210101' AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
1SELECT
2 event_name,
3 COUNT(*) AS event_count
4FROM
5 --select your project, dataset and the table, as well as the date range
6 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
7GROUP BY
8 event_name
9ORDER BY
10 event_count DESC
11LIMIT
12 100
1SELECT
2 Traffic_source.source,
3 Traffic_source.medium,
4 Traffic_source.name,
5 COUNT(DISTINCT user_pseudo_id) AS Users
6FROM
7 --select your project, dataset and the table, as well as the date range
8 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
9GROUP BY
10 Traffic_source.source,
11 Traffic_source.medium,
12 Traffic_source.name
13ORDER BY
14 Users DESC
15LIMIT
16 100
1SELECT
2 Event_date AS date,
3 Traffic_source.source,
4 Traffic_source.medium,
5 Traffic_source.name,
6 COUNT(DISTINCT user_pseudo_id) AS Users
7FROM
8 --select your project, dataset and the table, as well as the date range
9 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
10GROUP BY
11 date,
12 Traffic_source.source,
13 Traffic_source.medium,
14 Traffic_source.name
15ORDER BY
16 date DESC,
17 Users DESC
18LIMIT
19 1000
If you want to query just the specific events, like purchase or page_view, you can use the WHERE clause to filter just those event names you need.
1SELECT
2 *
3FROM
4 --select your project, dataset and the table, as well as the date range
5 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
6WHERE
7 event_name = "page_view"
8LIMIT
9 100
10
While mastering SQL queries is a powerful skill, visualizing these results can often be just as challenging.
Look, BigQuery is a data warehouse, and it's not designed to visualize data: build pivots, charts, graphs or share data with stakeholders.
That's why we've created a solution to simplify this step.
It's as simple as that:
1. Install this extension;
2. Copy-paste any query from this guide;
3. Run it directly from Google Sheets;
4. Set automated updates to get a fresh report.
For most of the advanced queries below, you won't even have to edit the query to enter your project_id, dataset_id, or write any query to select dates. Simply add a new query in extension, run and you'll see a little window on the right side to enter your variables.
Without ever touching the SQL code, making it exceptionally user-friendly for anyone, especially those not familiar with SQL or just getting started.
👉 By using this BigQuery Reports Extension for Google Sheets, you can instantly create dynamic and interactive reports, visualize BigQuery data, set automated updates, and share actionable data insights with stakeholders—all within the familiar interface of Google Sheets.
Note: for some of the queries, you'll need to manually change the project_id and the dataset_id to the one with your data (however, you can still run the query on public data to see how it goes).
Now it's time to move forward and talk about how you can query default user metrics and dimensions from the [GA4] BigQuery export, like users, active users, session id, user pseudo id, and user properties key and values.
Some of Google Analytics 4 dimensions and metrics can be retrieved straight from the tables without performing any calculations or unnests for specific event parameters. Other metrics and dimension that you might need are a bit harder to retrieve.
Please, keep in mind that GA4 has multiple concepts of users:
For every query that involves users, you have to first figure out which concept of user calculations is applicable to your specific calculations. You will have to calculate them, or extract, which can require some more complex SQL statements that we specified below:
SELECT
1SELECT
2 -- user_id (use if you set this up in GA)
3 user_id,
4 -- user_pseudo_id ( formerly client_id - the pseudonymous id for the user)
5 user_pseudo_id,
6 -- user_first_touch_timestamp (the time in microseconds at which the user first opened your website)
7 TIMESTAMP_MICROS(user_first_touch_timestamp) AS user_first_touch_timestamp,
8 -- user_ltv.revenue (the lifetime value of the user)
9 user_ltv.revenue AS user_ltv_revenue,
10 -- user_ltv.currency (the currency of the user lifetime value)
11 user_ltv.currency AS user_ltv_currency
12FROM
13 --select your project, dataset and the table
14 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
15WHERE
16 -- define the wildcard or static and/or dynamic start and end date
17 _TABLE_SUFFIX BETWEEN '20210101'
18 AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)) AND
19 user_ltv.revenue > 0
20LIMIT
21 100
1SELECT
2 -- user type (new visitor or returning visitor)
3 CASE
4 WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE event_name = 'session_start' AND KEY = 'ga_session_number') = 1 THEN 'new visitor'
5 WHEN (
6 SELECT
7 value.int_value
8 FROM
9 UNNEST(event_params)
10 WHERE
11 event_name = 'session_start'
12 AND KEY = 'ga_session_number') > 1 THEN 'returning visitor'
13 ELSE
14 NULL
15END
16 AS user_type,
17 COUNT(DISTINCT user_pseudo_id) AS users
18FROM
19 --select your project, dataset and the table
20 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
21WHERE
22 -- define the wildcard or static and/or dynamic start and end date
23 _TABLE_SUFFIX BETWEEN '20210101'
24 AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
25GROUP BY
26 user_type
27HAVING
28 user_type IS NOT NULL
29LIMIT
30 100
1SELECT
2--query date
3 event_date,
4--count new visitors
5 SUM(CASE WHEN user_type = 'new visitor' THEN 1 ELSE 0 END) AS new_visitors,
6--count returning visitors
7 SUM(CASE WHEN user_type = 'returning visitor' THEN 1 ELSE 0 END) AS returning_visitors
8FROM (
9 SELECT
10 event_date,
11 user_pseudo_id,
12 CASE
13 WHEN MAX(CASE WHEN event_name = 'session_start' AND params.key = 'ga_session_number' THEN params.value.int_value END) = 1 THEN 'new visitor'
14 WHEN MAX(CASE WHEN event_name = 'session_start' AND params.key = 'ga_session_number' THEN params.value.int_value END) > 1 THEN 'returning visitor'
15 END AS user_type
16 FROM
17--replace with your project_id.dataset_id.events_* or run as it is in OWOX BI BigQuery Reports Extension and change later
18 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`,
19 UNNEST(event_params) AS params
20 GROUP BY
21 event_date,
22 user_pseudo_id
23)
24WHERE
25 user_type IS NOT NULL
26GROUP BY
27 event_date
28ORDER BY
29 event_date
1WITH
2 prep AS (
3 SELECT
4 user_pseudo_id,
5 MAX((
6 SELECT
7 value.int_value
8 FROM
9 UNNEST(event_params)
10 WHERE
11 KEY = 'ga_session_number')) OVER (PARTITION BY user_pseudo_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS max_session_number
12 FROM
13 --select your project, dataset and the table, as well as the date range
14 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
15 WHERE
16 -- define static and/or dynamic start and end date
17 _table_suffix BETWEEN '20210101'
18 AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)))
19SELECT
20 -- count of sessions (Each sessions gets it's number, where the first session is specified as 1)
21 max_session_number AS count_of_sessions,
22 COUNT(DISTINCT user_pseudo_id) AS users
23FROM
24 prep
25GROUP BY
26 max_session_number
27ORDER BY
28 Max_session_number DESC
29LIMIT
30 100
Diving into advanced SQL can be daunting. But if you're looking to write complex queries without spending hours troubleshooting or refining your scripts... we've created a ChatGPT App - SQL Copilot for BigQuery.
Just check out the video below and use ChatGPT to write the queries that you need:
This gpt app integrates directly with your BigQuery, allowing you to use natural language to generate and optimize SQL queries.
You can share your table schema, add more context, and get the query that you need in seconds.
You can even perform complex JOINs to query data from different datasets or tables into the same report.
👉 The SQL Copilot for BigQuery by OWOX not only assists in creating accurate SQL queries but also teaches you how to handle complex data effectively.
It's like having an SQL expert sitting right beside you, ready to guide you through the intricacies of data querying.
1SELECT
2 -- users (the total number of users)
3 COUNT(DISTINCT user_pseudo_id) AS users,
4 -- active users (the number of engaged users)
5 COUNT(DISTINCT
6 CASE
7 WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'engagement_time_msec') > 0 OR ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'session_engaged') = '1' THEN user_pseudo_id
8 ELSE
9 NULL
10 END
11 ) AS active_users,
12 -- % active users (the percentage of engaged users from the total users)
13 COUNT(DISTINCT
14 CASE
15 WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'engagement_time_msec') > 0 OR ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'session_engaged') = '1' THEN user_pseudo_id
16 ELSE
17 NULL
18 END
19 ) / COUNT(DISTINCT user_pseudo_id) AS percent_active_users,
20 -- new users (the number of users who interacted with your site for the first time)
21 COUNT(DISTINCT
22 CASE
23 WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_number') = 1 THEN user_pseudo_id
24 ELSE
25 NULL
26 END
27 ) AS new_users,
28 -- % new users (the percentage of new users from the total users)
29 COUNT(DISTINCT
30 CASE
31 WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_number') = 1 THEN user_pseudo_id
32 ELSE
33 NULL
34 END
35 ) / COUNT(DISTINCT user_pseudo_id) AS percent_new_users
36FROM
37 --select your project, dataset and the table, as well as the date range
38 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
39WHERE
40 -- define static and/or dynamic start and end date
41 _table_suffix BETWEEN '20210101'
42 AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
43LIMIT
44 100
1SELECT
2 -- sessions (the total number of sessions)
3 COUNT(DISTINCT CONCAT(user_pseudo_id,(
4 SELECT
5 value.int_value
6 FROM
7 UNNEST(event_params)
8 WHERE
9 KEY = 'ga_session_id'))) AS sessions,
10 -- new sessions (the number of sessions by users who had never visited your site before)
11 COUNT(DISTINCT
12 CASE
13 WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_number') = 1 THEN CONCAT(user_pseudo_id,( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'))
14 ELSE
15 NULL
16 END
17 ) AS new_sessions,
18 -- % new sessions (the percentage of sessions by users who had never visited your site before)
19 COUNT(DISTINCT
20 CASE
21 WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_number') = 1 THEN CONCAT(user_pseudo_id,( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'))
22 ELSE
23 NULL
24 END
25 ) / COUNT(DISTINCT CONCAT(user_pseudo_id,(
26 SELECT
27 value.int_value
28 FROM
29 UNNEST(event_params)
30 WHERE
31 KEY = 'ga_session_id'))) AS percent_new_sessions,
32 -- number of sessions per user (the total number of sessions divided by the total number of users)
33 COUNT(DISTINCT CONCAT(user_pseudo_id,(
34 SELECT
35 value.int_value
36 FROM
37 UNNEST(event_params)
38 WHERE
39 KEY = 'ga_session_id'))) / COUNT(DISTINCT user_pseudo_id) AS number_of_sessions_per_user,
40 -- event count per user (the number of times all of the events were triggered divided by amount of users)
41 COUNT(*) / COUNT(DISTINCT user_pseudo_id) AS event_count_per_user,
42 -- engaged sessions per user
43 COUNT(DISTINCT
44 CASE
45 WHEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id,( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'))
46 END
47 ) / COUNT(DISTINCT user_pseudo_id) AS engaged_sessions_per_user
48FROM
49 --select your project, dataset and the table, as well as the date range
50 `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
51WHERE
52 -- define static and/or dynamic start and end date
53 _table_suffix BETWEEN '20210101'
54 AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
55LIMIT
56 100
1select
2 --query date
3 event_date as date,
4 --query sessions count
5 count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'))) as sessions,
6 --query engaged sessions count
7 count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end) as engaged_sessions,
8 --query engagement rate
9 safe_divide(count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end),count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')))) as engagement_rate,
10 --query bounced sessions count
11 count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'))) - count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end) as bounces,
12 --query bounce rate
13 safe_divide(count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'))) - count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end),count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')))) as bounce_rate,
14 -- query event count per session
15 safe_divide(count(distinct case when event_name = 'page_view' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end),count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')))) as event_count_per_session
16 from
17 --replace with your `project_id.dataset_id.events_*` or run as it is in OWOX BI BigQuery Reports Extension and change later
18 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
19where
20 -- define start and end date
21 _TABLE_SUFFIX BETWEEN
22 --replace with start date eg '20240301'
23 '{start_date default="20210101" type="input"}' AND
24 --replace with end date eg '202403131'
25 '{end_date default="20210110" type="input"}'
26 GROUP BY date
27 ORDER BY date DESC
28LIMIT 100
1SELECT
2 -- date (dimension)
3 event_date,
4 -- source (dimension | extracted from event_params)
5 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
6 -- medium (dimension | extracted from event_params)
7 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
8 -- users (metric | the total number of users)
9 COUNT(DISTINCT user_pseudo_id) AS users,
10 -- active users (metric | users with engagement time or session engaged)
11 COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') > 0 OR (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN user_pseudo_id ELSE NULL END) AS active_users,
12 -- new users (metric | users with ga_session_number = 1)
13 COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1 THEN user_pseudo_id ELSE NULL END) AS new_users,
14 -- % new users (metric | percentage of new users out of total users)
15 COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1 THEN user_pseudo_id ELSE NULL END) * 1.0 / COUNT(DISTINCT user_pseudo_id) AS percentage_new_users,
16 -- new sessions (metric | count of new sessions by new users)
17 COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1 THEN CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)) ELSE NULL END) AS new_sessions,
18 -- % new sessions (metric | percentage of new sessions out of total sessions)
19 COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1 THEN CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)) ELSE NULL END) * 1.0 / COUNT(DISTINCT CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) AS percentage_new_sessions,
20 -- sessions per user (metric | average number of sessions per user)
21 COUNT(DISTINCT CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) * 1.0 / COUNT(DISTINCT user_pseudo_id) AS number_of_sessions_per_user,
22 -- event count per user (metric | average number of page_view events per user)
23 COUNTIF(event_name = 'page_view') * 1.0 / COUNT(DISTINCT user_pseudo_id) AS event_count_per_user,
24 -- engaged sessions per user (metric | average number of engaged sessions per user)
25 COUNT(DISTINCT CASE WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)) END) * 1.0 / COUNT(DISTINCT user_pseudo_id) AS engaged_sessions_per_user
26FROM
27 --replace with your project_id.dataset_id.events_* or run as it is in OWOX BI BigQuery Reports Extension and change later
28 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
29WHERE
30 -- define start and end date
31 _TABLE_SUFFIX BETWEEN
32--replace with start date eg '20240301'
33'{start_date default="20210101" type="input"}' AND
34--replace with end date eg '202403131'
35'{end_date default="20210110" type="input"}'
36GROUP BY
37 event_date, source, medium
38ORDER BY
39event_date DESC,
40users DESC
1SELECT
2 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
3 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
4 CONCAT((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source'), ' / ', (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium')) AS source_medium,
5 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page,
6 COUNT(DISTINCT user_pseudo_id) AS new_users
7FROM
8 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
9WHERE
10 event_name = 'session_start' AND
11 (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1 AND
12 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
13GROUP BY
14 source,
15 medium,
16 source_medium,
17 landing_page
18ORDER BY
19 new_users DESC
1SELECT
2 event_date,
3 page_location,
4 page_title,
5 COUNTIF(event_name = 'page_view') AS page_views,
6 COUNT(DISTINCT user_pseudo_id) AS users,
7 SAFE_DIVIDE(COUNTIF(event_name = 'page_view'), COUNT(DISTINCT user_pseudo_id)) AS pageviews_per_user,
8 SUM(IF(event_name = 'user_engagement', engagement_time, 0)) AS total_engagement_time,
9 AVG(IF(event_name = 'user_engagement', engagement_time, NULL)) AS avg_engagement_time
10-- For Unique Pageviews and Engagements per User, consider calculating these based on distinct user interactions within subqueries if necessary
11FROM (
12 SELECT
13 event_date,
14 user_pseudo_id,
15 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
16 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title,
17 event_name,
18 IF(event_name = 'user_engagement', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') / 1000, 0) AS engagement_time
19 FROM
20--replace next line with your project_id.dataset_id if running in the BigQuery interface
21 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
22 WHERE
23--replace next line with your dates if running in the BigQuery interface
24 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
25 AND (event_name = 'page_view' OR event_name = 'user_engagement')
26)
27GROUP BY
28 event_date, page_location, page_title
29ORDER BY
30 event_date, page_views DESC
31LIMIT 100
1SELECT
2 event_timestamp,
3 event_date,
4 event_name,
5 user_pseudo_id,
6 (
7 SELECT value.string_value
8 FROM UNNEST(event_params)
9 WHERE key = 'page_location'
10 ) AS landing_page,
11 (
12 SELECT COALESCE(value.int_value, value.float_value, value.double_value)
13 FROM UNNEST(event_params)
14 WHERE key = 'value'
15 ) AS event_value,
16 (
17 SELECT value.string_value
18 FROM UNNEST(event_params)
19 WHERE key = 'transaction_id'
20 ) AS transaction_id,
21 (
22 SELECT value.string_value
23 FROM UNNEST(event_params)
24 WHERE key = 'currency'
25 ) AS currency,
26 -- Assuming item details are directly under event_params for simplicity
27 (
28 SELECT value.string_value
29 FROM UNNEST(event_params)
30 WHERE key = 'item_name'
31 ) AS item_name,
32 (
33 SELECT value.string_value
34 FROM UNNEST(event_params)
35 WHERE key = 'item_id'
36 ) AS item_id,
37 (
38 SELECT value.string_value
39 FROM UNNEST(event_params)
40 WHERE key = 'item_brand'
41 ) AS item_brand,
42 (
43 SELECT value.string_value
44 FROM UNNEST(event_params)
45 WHERE key = 'item_category'
46 ) AS item_category,
47 (
48 SELECT value.string_value
49 FROM UNNEST(event_params)
50 WHERE key = 'item_variant'
51 ) AS item_variant,
52 (
53 SELECT value.string_value
54 FROM UNNEST(event_params)
55 WHERE key = 'coupon'
56 ) AS coupon,
57 (
58 SELECT COALESCE(value.float_value, value.double_value)
59 FROM UNNEST(event_params)
60 WHERE key = 'shipping'
61 ) AS shipping,
62 (
63 SELECT value.string_value
64 FROM UNNEST(event_params)
65 WHERE key = 'payment_type'
66 ) AS payment_type,
67 (
68 SELECT value.string_value
69 FROM UNNEST(event_params)
70 WHERE key = 'page_title'
71 ) AS page_title,
72 COUNT(*) AS number_of_events
73FROM
74 --replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
75 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
76WHERE
77 --replace with conversion even name
78 event_name = '{event_name default="purchase" type="input"}'
79 AND
80 -- replace w/ your dates if running outside of BigQuery Reports Extension
81 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
82GROUP BY
83 event_timestamp, event_date, event_name, user_pseudo_id, landing_page, event_value, transaction_id, currency, item_name, item_id, item_brand, item_category, item_variant, coupon, shipping, payment_type, page_title
84ORDER BY
85 event_date, event_timestamp
1-- Query to retrieve a comprehensive user acquisition report from GA4 data in BigQuery
2WITH base_data AS (
3 SELECT
4 -- The dimensions
5 event_date,
6 event_name,
7 user_pseudo_id,
8 traffic_source.source AS source,
9 traffic_source.medium AS medium,
10 -- Combining source and medium into a single dimension
11 CONCAT(traffic_source.source, '/', traffic_source.medium) AS source_medium,
12 traffic_source.name AS campaign,
13 -- Extracting various user parameters for the metrics calculation
14 (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS session_engaged,
15 (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
16 (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') AS ga_session_number,
17 (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
18FROM
19 --replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
20 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
21WHERE
22 -- replace w/ your dates if running outside of BigQuery Reports Extension
23 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'),
24-- Calculation of metrics for each user
25metrics AS (
26 SELECT
27 event_date,
28 event_name,
29 user_pseudo_id,
30 source,
31 medium,
32 source_medium,
33 campaign,
34 session_engaged,
35 engagement_time_msec,
36 ga_session_number,
37 ga_session_id,
38
39 -- Calculating engagement time in seconds for clarity
40 SAFE_DIVIDE(engagement_time_msec, 1000) AS engagement_time_sec,
41
42 -- Determining if the session is engaged based on session_engaged value or engagement_time_msec
43 (session_engaged = '1' OR engagement_time_msec > 0) AS is_engaged,
44
45 -- Determining if the session is the first session
46 ga_session_number = 1 AS is_first_session,
47
48 -- Creating a unique session identifier
49 CONCAT(user_pseudo_id, CAST(ga_session_id AS STRING)) AS session_id
50 FROM
51 base_data
52)
53-- Aggregating & Selecting metrics across all users
54 SELECT
55 event_date,
56 source,
57 medium,
58 source_medium,
59 campaign,
60
61 -- Counting total and active users
62 COUNT(DISTINCT user_pseudo_id) AS total_users,
63 COUNT(DISTINCT CASE WHEN is_engaged THEN user_pseudo_id END) AS active_users,
64
65 -- Calculating the percentage of active users
66 SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN is_engaged THEN user_pseudo_id END), COUNT(DISTINCT user_pseudo_id)) AS percent_active_users,
67
68 -- Counting new users and calculating the percentage of new users
69 COUNT(DISTINCT CASE WHEN is_first_session THEN user_pseudo_id END) AS new_users,
70 SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN is_first_session THEN user_pseudo_id END), COUNT(DISTINCT user_pseudo_id)) AS percent_new_users,
71
72 -- Counting total sessions, new sessions and calculating percentages
73 COUNT(DISTINCT session_id) AS sessions,
74 COUNT(DISTINCT CASE WHEN is_first_session THEN session_id END) AS new_sessions,
75 SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN is_first_session THEN session_id END), COUNT(DISTINCT session_id)) AS percent_new_sessions,
76
77 -- Calculating sessions per user and engaged sessions per user
78 SAFE_DIVIDE(COUNT(DISTINCT session_id), COUNT(DISTINCT user_pseudo_id)) AS sessions_per_user,
79 SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN is_engaged THEN session_id END), COUNT(DISTINCT user_pseudo_id)) AS engaged_sessions_per_user,
80
81 -- Counting engaged sessions and calculating the engagement rate
82 COUNT(DISTINCT CASE WHEN is_engaged THEN session_id END) AS engaged_sessions,
83 SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN is_engaged THEN session_id END), COUNT(DISTINCT session_id)) AS engagement_rate,
84
85 -- Calculating average engagement time and events per user
86 SAFE_DIVIDE(SUM(CASE WHEN is_engaged THEN engagement_time_sec ELSE 0 END), COUNT(DISTINCT user_pseudo_id)) AS average_engagement_time_sec,
87 SAFE_DIVIDE(COUNT(*), COUNT(DISTINCT user_pseudo_id)) AS events_per_user
88 -- Additional metrics for conversions and user conversion rate would be added here
89 -- For this example, we assume 'purchase' events signify conversions and we sum up purchase revenue
90 -- Placeholder for revenue calculation (requires exact field specification)
91 FROM
92 metrics
93 GROUP BY
94 event_date,
95 source,
96 medium,
97 source_medium,
98 Campaign
99ORDER BY
100event_date DESC,
101total_users DESC
1SELECT
2 event_date AS Date,
3 item_id AS ID,
4 item_name AS Product_Name,
5 item_variant AS Product_Variant,
6 item_brand AS Brand,
7 item_category AS Category,
8 price AS Price,
9 SUM(quantity) AS Quantity,
10 COUNT(DISTINCT user_pseudo_id) AS User_count
11FROM
12 --replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
13 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`,
14 UNNEST(items)
15WHERE
16 -- Replace date range if you’re running outside if BigQuery Reports Extension
17 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
18 AND event_name IN ('purchase')
19GROUP BY
20 1, 2, 3, 4, 5, 6, 7
21ORDER BY
22 Quantity DESC
1SELECT
2 user_pseudo_id AS Client_id,
3 user_id AS User_id,
4 traffic_source.`source` AS Source,
5 traffic_source.medium AS Medium,
6 geo.continent AS Continent,
7 geo.sub_continent AS Sub_Continent,
8 geo.country AS Country,
9 geo.region AS Region,
10 geo.city AS City,
11 device.category AS Device_Category,
12 device.mobile_brand_name AS Mobile_Brand,
13 device.mobile_model_name AS Mobile_Model,
14 device.operating_system AS OS,
15 device.operating_system_version AS OS_version,
16 device.`language` as Language
17FROM
18 --replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
19 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`,
20 UNNEST(items)
21WHERE
22 -- Replace date range.
23 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
24GROUP BY
25 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
26ORDER BY
27 Client_id ASC
1SELECT
2 *,
3 revenue/transactions AS average_order_value
4FROM (
5 SELECT
6 event_date,
7 --use line 8-10 instead of 11-12 if you need data and the source/channel/company of the session, not the user
8 --collected_traffic_source.manual_source as source,
9 --collected_traffic_source.manual_medium as medium,
10 --collected_traffic_source.manual_campaign_name as campaign,
11 traffic_source.source AS source,
12 traffic_source.medium AS medium,
13 COUNT(DISTINCT user_pseudo_id) AS users,
14 COUNT(DISTINCT
15 IF
16 (event_name='purchase',user_pseudo_id,NULL)) AS converted_users,
17 COUNT(DISTINCT ecommerce.transaction_id) AS transactions,
18 SUM(ecommerce.total_item_quantity) AS products,
19 SUM(ecommerce.purchase_revenue) AS revenue
20 FROM
21--replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
22 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
23 WHERE
24-- Replace date range if running outside of BigQuery Reports Extension
25 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
26 GROUP BY
27 event_date,
28 --if you use lines 8-10, then use lines 29-31 instead of 32-33
29 --collected_traffic_source.manual_source,
30 --collected_traffic_source.manual_medium,
31 --collected_traffic_source.manual_campaign_name
32 traffic_source.source,
33 traffic_source.medium
34 ORDER BY
35 transactions DESC)
TrafficSource Report (w/ Channel Grouping Template)Metrics:
Dimensions:
1SELECT
2 CASE
3 WHEN REGEXP_CONTAINS(medium, r'(?mi)(organic)') THEN 'Organic Search'
4 WHEN REGEXP_CONTAINS(medium, r'(?mi)(none)') THEN 'Direct'
5 WHEN REGEXP_CONTAINS(medium, r'(?mi)(display)') THEN 'Paid'
6 WHEN REGEXP_CONTAINS(medium, r'(?mi)(not set)') THEN 'Not set'
7 WHEN REGEXP_CONTAINS(medium, r'(?mi)(^offline$)') THEN 'Offline'
8 WHEN REGEXP_CONTAINS(medium, r'(?mi)(cpc)|(ppc)|(cpm)|(adv)|(_ad)|(^ad)|(paid)|(remark)|(target)') THEN 'Paid'
9 WHEN REGEXP_CONTAINS(medium, r'(?mi)(insta)|(twitter)|(telegram)|(social)|(smm)|(facebook)|(youtube)') THEN 'Social'
10 ELSE
11 'Referral'
12END
13 AS channelGrouping,
14 *,
15 pageviews/sessions AS page_per_session,
16 pageviews/users AS page_per_user
17FROM (
18 SELECT
19 event_date,
20 traffic_source.source AS source,
21 traffic_source.medium AS medium,
22 --use lines 23-25 for session source/medium/campaign instead of user details
23 --collected_traffic_source.manual_source as source,
24 --collected_traffic_source.manual_medium as medium,
25 --collected_traffic_source.manual_campaign_name as campaign,
26 SUM(
27 IF
28 (event_name='session_start',1,NULL)) AS sessions,
29 SUM(
30 IF
31 (event_name='first_visit',1,NULL)) AS first_sessions,
32 count (DISTINCT user_pseudo_id) AS users,
33 SUM(
34 IF
35 (event_name='page_view',1,NULL)) AS pageviews,
36 SUM(
37 IF
38 (event_name='purchase',1,NULL)) AS purchases,
39 SUM(ecommerce.purchase_revenue) AS revenue,
40 --query bounce rate
41 SAFE_DIVIDE(COUNT(DISTINCT CONCAT(user_pseudo_id,(
42 SELECT
43 value.int_value
44 FROM
45 UNNEST(event_params)
46 WHERE
47 KEY = 'ga_session_id'))) - COUNT(DISTINCT
48 CASE
49 WHEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id,( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'))
50 END
51 ),COUNT(DISTINCT CONCAT(user_pseudo_id,(
52 SELECT
53 value.int_value
54 FROM
55 UNNEST(event_params)
56 WHERE
57 KEY = 'ga_session_id')))) AS bounce_rate,
58 FROM
59--replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
60 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
61 WHERE
62-- Replace date range if running outside of BigQuery Reports Extension
63 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
64 GROUP BY
65 event_date,
66 traffic_source.source,
67 traffic_source.medium
68 --if you use line 23-35, then use lines 69-71 instead of 66-67
69 --collected_traffic_source.manual_source,
70 --collected_traffic_source.manual_medium,
71 --collected_traffic_source.manual_campaign_name
72 ORDER BY
73 sessions DESC )
1SELECT
2 event_date,
3 event_name,
4 count (DISTINCT event_timestamp ) AS events,
5 COUNT(DISTINCT user_pseudo_id) AS users
6FROM
7--replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
8 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
9 WHERE
10-- Replace date range if running outside of BigQuery Reports Extension
11 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
12GROUP BY
13 1,
14 2
15ORDER BY
16 events desc
1SELECT
2 event_date,
3 device.category,
4 device.mobile_brand_name,
5 device.mobile_model_name,
6 device.operating_system,
7 device.operating_system_version,
8 device.web_info.browser,
9 device.web_info.browser_version,
10 count (DISTINCT user_pseudo_id) AS users,
11 SUM(
12 IF
13 (event_name='session_start',1,NULL)) AS sessions,
14 SUM(
15 IF
16 (event_name='page_view',1,NULL)) AS pageviews,
17FROM
18--replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
19 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
20 WHERE
21-- Replace date range if running outside of BigQuery Reports Extension
22 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
23GROUP BY
24 event_date,
25 device.category,
26 device.mobile_brand_name,
27 device.mobile_model_name,
28 device.operating_system,
29 device.operating_system_version,
30 device.web_info.browser,
31 device.web_info.browser_version
32 ORDER BY
33 users DESC
1SELECT
2 event_date,
3 device.category,
4 device.mobile_brand_name,
5 device.mobile_model_name,
6 device.operating_system,
7 device.operating_system_version,
8 device.web_info.browser,
9 device.web_info.browser_version,
10 count (DISTINCT user_pseudo_id) AS users,
11 SUM(
12 IF
13 (event_name='session_start',1,NULL)) AS sessions,
14 SUM(
15 IF
16 (event_name='page_view',1,NULL)) AS pageviews,
17FROM
18--replace with your project_id.dataset_id if running outside of BigQuery Reports Extension
19 `{gcp_project_id default="bigquery-public-data"}.{dataset_id default="ga4_obfuscated_sample_ecommerce"}.events_*`
20 WHERE
21-- Replace date range if running outside of BigQuery Reports Extension
22 _TABLE_SUFFIX BETWEEN '{start_date default="20210101" type="input"}' AND '{end_date default="20210110" type="input"}'
23GROUP BY
24 event_date,
25 device.category,
26 device.mobile_brand_name,
27 device.mobile_model_name,
28 device.operating_system,
29 device.operating_system_version,
30 device.web_info.browser,
31 device.web_info.browser_version
32 ORDER BY
33 users DESC