All resources

How to Use Common Table Expressions (CTEs) in BigQuery for Efficient Queries

Writing complex SQL queries in BigQuery can quickly become overwhelming, especially when dealing with multiple joins, aggregations, and filtering conditions. Common Table Expressions (CTEs) offer a solution by breaking down queries into structured, reusable components, making them easier to read and maintain. Whether you're analyzing sales trends, managing customer data, or handling hierarchical relationships, CTEs can simplify query logic while enhancing efficiency.

i-radius

But why should you choose CTEs over subqueries or temporary tables? They not only improve query readability but also optimize performance by eliminating redundant calculations. Additionally, recursive CTEs unlock the ability to process hierarchical data, such as organizational charts or category structures. 

What Is a Common Table Expression (CTE)?

A Common Table Expression (CTE) is a temporary named result set in SQL that exists only during query execution. It acts like a virtual table, allowing you to structure complex queries, improve readability, and reuse intermediate results. CTEs help refine and manipulate data before generating the final output.

Why Are CTEs Important?

The importance of CTEs in BigQuery cannot be overstated. They allow you to write clean, structured queries without relying on subqueries or temporary tables. 

Reduce Code Duplication by Reusing Intermediate Results

CTEs help eliminate redundant code by allowing you to define reusable query components. Instead of repeating the same subqueries multiple times, you can store intermediate results in a CTE and reference them throughout your query. This approach not only makes your SQL code more organized but also minimizes errors and enhances maintainability. 

Make Complex SQL Queries Easier to Read and Debug

When dealing with large datasets, SQL queries can become long and difficult to understand. CTEs help break down complex logic into smaller, readable sections, making queries more structured and easier to follow. By assigning meaningful names to CTEs, developers and analysts can quickly identify what each part of the query does. 

Enable Recursive Operations for Hierarchical Data Processing

One of the most powerful features of CTEs is their ability to handle recursive queries. Recursive CTEs are essential for processing hierarchical data, such as organizational charts, category structures, or bill of materials. They allow SQL queries to reference previous rows within the same query, making it possible to traverse parent-child relationships efficiently without complex looping mechanisms.

Simplify Complex Query Logic

Without CTEs, handling multiple joins, aggregations, and filtering conditions can result in messy, difficult-to-maintain queries. CTEs provide a way to segment and simplify these operations by breaking them into logical, sequential steps. This makes it easier to modify parts of the query without affecting the entire SQL statement.

Enhance Query Performance by Breaking Down Operations

By using CTEs, you can optimize query execution by structuring operations more efficiently. Instead of performing multiple complex calculations in a single step, CTEs allow you to preprocess data in stages, reducing computational load. This is particularly useful when dealing with large datasets, as it ensures that queries run faster while consuming fewer resources.

Understanding CTE Syntax and Exploring Examples in BigQuery

CTEs in BigQuery help simplify queries by creating temporary result sets that can be reused within a query. Let’s look at the syntax and examples to see how CTEs work in practice.

Basic Queries

A Basic CTE query allows you to create a temporary result set within a SQL statement using the WITH clause. This result set can be referenced in the main query, making it easier to structure and organize data without needing subqueries. 

Syntax of CTE

The syntax for defining a CTE using the WITH clause is as follows:

1WITH cte_name (column1, column2, ...) AS (
2    SELECT ...
3    FROM ...
4    WHERE ...
5)
6SELECT * FROM cte_name;

Here:

  • cte_name: The name of the CTE, used to reference the result set in the query.
  • (column1, column2, ...): An optional list of column names for the CTE, which helps assign specific names to the result set columns.
  • SELECT ... FROM ... WHERE ...: The query that defines the CTE, including filtering and aggregation logic.

Example of CTE

Suppose we want to find all recent orders placed in the last 30 days, along with the customer’s name and total amount. Instead of writing a complex query, we can define a CTE to filter recent orders first, then use it in the main query.

1WITH recent_orders AS (
2    SELECT OrderID, CustomerName, TotalAmount, OrderDate
3    FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
4    WHERE OrderDate > DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
5)
6SELECT * FROM recent_orders;

SQL query using a Common Table Expression (CTE) to filter and retrieve recent orders based on the last 30 days. i-shadow

Here:

  • WITH recent_orders AS: Creates a Common Table Expression (CTE) named recent_orders, which acts as a temporary result set.
  • SELECT OrderID, CustomerName, TotalAmount, OrderDate: Retrieves the Order ID, customer name, total amount, and order date from the Sales_Data table.
  • FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE: Specifies the source table from which the data is selected.
  • WHERE OrderDate > DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY): Filters orders placed within the last 30 days by subtracting 30 days from the current date.
  • SELECT * FROM recent_orders;: Fetches all records from the recent_orders CTE, returning the filtered order data.

Multi-CTE Queries

Multi-CTEs allow you to define multiple Common Table Expressions within a single query using a single WITH clause. Instead of writing multiple separate queries, SQL enables combining multiple CTEs in one statement, making complex queries more structured and efficient.

Syntax of Multi CTE

The syntax of Multi CTE Queries is:

1WITH cte1 AS (
2    SELECT column1 FROM dataset.table1
3),
4cte2 AS (
5    SELECT column2 FROM dataset.table2
6)
7SELECT * FROM cte1 JOIN cte2 ON cte1.column1 = cte2.column2;


Here:

  • WITH cte1 AS (...): Defines the first CTE, which retrieves specific data from the first dataset.
  • , (Comma): Separates multiple CTEs within a single WITH clause.
  • cte2 AS (...): Defines the second CTE, which retrieves different data from another dataset.
  • SELECT * FROM cte1 JOIN cte2 ON ...: Combines the results of both CTEs in the main query using a JOIN.

Multiple CTEs can be defined and used within the same query.

Example of Multi CTE

Let's assume we need to identify high-value orders (TotalAmount > 800) and track shipped orders separately. Using multiple CTEs, we first filter each category and then combine the results with a JOIN.

1WITH high_value_orders AS (
2    SELECT OrderID, CustomerName, TotalAmount
3    FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
4    WHERE TotalAmount > 800
5),
6
7shipped_orders AS (
8    SELECT OrderID, OrderStatus
9    FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
10    WHERE OrderStatus = 'Shipped'
11)
12
13SELECT h.OrderID, h.CustomerName, h.TotalAmount, s.OrderStatus
14FROM high_value_orders h
15JOIN shipped_orders s ON h.OrderID = s.OrderID;

SQL query using multiple Common Table Expressions (CTEs)  to retrieve high-value orders and shipped orders then combining them using a JOIN. i-shadow

Here:

  • WITH high_value_orders AS: Creates a CTE named high_value_orders, which filters orders where TotalAmount > 800.
  • WITH shipped_orders AS: Defines a second CTE named shipped_orders, which selects only shipped orders.
  • JOIN Operation: Combines both CTEs by matching OrderID, showing high-value orders that have been shipped.
  • SELECT h.OrderID, h.CustomerName, h.TotalAmount, s.OrderStatus: Retrieves relevant fields from both CTEs.

Recursive Queries

Recursive CTEs allow queries to reference themselves, making them useful for processing hierarchical or iterative data structures. Unlike non-recursive CTEs that run once, recursive CTEs execute continuously until no new results are found. 

Syntax of Recursive Queries

The syntax of Recursive Queries is:

1WITH RECURSIVE cte_name AS (
2    SELECT base_case_column FROM dataset.table WHERE condition
3    UNION ALL
4    SELECT recursive_column FROM cte_name WHERE condition
5)
6SELECT * FROM cte_name;

Here:

  •  WITH RECURSIVE cte_name AS (...): Defines the recursive CTE with a base case and recursive logic.
  • SELECT base_case_column FROM dataset.table WHERE condition: The base case retrieves initial records (starting point).
  • UNION ALL: Combines the base case results with recursive iterations.
  • SELECT recursive_column FROM cte_name WHERE condition: The recursive step retrieves new rows based on previous iterations.
  • SELECT * FROM cte_name: Runs the query until no new rows are found.

Example of Recursive Queries

We want to analyze sequential order processing where each order is processed in sequence. Using a recursive CTE, we track the order sequence, starting with the first completed order and linking it to subsequent orders.

1WITH RECURSIVE order_sequence AS (
2    -- Base case: Select the first completed order
3    SELECT OrderID, CustomerName, OrderStatus, 1 AS level
4    FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
5    WHERE OrderStatus = 'Completed'
6    AND OrderID = (SELECT MIN(OrderID) FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE WHERE OrderStatus = 'Completed')
7
8    UNION ALL
9
10    -- Recursive step: Select the next order in sequence
11    SELECT s.OrderID, s.CustomerName, s.OrderStatus, h.level + 1
12    FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE s
13    JOIN order_sequence h ON s.OrderID = h.OrderID + 1
14    WHERE s.OrderStatus = 'Completed'
15)
16SELECT * FROM order_sequence;

SQL query using a recursive Common Table Expression (CTE) to track sequential order, linking completed orders in order sequence. i-shadow

Here:

  • WITH RECURSIVE order_sequence AS: Defines a recursive CTE named order_sequence.
  • Base Case (SELECT OrderID, CustomerName, OrderStatus, 1 AS level): Retrieves the first completed order by selecting the minimum OrderID where OrderStatus = 'Completed'.
  • UNION ALL: Combines base case results with subsequent iterations.
  • Recursive Step (SELECT s.OrderID, s.CustomerName, s.OrderStatus, h.level + 1): Retrieves the next completed order by finding the order where OrderID = h.OrderID + 1.
  • JOIN order_sequence h ON s.OrderID = h.OrderID + 1: Ensures that each order follows the previous one in sequence.
  • SELECT * FROM order_sequence;: Runs the recursive query until no more completed orders are found.

Nested CTEs

A Nested CTE is a Common Table Expression (CTE) that contains another CTE inside it. This allows SQL queries to layer multiple CTEs, making complex data transformations more structured. Instead of writing multiple separate CTEs, nesting helps organize queries logically, ensuring efficient data processing in BigQuery.

Syntax of Nested CTEs

The syntax of nested queries is: 

1WITH outer_cte AS (
2    WITH inner_cte AS (
3        SELECT column1 FROM dataset.table WHERE condition
4    )
5    SELECT column1 FROM inner_cte WHERE another_condition
6)
7SELECT * FROM outer_cte;

Here:

  • WITH outer_cte AS (...): Defines the outer CTE, which processes data from an inner CTE.
  • WITH inner_cte AS (...): Creates the inner CTE, which is used inside the outer_cte.
  • SELECT column1 FROM dataset.table WHERE condition: Retrieves initial data inside the inner_cte.
  • SELECT column1 FROM inner_cte WHERE another_condition: Filters and processes data further in outer_cte.
  • SELECT * FROM outer_cte;: Returns the final transformed dataset.

Example of Nested CTEs

Imagine we need to analyze total revenue per product and then filter products where the total revenue exceeds $500 using a Nested CTE. The first CTE calculates the total revenue for each product, while the second CTE filters out products that do not meet the revenue threshold.

1WITH high_revenue_products AS (
2    WITH total_revenue AS (
3        SELECT ProductID, ProductName, SUM(TotalAmount) AS total_sales
4        FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
5        GROUP BY ProductID, ProductName
6    )
7    SELECT ProductID, ProductName, total_sales 
8    FROM total_revenue 
9    WHERE total_sales > 500
10)
11SELECT * FROM high_revenue_products;

SQL query using a Nested Common Table Expression (CTE) to calculate total product sales and filter high-revenue products. i-shadow

Here:

  • WITH high_revenue_products AS (...): Defines the outer CTE, which filters high-revenue products.
  • WITH total_revenue AS (...): Creates the inner CTE, which calculates total sales per product.
  • Inner CTE (SELECT ProductID, ProductName, SUM(TotalAmount) AS total_sales): Aggregates TotalAmount for each ProductID and ProductName.
  • Outer CTE (SELECT ProductID, ProductName, total_sales FROM total_revenue WHERE total_sales > 500): Filters only products where total sales exceed $500.
  • Final Query (SELECT * FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE;): Retrieves the final dataset of high-revenue products.

CTEs vs Temporary Tables: Key Differences

When working with temporary data in SQL, you have different options, including Common Table Expressions (CTEs) and Temporary Tables. Each has its own advantages depending on the use case.

Performance and Query Efficiency

CTEs improve query readability by breaking down complex logic into smaller parts, but they disappear after query execution. Temporary Tables, stored in tempdb, remain available for the session, making them more efficient for handling large datasets with indexing. 

Data Persistence and Scope

CTEs exist only within the query that defines them and cannot be referenced afterward. This makes them useful for one-time transformations where data doesn’t need to be stored. Temporary Tables persist until the session ends or they are explicitly dropped, allowing reusability across multiple queries. 

Resource Utilization and Storage Considerations

CTEs do not use disk or memory storage as they are dynamically created and exist only while the query runs. They are ideal for lightweight transformations without repeated access. Temporary Tables are stored in tempdb, consuming disk space and memory, which can impact performance if overused. 

Optimal Use Cases for CTEs and Temporary Tables

CTEs are ideal for enhancing code clarity, making queries more readable and structured. They work best for small, single-use datasets that don’t require storage beyond the query execution. Temporary Tables, however, are more suitable for large datasets or scenarios where results need to be accessed multiple times across different queries within a session.

Differences Between CTEs and Temporary Tables at a glance:

Feature CTE (Common Table Expression) Temporary Table
Definition A temporary result set created within a query using the WITH clause. A physical table stored in tempdb, used for temporary data storage.
Scope Exists only within the query where it is defined. Can persist for the entire session or be accessible globally (##TempTable).
Persistency Disappears after query execution and cannot be reused later. Remains until explicitly dropped or the session ends.
Performance Efficient for recursive queries and breaking down complex logic. Performs better for large datasets that require indexing and multiple transformations.
Features Enhances query readability, reusable within a query. Supports indexes, constraints, and statistics, improving performance.
Best Use Cases Ideal for recursive queries, breaking down complex SQL logic. Suitable for handling large datasets across multiple queries.
Limitations Cannot be used outside the defining query. Can affect performance if used excessively with large datasets.

Common Issues and Solutions When Using CTEs

While CTEs improve query structure and readability, they can sometimes lead to performance issues, recursion errors, or resource limitations. Below are common issues and their solutions.

Recursive CTEs and Iteration Limits

⚠️ Error: Recursive CTEs may run indefinitely if they lack a proper termination condition. In BigQuery, recursive CTEs have a default iteration limit of 500. If a query exceeds this limit, it is aborted, resulting in an error. This happens when recursion continues without meeting a stopping condition.

Solution: Ensure your recursive CTE includes a clear termination condition to prevent infinite loops. You can also limit the recursion depth using a WHERE clause to stop execution at a certain level. If recursion is inefficient, consider using a non-recursive approach, such as JOINs or window functions, for better performance.

Debugging Multi-CTE Queries

⚠️ Error: Multi-CTE queries can produce unexpected results due to data mismatches, filtering issues, or missing records. For example, if you expect 1000 users in a report but only see 980, some records may have been lost during processing.

Solution: Debug step-by-step by extracting subsets of data instead of running the full query. Use filters like specific IDs or date ranges to track missing data efficiently. In BigQuery, queries cost money, so limit the data scan when testing by working with a small subset of records.

Handling Resource Limitations

⚠️ Error: Complex multi-CTE queries can consume too many system resources, leading to quota limits, slow execution, or query failures in BigQuery. Large datasets increase memory usage, and excessive joins or nested CTEs can impact performance.

Solution: Optimize query execution by reducing the number of nested CTEs, filtering data early in the query, and aggregating before joins. If a CTE is used multiple times, consider storing intermediate results in a Temporary Table to avoid redundant computations. Monitor query execution plans to identify bottlenecks and optimize accordingly.

Effective Strategies for Using CTEs in BigQuery

Using CTEs effectively can improve query readability, simplify complex logic, and optimize performance. The following strategies will help ensure your CTEs are both efficient and easy to maintain in BigQuery.

Enhance Code Clarity with Descriptive CTE Names

Using clear and meaningful CTE names makes queries easier to read and maintain. Instead of generic names like cte1, use descriptive names that reflect the data’s purpose, such as high_value_orders or customer_revenue. This helps teams quickly understand query logic without needing to analyze every line of SQL.

Simplify Queries by Limiting CTE Nesting

Excessive nesting of CTEs can make queries harder to debug and optimize. Instead of deeply nested CTEs, break queries into logical steps or consider using Temporary Tables for storing intermediate results. Keeping queries concise and structured improves performance and readability.

Optimize Data Processing by Filtering and Aggregating Early

Applying filters and aggregations at the beginning of a query minimizes the amount of data processed in later steps. This reduces memory usage and improves performance, especially for large datasets. For example, filtering orders before joining tables ensures only relevant data is used, preventing unnecessary computation on unneeded rows.

Optimize CTEs Through Regular Performance Review

Regularly analyze query performance to identify bottlenecks and optimize execution plans. Check for unused columns, unnecessary joins, or excessive recursion that could slow down queries. Using EXPLAIN plans in BigQuery helps detect inefficiencies, allowing refinements to CTEs for faster execution and reduced resource consumption.

Leverage Advanced Features with the OWOX BI Reports Extension for BigQuery

Managing and analyzing data in BigQuery can become complex, especially when dealing with large datasets and multi-step queries. The OWOX BI Reports Extension simplifies this process by providing an intuitive interface for query execution, automation, and visualization.

With features like automated data extraction, scheduled reports, and integration with Google Sheets, the extension enhances workflow efficiency. Users can quickly retrieve insights, monitor key metrics, and optimize queries, ensuring faster decision-making and improved data accessibility within their BigQuery environment.

FAQ

How do I use CTEs in BigQuery?
What are the benefits of using CTEs in BigQuery?
Can CTEs improve the performance of my BigQuery queries?
What is the difference between recursive and non-recursive CTEs in BigQuery?
Can I use multiple CTEs in a single BigQuery query?
What are the best practices for using CTEs in BigQuery?

You might also like

2,000 companies rely on us

Oops! Something went wrong while submitting the form...