Writing complex SQL queries in BigQuery can quickly become overwhelming, especially when dealing with multiple joins, aggregations, and filtering conditions. Common Table Expressions (CTEs) offer a solution by breaking down queries into structured, reusable components, making them easier to read and maintain. Whether you're analyzing sales trends, managing customer data, or handling hierarchical relationships, CTEs can simplify query logic while enhancing efficiency.
But why should you choose CTEs over subqueries or temporary tables? They not only improve query readability but also optimize performance by eliminating redundant calculations. Additionally, recursive CTEs unlock the ability to process hierarchical data, such as organizational charts or category structures.
A Common Table Expression (CTE) is a temporary named result set in SQL that exists only during query execution. It acts like a virtual table, allowing you to structure complex queries, improve readability, and reuse intermediate results. CTEs help refine and manipulate data before generating the final output.
The importance of CTEs in BigQuery cannot be overstated. They allow you to write clean, structured queries without relying on subqueries or temporary tables.
CTEs help eliminate redundant code by allowing you to define reusable query components. Instead of repeating the same subqueries multiple times, you can store intermediate results in a CTE and reference them throughout your query. This approach not only makes your SQL code more organized but also minimizes errors and enhances maintainability.
When dealing with large datasets, SQL queries can become long and difficult to understand. CTEs help break down complex logic into smaller, readable sections, making queries more structured and easier to follow. By assigning meaningful names to CTEs, developers and analysts can quickly identify what each part of the query does.
One of the most powerful features of CTEs is their ability to handle recursive queries. Recursive CTEs are essential for processing hierarchical data, such as organizational charts, category structures, or bill of materials. They allow SQL queries to reference previous rows within the same query, making it possible to traverse parent-child relationships efficiently without complex looping mechanisms.
Without CTEs, handling multiple joins, aggregations, and filtering conditions can result in messy, difficult-to-maintain queries. CTEs provide a way to segment and simplify these operations by breaking them into logical, sequential steps. This makes it easier to modify parts of the query without affecting the entire SQL statement.
By using CTEs, you can optimize query execution by structuring operations more efficiently. Instead of performing multiple complex calculations in a single step, CTEs allow you to preprocess data in stages, reducing computational load. This is particularly useful when dealing with large datasets, as it ensures that queries run faster while consuming fewer resources.
CTEs in BigQuery help simplify queries by creating temporary result sets that can be reused within a query. Let’s look at the syntax and examples to see how CTEs work in practice.
A Basic CTE query allows you to create a temporary result set within a SQL statement using the WITH clause. This result set can be referenced in the main query, making it easier to structure and organize data without needing subqueries.
The syntax for defining a CTE using the WITH clause is as follows:
1WITH cte_name (column1, column2, ...) AS (
2 SELECT ...
3 FROM ...
4 WHERE ...
5)
6SELECT * FROM cte_name;
Here:
Suppose we want to find all recent orders placed in the last 30 days, along with the customer’s name and total amount. Instead of writing a complex query, we can define a CTE to filter recent orders first, then use it in the main query.
1WITH recent_orders AS (
2 SELECT OrderID, CustomerName, TotalAmount, OrderDate
3 FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
4 WHERE OrderDate > DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
5)
6SELECT * FROM recent_orders;
Here:
Multi-CTEs allow you to define multiple Common Table Expressions within a single query using a single WITH clause. Instead of writing multiple separate queries, SQL enables combining multiple CTEs in one statement, making complex queries more structured and efficient.
The syntax of Multi CTE Queries is:
1WITH cte1 AS (
2 SELECT column1 FROM dataset.table1
3),
4cte2 AS (
5 SELECT column2 FROM dataset.table2
6)
7SELECT * FROM cte1 JOIN cte2 ON cte1.column1 = cte2.column2;
Here:
Multiple CTEs can be defined and used within the same query.
Let's assume we need to identify high-value orders (TotalAmount > 800) and track shipped orders separately. Using multiple CTEs, we first filter each category and then combine the results with a JOIN.
1WITH high_value_orders AS (
2 SELECT OrderID, CustomerName, TotalAmount
3 FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
4 WHERE TotalAmount > 800
5),
6
7shipped_orders AS (
8 SELECT OrderID, OrderStatus
9 FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
10 WHERE OrderStatus = 'Shipped'
11)
12
13SELECT h.OrderID, h.CustomerName, h.TotalAmount, s.OrderStatus
14FROM high_value_orders h
15JOIN shipped_orders s ON h.OrderID = s.OrderID;
Here:
Recursive CTEs allow queries to reference themselves, making them useful for processing hierarchical or iterative data structures. Unlike non-recursive CTEs that run once, recursive CTEs execute continuously until no new results are found.
The syntax of Recursive Queries is:
1WITH RECURSIVE cte_name AS (
2 SELECT base_case_column FROM dataset.table WHERE condition
3 UNION ALL
4 SELECT recursive_column FROM cte_name WHERE condition
5)
6SELECT * FROM cte_name;
Here:
We want to analyze sequential order processing where each order is processed in sequence. Using a recursive CTE, we track the order sequence, starting with the first completed order and linking it to subsequent orders.
1WITH RECURSIVE order_sequence AS (
2 -- Base case: Select the first completed order
3 SELECT OrderID, CustomerName, OrderStatus, 1 AS level
4 FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
5 WHERE OrderStatus = 'Completed'
6 AND OrderID = (SELECT MIN(OrderID) FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE WHERE OrderStatus = 'Completed')
7
8 UNION ALL
9
10 -- Recursive step: Select the next order in sequence
11 SELECT s.OrderID, s.CustomerName, s.OrderStatus, h.level + 1
12 FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE s
13 JOIN order_sequence h ON s.OrderID = h.OrderID + 1
14 WHERE s.OrderStatus = 'Completed'
15)
16SELECT * FROM order_sequence;
Here:
A Nested CTE is a Common Table Expression (CTE) that contains another CTE inside it. This allows SQL queries to layer multiple CTEs, making complex data transformations more structured. Instead of writing multiple separate CTEs, nesting helps organize queries logically, ensuring efficient data processing in BigQuery.
The syntax of nested queries is:
1WITH outer_cte AS (
2 WITH inner_cte AS (
3 SELECT column1 FROM dataset.table WHERE condition
4 )
5 SELECT column1 FROM inner_cte WHERE another_condition
6)
7SELECT * FROM outer_cte;
Here:
Imagine we need to analyze total revenue per product and then filter products where the total revenue exceeds $500 using a Nested CTE. The first CTE calculates the total revenue for each product, while the second CTE filters out products that do not meet the revenue threshold.
1WITH high_revenue_products AS (
2 WITH total_revenue AS (
3 SELECT ProductID, ProductName, SUM(TotalAmount) AS total_sales
4 FROM owox-d-ikrasovytskyi-001.OWOX_Demo.Sales_Data_CTE
5 GROUP BY ProductID, ProductName
6 )
7 SELECT ProductID, ProductName, total_sales
8 FROM total_revenue
9 WHERE total_sales > 500
10)
11SELECT * FROM high_revenue_products;
Here:
When working with temporary data in SQL, you have different options, including Common Table Expressions (CTEs) and Temporary Tables. Each has its own advantages depending on the use case.
CTEs improve query readability by breaking down complex logic into smaller parts, but they disappear after query execution. Temporary Tables, stored in tempdb, remain available for the session, making them more efficient for handling large datasets with indexing.
CTEs exist only within the query that defines them and cannot be referenced afterward. This makes them useful for one-time transformations where data doesn’t need to be stored. Temporary Tables persist until the session ends or they are explicitly dropped, allowing reusability across multiple queries.
CTEs do not use disk or memory storage as they are dynamically created and exist only while the query runs. They are ideal for lightweight transformations without repeated access. Temporary Tables are stored in tempdb, consuming disk space and memory, which can impact performance if overused.
CTEs are ideal for enhancing code clarity, making queries more readable and structured. They work best for small, single-use datasets that don’t require storage beyond the query execution. Temporary Tables, however, are more suitable for large datasets or scenarios where results need to be accessed multiple times across different queries within a session.
Differences Between CTEs and Temporary Tables at a glance:
While CTEs improve query structure and readability, they can sometimes lead to performance issues, recursion errors, or resource limitations. Below are common issues and their solutions.
⚠️ Error: Recursive CTEs may run indefinitely if they lack a proper termination condition. In BigQuery, recursive CTEs have a default iteration limit of 500. If a query exceeds this limit, it is aborted, resulting in an error. This happens when recursion continues without meeting a stopping condition.
✅ Solution: Ensure your recursive CTE includes a clear termination condition to prevent infinite loops. You can also limit the recursion depth using a WHERE clause to stop execution at a certain level. If recursion is inefficient, consider using a non-recursive approach, such as JOINs or window functions, for better performance.
⚠️ Error: Multi-CTE queries can produce unexpected results due to data mismatches, filtering issues, or missing records. For example, if you expect 1000 users in a report but only see 980, some records may have been lost during processing.
✅ Solution: Debug step-by-step by extracting subsets of data instead of running the full query. Use filters like specific IDs or date ranges to track missing data efficiently. In BigQuery, queries cost money, so limit the data scan when testing by working with a small subset of records.
⚠️ Error: Complex multi-CTE queries can consume too many system resources, leading to quota limits, slow execution, or query failures in BigQuery. Large datasets increase memory usage, and excessive joins or nested CTEs can impact performance.
✅ Solution: Optimize query execution by reducing the number of nested CTEs, filtering data early in the query, and aggregating before joins. If a CTE is used multiple times, consider storing intermediate results in a Temporary Table to avoid redundant computations. Monitor query execution plans to identify bottlenecks and optimize accordingly.
Using CTEs effectively can improve query readability, simplify complex logic, and optimize performance. The following strategies will help ensure your CTEs are both efficient and easy to maintain in BigQuery.
Using clear and meaningful CTE names makes queries easier to read and maintain. Instead of generic names like cte1, use descriptive names that reflect the data’s purpose, such as high_value_orders or customer_revenue. This helps teams quickly understand query logic without needing to analyze every line of SQL.
Excessive nesting of CTEs can make queries harder to debug and optimize. Instead of deeply nested CTEs, break queries into logical steps or consider using Temporary Tables for storing intermediate results. Keeping queries concise and structured improves performance and readability.
Applying filters and aggregations at the beginning of a query minimizes the amount of data processed in later steps. This reduces memory usage and improves performance, especially for large datasets. For example, filtering orders before joining tables ensures only relevant data is used, preventing unnecessary computation on unneeded rows.
Regularly analyze query performance to identify bottlenecks and optimize execution plans. Check for unused columns, unnecessary joins, or excessive recursion that could slow down queries. Using EXPLAIN plans in BigQuery helps detect inefficiencies, allowing refinements to CTEs for faster execution and reduced resource consumption.
Managing and analyzing data in BigQuery can become complex, especially when dealing with large datasets and multi-step queries. The OWOX BI Reports Extension simplifies this process by providing an intuitive interface for query execution, automation, and visualization.
With features like automated data extraction, scheduled reports, and integration with Google Sheets, the extension enhances workflow efficiency. Users can quickly retrieve insights, monitor key metrics, and optimize queries, ensuring faster decision-making and improved data accessibility within their BigQuery environment.
You can use CTEs in BigQuery by defining them with the WITH clause, assigning a name, and referencing it in the main query to simplify complex SQL operations.
CTEs improve query readability, eliminate code duplication, and help structure complex queries. They also enable recursive processing, making hierarchical and iterative data operations more manageable in BigQuery.
CTEs improve query organization and debugging, but performance gains depend on usage. For large datasets, Temporary Tables or optimized indexing may be better for efficiency.
Non-recursive CTEs execute once, while recursive CTEs repeatedly reference themselves until a stopping condition is met. Recursive CTEs are useful for hierarchical or sequential data processing.
Yes, multiple CTEs can be defined within a single WITH clause, separated by commas. They can be referenced independently or combined for more complex query logic.
Use descriptive CTE names, limit unnecessary nesting, filter data early, and optimize performance by reviewing execution plans to avoid redundant processing in complex queries.