Converting data types in BigQuery can sometimes lead to unexpected errors. A single mismatch has the potential to throw your entire query off track. By adopting the right approach, such as using CAST and SAFE_CAST functions, you can ensure seamless and error-free data conversions when working with a mixed dataset.
In this guide, we will explain the differences between CAST and SAFE_CAST, their syntax, and practical use cases. Mastering these functions will enhance your data workflows and optimize query performance, whether for reporting or integration.
The CAST and SAFE_CAST functions in BigQuery are essential tools for converting data types. These functions allow you to transform data formats like strings, dates, and integers, ensuring data consistency and accuracy. By understanding the differences between them, you can optimize queries, prevent errors, and improve data workflows.
The CAST function in BigQuery allows you to convert data from one type to another, ensuring consistency when working with different data structures during operations or comparisons.
However, casting incompatible data structures can lead to runtime errors.
=CAST(expression AS typename [format_clause])
Here:
The following query will return the conversion of a string value to a numeric value.
=CAST("123" AS INT64)
Here:
The SAFE_CAST function in BigQuery safely converts a value from one data type to another, returning NULL instead of raising an error if the conversion fails. This makes it ideal for handling potential conversion errors without disrupting the entire query.
SAFE_CAST is useful when working with uncertain data types, ensuring that your queries remain robust and error-free. It supports all valid data types in BigQuery and provides a safe alternative to regular CAST by gracefully handling conversion failures.
=SAFE_CAST(expression AS typename [format_clause])
Here:
You're handling a dataset with numeric IDs and text errors. Using SAFE_CAST converts values safely, returning NULL for invalid entries like "apple" without breaking the query.
=SELECT SAFE_CAST("apple" AS INT64) AS not_a_number;
Here:
The CAST function in BigQuery makes it easier to work with mixed or inconsistent data formats. Following are a few practical examples of using CAST for data manipulation for analysis.
The CAST function in BigQuery allows you to convert STRING values into TIMESTAMP format.
Example:
Suppose, your dataset includes transaction times in string format. You can convert these strings to TIMESTAMP for time-based analysis, like calculating the time between transactions.
SELECT
name,
CAST(CONCAT(transaction_date, ' ', transaction_time) AS TIMESTAMP) AS transaction_timestamp
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id IN (101, 102, 103);
Here:
The query returns the name and a combined transaction_timestamp (converted from transaction_date and transaction_time) for customers with customer_id 101, 102, and 103.
The CAST function in BigQuery allows you to convert an INTEGER into a FLOAT64 format. This conversion is useful for enabling precise calculations that require decimal values.
Example:
Suppose you need to convert revenue (INT64) to FLOAT64 to calculate averages or use it in math operations requiring decimal numbers. You can use CAST in the following manner.
SELECT
customer_id,
CAST(revenue AS FLOAT64) AS revenue_id_float
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id IN (101, 102, 103);
Here:
Converting INT64 to BOOL is useful when checking whether a number is zero or non-zero. It helps simplify conditions, like flagging values as true (non-zero) or false (zero).
Example:
You want to convert the age column to BOOL for analysis purposes, where any age over 35 returns TRUE (i.e., middle-aged customers).
SELECT name, age,
CAST(age > 35 AS BOOL)
AS is_middle_aged FROM `owox-analytics.dataset.customer_transactions`
WHERE customer_id IN (101, 102, 103);
Here:
The query returns false because Bob's age is 25, less than 35.
When casting from STRING to BOOL in BigQuery, the conversion follows these rules: 'true' -> TRUE, 'false' -> FALSE, and any other value returns NULL.
Example:
You have STRING data in the active_member field and want to convert it to BOOL for boolean comparisons in your queries.
SELECT
name,
SAFE_CAST(active_member AS BOOL) AS active_status
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id IN (101, 102, 103);
Here:
The query returns true because Bob, Alice and Charlie work status are active.
Converting from scientific notation to NUMERIC changes values into exact decimal numbers, improving calculation accuracy.
Example:
Your dataset has a balance field with large amounts in scientific notation, and you need to change them to a regular decimal format for financial reports.
SELECT
balance, CAST(balance AS NUMERIC) AS balance_numeric
FROM
`owox-analytics.dataset.financial_data`
WHERE
account_name = 'Account A';
Here:
The CAST function in BigQuery converts BYTES to STRING. With the FORMAT clause, you can choose the encoding (e.g., ASCII, UTF-8) to make byte sequences readable and easier to work with.
Example:
The raw_data field contains byte-encoded values that need to be converted into STRING for readability.
SELECT
raw_data,
CAST(raw_data AS STRING FORMAT 'HEX') AS decoded_data
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id = 101;
Here:
By specifying a FORMAT clause, you can choose the encoding (e.g., ASCII or UTF-8) for converting the string into its byte representation, making it suitable for storage or further processing.
Example:
You want to store string data as bytes in the raw_data column for more efficient storage or processing.
SELECT
name,
CAST(name AS BYTES FORMAT 'UTF-8') AS name_bytes
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id = 102;
Here:
Most data types in BigQuery can be converted using Conversion Functions like the CAST function but will fail and stop the query if the conversion is invalid.
On the other hand, SAFE_CAST handles failed conversions by returning NULL instead of an error, ensuring the query runs without interruption. This key difference makes SAFE_CAST more reliable when working with data that might not always convert cleanly.
When working with data conversions in BigQuery, choosing between CAST and SAFE_CAST is necessary when dealing with unpredictable or mixed data types.
The expression to be converted must follow a recognizable DATE format, such as 'YYYY-MM-DD', or be a valid TIMESTAMP or DATETIME, where the time component will be discarded. If the expression is a string that doesn't match a valid DATE format, the conversion will fail, potentially causing an error unless SAFE_CAST is used.
Using CAST (Error on Failure):
The CAST function will raise an error if the conversion fails due to an invalid format or incompatible data types.
Example:
In this example, attempting to convert an invalid date format from the transaction_date column will raise an error.
SELECT CAST(transaction_date AS DATE)
AS cast_transaction_date
FROM `owox-analytics.dataset.customer_transactions`;
Explanation:
If transaction_date contains an invalid format like "02/30/2024", the query will fail because CAST expects a valid date in the format YYYY-MM-DD. Invalid data will result in an error, stopping the query.
Using SAFE_CAST (Returns NULL on Failure):
The SAFE_CAST function will return NULL if the conversion fails, allowing the query to proceed without interruption.
Example:
SELECT SAFE_CAST(transaction_date AS DATE)
AS safe_cast_transaction_date
FROM `owox-analytics.dataset.customer_transactions`;
Explanation:
If transaction_date contains invalid data like "02/30/2024", SAFE_CAST will return NULL instead of causing the query to fail. This ensures the query runs smoothly even when there are conversion issues.
In some cases, source systems provide JSON events with timestamps in microsecond precision (e.g., 2024-01-01 14:00:00.123456), which can be successfully cast to a TIMESTAMP in BigQuery using SAFE_CAST.
However, if the source system suddenly sends timestamps in nanosecond precision, the SAFE_CAST function will fail to convert the timestamp properly since BigQuery's TIMESTAMP only supports microsecond precision. This failure won't raise an error but will instead return NULL, potentially going unnoticed without proper monitoring.
Using CAST (Error on Failure):
The CAST function will fail if an incompatible or invalid timestamp format is provided, but it may still convert valid microsecond timestamps correctly.
Example:
SELECT CAST('2024-01-01 14:00:00.123456' AS TIMESTAMP)
AS nanosecond_timestamp;
Explanation:
Since BigQuery only supports microsecond precision (6 decimal places for seconds), attempting to cast a STRING with nanosecond precision (9 decimal places) will cause the conversion to fail, resulting in an error.
Using SAFE_CAST for the Same Analysis:
SELECT SAFE_CAST('2024-01-01 14:00:00.123456' AS TIMESTAMP)
AS nanosecond_timestamp;
Explanation:
Using SAFE_CAST ensures that, instead of failing, the conversion will return NULL, allowing the query to continue without interruption. This is useful when working with timestamp data that might have varying levels of precision, ensuring the query runs even if some values exceed the supported limit.
When using CAST and SAFE_CAST in BigQuery, errors can occur due to incompatible data types, invalid formats, or unexpected null values. Understanding these common issues and how to troubleshoot them can help ensure smoother data conversions and prevent query failures.
⚠️ Error:
In BigQuery, converting a STRING to DATETIME using CAST fails if the string has invalid formats, like "None" or "0.0". BigQuery lacks functions like TRY-CATCH or ISNUMERIC() to handle such errors.
✅ Solution:
Use SAFE_CAST instead. It returns NULL for invalid values, allowing the query to continue without errors, even with inconsistent formats.
⚠️ Error:
When using SAFE_CAST to convert JSON timestamps with nanosecond precision to TIMESTAMP, BigQuery, which supports only microsecond precision, returns NULL without an error. This can cause unnoticed data loss.
✅ Solution:
Ensure the source provides microsecond-precision timestamps. If not, monitor for unexpected NULL values after casting and establish clear data contracts to maintain consistent timestamp precision and format.
⚠️ Error:
In BigQuery, converting a STRING to a DATE using CAST fails with the error "Could not cast literal to type DATE" if the string isn't in the required YYYY-MM-DD format. Variations like different separators or time information cause the conversion to fail.
✅ Solution:
Ensure the string follows the YYYY-MM-DD format before using CAST. For non-standard formats, use PARSE_DATE to specify the correct format for successful conversion.
When working with data type conversions in BigQuery, following best practices for using CAST and SAFE_CAST functions ensures efficient query execution and data integrity. These practices help avoid errors, enhance performance, and enable precise handling of complex data transformations in large datasets.
Before performing any cast operation, verify that the data types you are converting between are compatible. Incompatible data types, such as trying to cast an invalid string to a DATE, can result in casting errors or unexpected outcomes.
Ensuring compatibility upfront helps prevent query failures, particularly in complex transformations involving multiple data types. Always review and validate your source data to minimize runtime errors.
Prior to casting, perform thorough data validation and cleansing to correct or remove any values that could cause the casting to fail. This includes checking for null values, invalid formats, and outliers that might disrupt the conversion process.
Ensuring clean, consistent data helps prevent query failures and guarantees smoother, more accurate conversions when using CAST or SAFE_CAST functions.
Be mindful of null values when casting, as they can cause errors or lead to incorrect outcomes. To handle nulls effectively, use functions like IFNULL, COALESCE, or SAFE_CAST.
These functions allow you to manage null values gracefully, ensuring your queries run smoothly and produce accurate results, especially during calculations or aggregations on converted data.
To optimize performance and reduce errors, avoid unnecessary casting. Selecting the correct data types during schema design helps minimize the need for conversions in your queries, which can also help manage BigQuery pricing.
Over-casting or frequent type conversions can slow down performance, especially with large datasets. Ensure that casting is only applied when absolutely necessary for data compatibility or specific transformations.
Use the EXTRACT function to pull specific components, such as the year, month, or day, from DATE or TIMESTAMP fields, and cast them only if necessary.
This approach helps maintain precision and clarity in your data while reducing unnecessary processing. By focusing on key data elements, you can simplify your queries and ensure efficient data handling.
When working with complex data, leverage BigQuery’s ARRAY and STRUCT data types to manage multiple values or nested structures efficiently. These types help organize and simplify queries involving hierarchical data.
Cast individual elements as needed within these structures to maintain data integrity and ensure smooth data manipulation, allowing for more flexible and manageable data transformations.
BigQuery offers a range of powerful functions to help users manage, analyze, and manipulate large datasets effectively.
By leveraging these BigQuery functions, you can better manage complex data and derive valuable insights for your business.
Simplify your SQL query creation process with OWOX BI SQL Copilot, a tool designed to help you generate SQL queries up to 50 times faster. Whether managing complex datasets, creating reports, or handling data transformations, this free tool simplifies the process with intelligent automation.
OWOX BI SQL Copilot enhances productivity by providing quick, accurate query generation, reducing the time spent on manual coding. Unlock the potential of your data with this powerful, easy-to-use tool.
CAST converts data types but throws an error if the conversion fails. SAFE_CAST returns NULL on failure, allowing the query to continue without errors. SAFE_CAST is ideal for handling unpredictable data, ensuring query stability even with invalid inputs.
To convert a STRING to a DATE, use CAST or SAFE_CAST. The string must follow the 'YYYY-MM-DD' format; otherwise, CAST will fail. SAFE_CAST returns NULL for invalid formats. Use PARSE_DATE for non-standard date formats.
Common errors include converting strings in invalid formats (e.g., invalid DATE or TIMESTAMP), casting incompatible data types (e.g., STRING to BOOL), or over-casting, which can degrade performance and cause query failures.
Use SAFE_CAST to prevent query failures and return NULL on invalid conversions. Ensure data follows the correct format and validate it beforehand. Proper schema design and data cleansing help avoid errors.
SAFE_CAST safely converts data types, returning NULL on failure instead of raising an error. It allows queries to continue uninterrupted, making it useful for handling inconsistent or unpredictable data.
Use EXTRACT to retrieve specific date or time components (e.g., year, month, day) instead of converting data types. EXTRACT is more efficient when you need only part of a DATE or TIMESTAMP.