Content
- Overview of CAST and SAFE_CAST Functions in BigQuery
- Different CAST Functions in BigQuery
- Practical Applications of CAST & SAFE_CAST Function
- Key Differences Between CAST and SAFE_CAST in BigQuery
- When to Use SAFE_CAST Instead of CAST
- Troubleshooting Common Errors While Working with CAST and SAFE_CAST
- Best Practices to Optimize Your Use of CAST and SAFE_CAST Functions
- Unleash the Power of BigQuery Functions
- Generate SQL Queries 50х Faster with OWOX BI SQL Copilot for FREE
How to Convert Data Types Using CAST & SAFE_CAST in BigQuery
Alyona Samovar, Senior Digital Analyst @ OWOX
Converting data types in BigQuery can sometimes lead to unexpected errors. A single mismatch has the potential to throw your entire query off track. By adopting the right approach, such as using CAST and SAFE_CAST functions, you can ensure seamless and error-free data conversions when working with a mixed dataset.
In this guide, we will explain the differences between CAST and SAFE_CAST, their syntax, and practical use cases. Mastering these functions will enhance your data workflows and optimize query performance, whether for reporting or integration.
Overview of CAST and SAFE_CAST Functions in BigQuery
The CAST and SAFE_CAST functions in BigQuery are essential tools for converting data types. These functions allow you to transform data formats like strings, dates, and integers, ensuring data consistency and accuracy. By understanding the differences between them, you can optimize queries, prevent errors, and improve data workflows.
What is CAST in BigQuery?
The CAST function in BigQuery allows you to convert data from one type to another, ensuring consistency when working with different data structures during operations or comparisons.
However, casting incompatible data structures can lead to runtime errors.
BigQuery Syntax of CAST Function
CAST(expression AS typename [format_clause])
Here:
- CAST: The function used to convert data types.
- expression: This is the value or column you want to convert. It can be a string, integer, date, etc.
- typename: This is the target data type you want to convert the expression into. BigQuery supports several types such as INT64, STRING, DATE, FLOAT64, BOOL, and more.
- [format_clause]: In some cases, you can include a format clause to specify how the conversion should happen. For example, converting bytes to a specific encoding like BASE64 or UTF-8. The format clause depends on the type of data you're casting.
Example of CAST
The following query will return the conversion of a string value to a numeric value.
CAST("123" AS INT64)
Here:
- “123”: This is a string value containing numeric characters that can be converted to an integer.
- CAST: The function used to convert the result of the String expression into a different data type.
- AS INT64: This specifies the target data type. Here, the string "123" is converted to the INT64 type, returning the numeric value 123.
- NULL Handling: If the value is NULL instead of the string "123", the expression will return NULL since CAST cannot convert NULL to a numeric type.
Simplify BigQuery Reporting in Sheets
Easily analyze corporate data directly into Google Sheets. Query, run, and automatically update reports aligned with your business needs
What is SAFE_CAST in BigQuery?
The SAFE_CAST function in BigQuery safely converts a value from one data type to another, returning NULL instead of raising an error if the conversion fails. This makes it ideal for handling potential conversion errors without disrupting the entire query.
SAFE_CAST is useful when working with uncertain data types, ensuring that your queries remain robust and error-free. It supports all valid data types in BigQuery and provides a safe alternative to regular CAST by gracefully handling conversion failures.
Syntax of SAFE_CAST
SAFE_CAST(expression AS typename [format_clause])
Here:
- SAFE_CAST: This function performs the same conversion as the CAST function but with one major difference - it handles failed conversions gracefully by returning NULL instead of an error.
- expression: The value or field you want to convert. This can be a string, integer, date, etc. For example, '123' or 'invalid-date'.
- typename: This is the target data type you want the expression converted to. Common types include INT64, STRING, DATE, FLOAT64, TIMESTAMP, and more.
- [format_clause]: In some cases, you can provide a format clause that guides how the conversion should be carried out. This is used when casting data such as bytes, where the format could be BASE64 or UTF-8. It’s specific to certain types of conversions.
Example of SAFE_CAST
You're handling a dataset with numeric IDs and text errors. Using SAFE_CAST converts values safely, returning NULL for invalid entries like "apple" without breaking the query.
SELECT SAFE_CAST("apple" AS INT64) AS not_a_number;
Here:
- CAST("apple" AS INT64): This tries to convert the string "apple" to an integer. Since "apple" is not a number, this query will fail, throwing an error.
- SAFE_CAST("apple" AS INT64): This safely attempts the conversion. Since "apple" cannot be converted to an integer, it returns NULL instead of causing an error.
- NULL Handling: SAFE_CAST ensures the query still runs even when the conversion is invalid by returning NULL for non-convertible values.
- Static Analysis: Even though SAFE_CAST prevents runtime errors, if you attempt to cast between types that are completely incompatible (like casting a string to an array), BigQuery's static analysis will still produce an error because the query itself is invalid.
Different CAST Functions in BigQuery
- CAST AS ARRAY: Converts data into an array, allowing you to handle multiple elements as a single collection, such as splitting strings or handling multiple values in one field.
- CAST AS BIGNUMERIC: Converts values into the BIGNUMERIC type, designed for handling extremely large numbers with high precision, often used for complex financial calculations or scientific data.
- CAST AS BOOL: Converts values into a boolean (TRUE or FALSE), typically used to represent the result of logical conditions or comparisons in queries.
- CAST AS BYTES: Converts data into a raw byte format, useful for handling binary data such as encoded files, images, or cryptographic keys.
- CAST AS DATE: Converts values into a DATE format, enabling easy manipulation and querying of date-specific information, such as filtering by day, month, or year.
- CAST AS DATETIME: Converts data into a DATETIME format, which includes both date and time components, useful for timestamped entries without timezone adjustments.
- CAST AS FLOAT64: Converts data into the FLOAT64 type, which represents floating-point numbers, allowing for high-precision mathematical calculations with decimal points.
- CAST AS INT64: Converts values into the INT64 type, used for handling large integer numbers, particularly in data that requires precise whole-number calculations.
- CAST AS NUMERIC: Converts data into the NUMERIC type, providing fixed-point precision for financial or scientific applications where exact decimal representation is critical.
- CAST AS INTERVAL: Converts values into the INTERVAL type, which represents a span of time, useful for calculating durations between dates or times.
- CAST AS STRING: Converts data into a STRING type, used for text-based data, allowing you to manipulate, search, and compare text values in queries.
- CAST AS STRUCT: Converts values into a structured format, with fields and subfields, useful for organizing complex data relationships within a single column.
- CAST AS TIME: Converts data into a TIME format for time-specific values, useful for tracking hours and minutes independently of a date.
- CAST AS TIMESTAMP: Converts values into a TIMESTAMP, which represents an exact moment in time, including date and time, often used for event tracking or scheduling.
- CAST AS RANGE: Converts values into a RANGE format, representing a continuous span between two endpoints, useful for working with numeric or date ranges in queries.
Unlock BigQuery Insights in Google Sheets
Report on what matters to you. Integrate corporate BigQuery data into a familiar spreadsheet interface. Get insightful, up-to-date reports with just a few clicks
Practical Applications of CAST & SAFE_CAST Function
The CAST function in BigQuery makes it easier to work with mixed or inconsistent data formats. Following are a few practical examples of using CAST for data manipulation for analysis.
Transforming Strings to Timestamps
The CAST function in BigQuery allows you to convert STRING values into TIMESTAMP format.
Example:
Suppose, your dataset includes transaction times in string format. You can convert these strings to TIMESTAMP for time-based analysis, like calculating the time between transactions.
SELECT
name,
CAST(CONCAT(transaction_date, ' ', transaction_time) AS TIMESTAMP) AS transaction_timestamp
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id IN (101, 102, 103);
Here:
- CONCAT(transaction_date, ' ', transaction_time): Combines the DATE (2023-05-12, 2023-04-23, 2023-06-15) and TIME (15:35:00, 10:45:30, 11:15:45) fields into a single string representing the full timestamp.
- CAST(... AS TIMESTAMP): Converts the string into a TIMESTAMP format.
The query returns the name and a combined transaction_timestamp (converted from transaction_date and transaction_time) for customers with customer_id 101, 102, and 103.
Converting an Integer to a Float
The CAST function in BigQuery allows you to convert an INTEGER into a FLOAT64 format. This conversion is useful for enabling precise calculations that require decimal values.
Example:
Suppose you need to convert revenue (INT64) to FLOAT64 to calculate averages or use it in math operations requiring decimal numbers. You can use CAST in the following manner.
SELECT
customer_id,
CAST(revenue AS FLOAT64) AS revenue_id_float
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id IN (101, 102, 103);
Here:
- CAST(revenue AS FLOAT64): Converts the INT64 value of revenue to FLOAT64 for better compatibility in floating-point operations.
Converting INT64 to BOOL
Converting INT64 to BOOL is useful when checking whether a number is zero or non-zero. It helps simplify conditions, like flagging values as true (non-zero) or false (zero).
Example:
You want to convert the age column to BOOL for analysis purposes, where any age over 35 returns TRUE (i.e., middle-aged customers).
SELECT name, age,
CAST(age > 35 AS BOOL)
AS is_middle_aged FROM `owox-analytics.dataset.customer_transactions`
WHERE customer_id IN (101, 102, 103);
Here:
- CAST(age > 35 AS BOOL): Converts the logical expression (age > 35) into a BOOL value where TRUE means the customer is older than 35 and FALSE otherwise.
The query returns false because Bob's age is 25, less than 35.
Converting STRING to BOOL
When casting from STRING to BOOL in BigQuery, the conversion follows these rules: 'true' -> TRUE, 'false' -> FALSE, and any other value returns NULL.
Example:
You have STRING data in the active_member field and want to convert it to BOOL for boolean comparisons in your queries.
SELECT
name,
SAFE_CAST(active_member AS BOOL) AS active_status
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id IN (101, 102, 103);
Here:
- SAFE_CAST(active_member AS BOOL): Converts the string "TRUE" to the boolean value TRUE. If the string is invalid, SAFE_CAST returns NULL instead of raising an error.
The query returns true because Bob, Alice and Charlie work status are active.
Converting Scientific Notation into Decimal Format
Converting from scientific notation to NUMERIC changes values into exact decimal numbers, improving calculation accuracy.
Example:
Your dataset has a balance field with large amounts in scientific notation, and you need to change them to a regular decimal format for financial reports.
SELECT
balance, CAST(balance AS NUMERIC) AS balance_numeric
FROM
`owox-analytics.dataset.financial_data`
WHERE
account_name = 'Account A';
Here:
- CAST(balance AS NUMERIC): Converts the FLOAT64 balance in scientific notation into a NUMERIC format for precise decimal handling.
Formatting BYTES as STRING
The CAST function in BigQuery converts BYTES to STRING. With the FORMAT clause, you can choose the encoding (e.g., ASCII, UTF-8) to make byte sequences readable and easier to work with.
Example:
The raw_data field contains byte-encoded values that need to be converted into STRING for readability.
SELECT
raw_data,
CAST(raw_data AS STRING FORMAT 'HEX') AS decoded_data
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id = 101;
Here:
- CAST(raw_data AS STRING FORMAT 'HEX'): Converts the BYTES value to a STRING using the HEX format, which turns encoded data into a human-readable string.
Formatting STRING as BYTES
By specifying a FORMAT clause, you can choose the encoding (e.g., ASCII or UTF-8) for converting the string into its byte representation, making it suitable for storage or further processing.
Example:
You want to store string data as bytes in the raw_data column for more efficient storage or processing.
SELECT
name,
CAST(name AS BYTES FORMAT 'UTF-8') AS name_bytes
FROM
`owox-analytics.dataset.customer_transactions`
WHERE
customer_id = 102;
Here:
- CAST(name AS BYTES FORMAT 'UTF-8'): Converts the STRING value into BYTES format using UTF-8 encoding, which is commonly used for text storage.
Get BigQuery Reports in Seconds
Seamlessly generate and update reports in Google Sheets—no complex setup needed
Key Differences Between CAST and SAFE_CAST in BigQuery
Most data types in BigQuery can be converted using Conversion Functions like the CAST function but will fail and stop the query if the conversion is invalid.
On the other hand, SAFE_CAST handles failed conversions by returning NULL instead of an error, ensuring the query runs without interruption. This key difference makes SAFE_CAST more reliable when working with data that might not always convert cleanly.
When to Use SAFE_CAST Instead of CAST
When working with data conversions in BigQuery, choosing between CAST and SAFE_CAST is necessary when dealing with unpredictable or mixed data types.
Converting a String to a Date
The expression to be converted must follow a recognizable DATE format, such as 'YYYY-MM-DD', or be a valid TIMESTAMP or DATETIME, where the time component will be discarded. If the expression is a string that doesn't match a valid DATE format, the conversion will fail, potentially causing an error unless SAFE_CAST is used.
Using CAST (Error on Failure):
The CAST function will raise an error if the conversion fails due to an invalid format or incompatible data types.
Example:
In this example, attempting to convert an invalid date format from the transaction_date column will raise an error.
SELECT CAST(transaction_date AS DATE)
AS cast_transaction_date
FROM `owox-analytics.dataset.customer_transactions`;
Explanation:
If transaction_date contains an invalid format like "02/30/2024", the query will fail because CAST expects a valid date in the format YYYY-MM-DD. Invalid data will result in an error, stopping the query.
Using SAFE_CAST (Returns NULL on Failure):
The SAFE_CAST function will return NULL if the conversion fails, allowing the query to proceed without interruption.
Example:
SELECT SAFE_CAST(transaction_date AS DATE)
AS safe_cast_transaction_date
FROM `owox-analytics.dataset.customer_transactions`;
Explanation:
If transaction_date contains invalid data like "02/30/2024", SAFE_CAST will return NULL instead of causing the query to fail. This ensures the query runs smoothly even when there are conversion issues.
Converting JSON Data Type into Timestamps
In some cases, source systems provide JSON events with timestamps in microsecond precision (e.g., 2024-01-01 14:00:00.123456), which can be successfully cast to a TIMESTAMP in BigQuery using SAFE_CAST.
However, if the source system suddenly sends timestamps in nanosecond precision, the SAFE_CAST function will fail to convert the timestamp properly since BigQuery's TIMESTAMP only supports microsecond precision. This failure won't raise an error but will instead return NULL, potentially going unnoticed without proper monitoring.
Using CAST (Error on Failure):
The CAST function will fail if an incompatible or invalid timestamp format is provided, but it may still convert valid microsecond timestamps correctly.
Example:
SELECT CAST('2024-01-01 14:00:00.123456' AS TIMESTAMP)
AS nanosecond_timestamp;
Explanation:
Since BigQuery only supports microsecond precision (6 decimal places for seconds), attempting to cast a STRING with nanosecond precision (9 decimal places) will cause the conversion to fail, resulting in an error.
Using SAFE_CAST for the Same Analysis:
SELECT SAFE_CAST('2024-01-01 14:00:00.123456' AS TIMESTAMP)
AS nanosecond_timestamp;
Explanation:
Using SAFE_CAST ensures that, instead of failing, the conversion will return NULL, allowing the query to continue without interruption. This is useful when working with timestamp data that might have varying levels of precision, ensuring the query runs even if some values exceed the supported limit.
Troubleshooting Common Errors While Working with CAST and SAFE_CAST
When using CAST and SAFE_CAST in BigQuery, errors can occur due to incompatible data types, invalid formats, or unexpected null values. Understanding these common issues and how to troubleshoot them can help ensure smoother data conversions and prevent query failures.
Handling Invalid Formats When Converting STRING to DATETIME
⚠️ Error:
In BigQuery, converting a STRING to DATETIME using CAST fails if the string has invalid formats, like "None" or "0.0". BigQuery lacks functions like TRY-CATCH or ISNUMERIC() to handle such errors.
✅ Solution:
Use SAFE_CAST instead. It returns NULL for invalid values, allowing the query to continue without errors, even with inconsistent formats.
Handling Precision Errors with JSON Timestamps
⚠️ Error:
When using SAFE_CAST to convert JSON timestamps with nanosecond precision to TIMESTAMP, BigQuery, which supports only microsecond precision, returns NULL without an error. This can cause unnoticed data loss.
✅ Solution:
Ensure the source provides microsecond-precision timestamps. If not, monitor for unexpected NULL values after casting and establish clear data contracts to maintain consistent timestamp precision and format.
"Could Not Cast Literal to Type DATE" Error in BigQuery
⚠️ Error:
In BigQuery, converting a STRING to a DATE using CAST fails with the error "Could not cast literal to type DATE" if the string isn't in the required YYYY-MM-DD format. Variations like different separators or time information cause the conversion to fail.
✅ Solution:
Ensure the string follows the YYYY-MM-DD format before using CAST. For non-standard formats, use PARSE_DATE to specify the correct format for successful conversion.
Simplify BigQuery Reporting in Sheets
Easily analyze corporate data directly into Google Sheets. Query, run, and automatically update reports aligned with your business needs
Best Practices to Optimize Your Use of CAST and SAFE_CAST Functions
When working with data type conversions in BigQuery, following best practices for using CAST and SAFE_CAST functions ensures efficient query execution and data integrity. These practices help avoid errors, enhance performance, and enable precise handling of complex data transformations in large datasets.
Ensuring Data Type Compatibility
Before performing any cast operation, verify that the data types you are converting between are compatible. Incompatible data types, such as trying to cast an invalid string to a DATE, can result in casting errors or unexpected outcomes.
Ensuring compatibility upfront helps prevent query failures, particularly in complex transformations involving multiple data types. Always review and validate your source data to minimize runtime errors.
Validating and Cleansing Data
Prior to casting, perform thorough data validation and cleansing to correct or remove any values that could cause the casting to fail. This includes checking for null values, invalid formats, and outliers that might disrupt the conversion process.
Ensuring clean, consistent data helps prevent query failures and guarantees smoother, more accurate conversions when using CAST or SAFE_CAST functions.
Handling Null Values Properly
Be mindful of null values when casting, as they can cause errors or lead to incorrect outcomes. To handle nulls effectively, use functions like IFNULL, COALESCE, or SAFE_CAST.
These functions allow you to manage null values gracefully, ensuring your queries run smoothly and produce accurate results, especially during calculations or aggregations on converted data.
Minimizing Unnecessary Casts
To optimize performance and reduce errors, avoid unnecessary casting. Selecting the correct data types during schema design helps minimize the need for conversions in your queries, which can also help manage BigQuery pricing.
Over-casting or frequent type conversions can slow down performance, especially with large datasets. Ensure that casting is only applied when absolutely necessary for data compatibility or specific transformations.
Extracting Specific Data Components
Use the EXTRACT function to pull specific components, such as the year, month, or day, from DATE or TIMESTAMP fields, and cast them only if necessary.
This approach helps maintain precision and clarity in your data while reducing unnecessary processing. By focusing on key data elements, you can simplify your queries and ensure efficient data handling.
Dive deeper with this read
ChatGPT for SQL: How to Generate Queries Faster with AI
Utilizing ARRAY and STRUCT Types
When working with complex data, leverage BigQuery’s ARRAY and STRUCT data types to manage multiple values or nested structures efficiently. These types help organize and simplify queries involving hierarchical data.
Cast individual elements as needed within these structures to maintain data integrity and ensure smooth data manipulation, allowing for more flexible and manageable data transformations.
Unleash the Power of BigQuery Functions
BigQuery offers a range of powerful functions to help users manage, analyze, and manipulate large datasets effectively.
- Conditional Expressions: Use functions like IF, CASE, and COALESCE to evaluate conditions and take actions based on specific criteria.
- String Functions: Modify text data with functions such as CONCAT, SUBSTR, and REPLACE to format or extract parts of strings.
- Navigation Functions: Analyze ordered data with LEAD, LAG, FIRST_VALUE, and LAST_VALUE to access previous or next rows.
- Statistical Aggregate Functions: Calculate statistics like averages, sums, counts, and maximums using AVG, SUM, COUNT, and MAX.
- Date Functions: Work with date-related operations using DATE_DIFF, EXTRACT, and FORMAT_DATE.
- Window Functions: Perform row-level calculations with ROW_NUMBER, RANK, and NTILE without collapsing results.
- DDL Functions: Manage database structures using CREATE, ALTER, and DROP commands.
By leveraging these BigQuery functions, you can better manage complex data and derive valuable insights for your business.
Generate SQL Queries 50х Faster with OWOX BI SQL Copilot for FREE
Simplify your SQL query creation process with OWOX BI SQL Copilot, a tool designed to help you generate SQL queries up to 50 times faster. Whether managing complex datasets, creating reports, or handling data transformations, this free tool simplifies the process with intelligent automation.
OWOX BI SQL Copilot enhances productivity by providing quick, accurate query generation, reducing the time spent on manual coding. Unlock the potential of your data with this powerful, easy-to-use tool.
Give Your Data the AI SuperPower
Get actionable SQL queries for great data analysis with OWOX BI SQL Copilot
FAQ
-
What is the difference between CAST and SAFE_CAST in BigQuery?
CAST converts data types but throws an error if the conversion fails. SAFE_CAST returns NULL on failure, allowing the query to continue without errors. SAFE_CAST is ideal for handling unpredictable data, ensuring query stability even with invalid inputs.
-
How do I convert a string to a date in BigQuery?
To convert a STRING to a DATE, use CAST or SAFE_CAST. The string must follow the 'YYYY-MM-DD' format; otherwise, CAST will fail. SAFE_CAST returns NULL for invalid formats. Use PARSE_DATE for non-standard date formats.
-
What are some common errors when using CAST in BigQuery?
Common errors include converting strings in invalid formats (e.g., invalid DATE or TIMESTAMP), casting incompatible data types (e.g., STRING to BOOL), or over-casting, which can degrade performance and cause query failures.
-
How can I avoid errors when converting data types in BigQuery?
Use SAFE_CAST to prevent query failures and return NULL on invalid conversions. Ensure data follows the correct format and validate it beforehand. Proper schema design and data cleansing help avoid errors.
-
What is the purpose of the SAFE_CAST function in BigQuery?
SAFE_CAST safely converts data types, returning NULL on failure instead of raising an error. It allows queries to continue uninterrupted, making it useful for handling inconsistent or unpredictable data.
-
When should I use the EXTRACT function instead of CAST in BigQuery?
Use EXTRACT to retrieve specific date or time components (e.g., year, month, day) instead of converting data types. EXTRACT is more efficient when you need only part of a DATE or TIMESTAMP.