Content
- Introduction to DDL Statements in BigQuery
- Exploring CREATE Statements in Data Definition Language (DDL)
- Mastering DROP Statements to Manage BigQuery Assets
- Advanced Strategies for Table Management in BigQuery
- Best Practices for Using DDL Statements in BigQuery
- Troubleshooting Common DDL Errors When Working with Tables
- Learn More About Data Definition Language
- Expand Your Knowledge with Additional BigQuery Functions
- Enhance Your Data Analysis with OWOX BI BigQuery Reports Extension
Creating and Deleting BigQuery Tables: Data Definition Language (DDL) Essentials
Anna Panchenko, Senior Digital Analyst @ OWOX
Denis Lisogorja, Analytical Consultants, Team Lead @ OWOX
Vadym Kramarenko, Senior Growth Marketing Manager @ OWOX
Data Definition Language (DDL) statements in BigQuery are essential for creating, deleting and modifying tables within a data warehouse environment. These statements allow users to efficiently manage database schema without affecting the data itself.
Using DDL, users can create new tables, alter existing table structures, and define table schemas, enabling precise control over how data is stored, accessed, and managed.
This article exclusively covers the CREATE and DROP DDL statements in BigQuery, detailing the processes involved in creating new tables and deleting existing ones.
If you're interested in learning how to modify existing table structures to adapt to evolving data needs, we invite you to explore our in-depth article that dives into modifying tables using ALTER statements.
Introduction to DDL Statements in BigQuery
Data Definition Language (DDL) statements help in defining and managing the structure of your database objects in BigQuery.
These statements allow you to create, modify, and delete schemas, tables, views, and more.
Using DDL statements efficiently can help you maintain organized, scalable, and high-performing databases. Whether you are a data engineer, SQL developer, or data analyst, mastering DDL statements is crucial for optimizing data workflows and ensuring data integrity.
Exploring CREATE Statements in Data Definition Language (DDL)
CREATE statements in BigQuery’s Data Definition Language (DDL) are pivotal for establishing various database objects that form the backbone of your data architecture.
These statements allow you to create schemas, tables, views, and other essential objects, each serving a unique purpose in organizing and managing your data.
CREATE SCHEMA Statement
The CREATE SCHEMA statement in BigQuery is used to create a new table schema within your dataset.
A schema organizes your tables and other database objects, making your data easier to manage and access. It helps group related data, improves organization and security by setting access controls at the schema level.
💡 IMPORTANT: In this SQL statement, the term "schema" is used to describe a logical grouping of tables, views, and other resources. Within BigQuery, the equivalent concept is known as a dataset. Here, "schema" does not refer to BigQuery's table schemas.
Syntax of CREATE SCHEMA Statement
CREATE SCHEMA [ IF NOT EXISTS ]
[project_name].dataset_name
[DEFAULT COLLATE collate_specification]
[OPTIONS(schema_option_list)]
- CREATE SCHEMA: The command to create a new schema.
- IF NOT EXISTS: Optional clause to avoid errors if the schema already exists.
- project_name.dataset_name: Specifies the project and dataset name for the schema.
- DEFAULT COLLATE: Optional clause to set the default collation for the schema.
- OPTIONS: Optional clause to specify schema options like descriptions, labels, and other settings.
- schema_option_list: A list of options to customize the schema, such as location and default table expiration.
Example of CREATE SCHEMA Statement
This example showcases the creation of a new schema named mydataset in BigQuery, configured with specific geographical location, default table expiration settings, and descriptive labels to enhance data management and accessibility within the organization.
CREATE SCHEMA mydataset
OPTIONS(
location="us",
default_table_expiration_days=10,
labels=[("label1","value1"),("label2","value2")]
);
Here:
- CREATE SCHEMA mydataset: Creates a schema named mydataset.
- OPTIONS: Specifies additional settings for the schema.
- location="us": Sets the geographic location of the schema to the US.
- default_table_expiration_days=10: Sets the default table expiration to days.
- labels=[("label1","value1"),("label2","value2")]: Adds labels to the schema for better identification and management.
CREATE TABLE Statement
The CREATE TABLE statement in BigQuery is used to create a new table within a dataset. This statement allows you to define the table's structure by specifying column names, data types, and other properties. Creating tables is fundamental for organizing and storing your data in a structured format. Tables can be used to store raw data, processed data, and any other type of structured information.
Syntax of CREATE TABLE Statement
CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ]
table_name
[(
column | constraint_definition[, ...]
)]
[DEFAULT COLLATE collate_specification]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(table_option_list)]
[AS query_statement]
column:=
column_definition
constraint_definition:=
[primary_key]
| [[CONSTRAINT constraint_name] foreign_key, ...]
primary_key :=
PRIMARY KEY (column_name[, ...]) NOT ENFORCED
foreign_key :=
FOREIGN KEY (column_name[, ...]) foreign_reference
foreign_reference :=
REFERENCES primary_key_table(column_name[, ...]) NOT ENFORCED
- CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE: Command to create a new table, optionally replacing an existing one or creating a temporary table.
- IF NOT EXISTS: Optional clause to avoid errors if the table already exists.
- table_name: The name of the new table.
- column | constraint_definition: Defines the columns and any constraints for the table.
- DEFAULT COLLATE: Optional clause to set the default collation for the table.
- PARTITION BY: Optional clause to partition the table by a specific expression.
- CLUSTER BY: Optional clause to cluster the table by specified columns.
- OPTIONS: Optional clause to specify table options like descriptions, labels, and expiration times.
- AS query_statement: Optional clause to create a table based on the result of a query.
Example of CREATE TABLE Statement
This example illustrates the creation of a partitioned table in a BigQuery environment, where detailed inventory data including product IDs and categories are stored with structured and descriptive metadata, employing partitioning and expiration settings to optimize data management and query performance.
CREATE TABLE myDataset.product_categories
(
product_id INT64 OPTIONS(
description="An optional INTEGER field"),
details STRUCT<
categories ARRAY OPTIONS(
description="A repeated STRING field"),
in_stock BOOL
>
)
PARTITION BY _PARTITIONDATE
OPTIONS(
expiration_timestamp=
TIMESTAMP "2025-12-31 00:00:00 UTC",
partition_expiration_days=1,
description="A table that expires at the end of 2025, with each partition living for 24 hours",
labels=[("product", "category_management")]
)
Here:
CREATE TABLE myDataset.product_categories: Creates a table named product_categories in the specified dataset.
product_id INT64 OPTIONS(description="An optional INTEGER field"): Defines a product_id column with INT64 data type and a description.
details STRUCT<categories ARRAY<STRING> OPTIONS(description="A repeated STRING field"), in_stock BOOL>: Defines a details column as a STRUCT with nested fields, including an array of strings (categories) and a boolean (in_stock).
expiration_timestamp=TIMESTAMP "2025-12-31 00:00:00 UTC": Sets the table to expire at the end of 2025.
partition_expiration_days=1: Sets each partition to expire after 24 hours.
labels=[("product", "category_management")]: Adds labels to the table for better identification and management.
💡 Delve into the intricacies of Timestamp Functions in our comprehensive guide! This resource unpacks how to master timestamp data types for exact time calculations and formatting in BigQuery. Discover the functionalities of key functions, complete with practical examples and customizable templates to boost your command over time-related data analysis.
Dive deeper with this read
Navigating BigQuery Timestamp Functions: An Insightful Guide
CREATE TABLE LIKE Statement
The CREATE TABLE LIKE statement in BigQuery is used to create a new table that has the same schema as an existing table. This statement is useful when you want to replicate the structure of a table without copying its data. It simplifies the process of creating tables with identical schemas, saving time and ensuring consistency.
Syntax of CREATE TABLE LIKE Statement
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ]
table_name
LIKE [[project_name.]dataset_name.]source_table_name
...
[OPTIONS(table_option_list)]
- CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ]: Command to create a new table, optionally avoiding errors if the table already exists.
- table_name: The name of the new table to be created.
- LIKE [[project_name.]dataset_name.]source_table_name: Specifies that the new table should have the same schema as the existing table.
- OPTIONS: Optional clause to specify additional table options like descriptions and labels.
- table_option_list: A list of options to customize the new table.
Example of CREATE TABLE LIKE Statement
The following example creates a new table named sales_report in business_data with the same schema as sales_data and the data from the SELECT statement:
CREATE TABLE myDataset.sales_report
LIKE myDataset.sales_data
AS SELECT * FROM `owox-analytics.myDataset.sales_data
Here:
- CREATE TABLE myDataset.sales_report: Creates a new table named sales_report in the myDataset dataset.
- LIKE myDataset.sales_data: Indicates that the new table should have the same schema as sales_data.
- AS SELECT * FROM owox-analytics.myDataset.sales_data: Populates the new table with data from the sales_data table.
CREATE TABLE COPY Statement
The CREATE TABLE COPY statement in BigQuery is used to create a new table by copying the schema and data from an existing table. This is useful when you need to duplicate data for backup, testing, or analysis purposes. By copying the table, you can ensure that the new table has the same structure and data as the original, making it a straightforward and efficient way to manage and replicate data.
Syntax of CREATE TABLE COPY Statement
CREATE [ OR REPLACE ]
TABLE [ IF NOT EXISTS ] table_name
COPY source_table_name
...
[OPTIONS(table_option_list)]
- CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ]: Command to create a new table, optionally replacing an existing one or avoiding errors if the table already exists.
- table_name: The name of the new table to be created.
- COPY source_table_name: Specifies that the new table should be a copy of the existing table.
- OPTIONS: Optional clause to specify additional table options like descriptions and labels.
- table_option_list: A list of options to customize the new table.
Example of CREATE TABLE COPY Statement
This example demonstrates how to efficiently create a new table in SQL by copying the schema from an existing table, populating it with data from another table, and setting additional metadata options like expiration date, description, and labels for enhanced table management and identification.
CREATE TABLE myDataset.employee_contacts_copy
COPY `owox-analytics.myDataset.employee_contacts`
Here:
- CREATE TABLE myDataset.employee_contacts_copy
- : Creates a new table named employee_contacts_copy in the myDatasetdataset.
- COPY owox-analytics.myDataset.employee_contacts: Indicates that the new table should be a copy of the employee_contacts table.
CREATE EXTERNAL TABLE Statement
The CREATE EXTERNAL TABLE statement in BigQuery is used to create a table that references data stored outside of BigQuery, such as in Google Cloud Storage or another external data source. This is useful when you need to query data without importing it into BigQuery, saving on storage costs and allowing for more flexible data management.
Syntax of CREATE EXTERNAL TABLE Statement
CREATE [ OR REPLACE ]
EXTERNAL TABLE [ IF NOT EXISTS ] table_name
[(
column_name column_schema,
...
)]
[WITH CONNECTION connection_name]
[WITH PARTITION COLUMNS
[(
partition_column_name partition_column_type,
...
)]
]
OPTIONS (
external_table_option_list,
...
);
- CREATE [ OR REPLACE ] EXTERNAL TABLE [ IF NOT EXISTS ]: Command to create a new external table, optionally replacing an existing one or avoiding errors if the table already exists.
- column_name column_schema: Defines the columns and their schemas for the table.
- WITH CONNECTION connection_name: Specifies the connection to the external data source.
- WITH PARTITION COLUMNS: (Optional) Defines partition columns and their types.
- OPTIONS: Specifies the options for the external table, such as format, staleness, and cache settings.
- external_table_option_list: A list of options to customize the external table.
Example of CREATE EXTERNAL TABLE Statement
This example illustrates how to create or replace an external table named product_inventory within the myDataset dataset in BigQuery, connecting to an external data source and configuring it with specific options like data format, field delimiter and sets the maximum number of bad records allowed.
CREATE EXTERNAL TABLE
myDataset.product_inventory (
product_id INT64,
product_name STRING,
in_stock BOOL
)
OPTIONS(
format = "CSV",
uris = ['gs://analytics/test'],
field_delimiter = '|',
max_bad_records = 5);
Here:
- CREATE OR REPLACE EXTERNAL TABLE myDataset.product_inventory: Creates or replaces an external table named product_inventory in the myDataset dataset.
- product_id INT64, product_name STRING, in_stock BOOL: Defines the columns for the table with their respective data types.
- OPTIONS: Specifies additional settings for the external table.
- format = "CSV": Indicates that the data format is CSV.
- uris = ['gs://analytics/test']: URIs for the external data locations in Cloud Storage.
- field_delimiter = '|' : The separator '|' for fields in the CSV file.
- max_bad_records = 5: The 5 bad records to ignore when reading the data.
CREATE SNAPSHOT TABLE Statement
The CREATE SNAPSHOT TABLE statement in BigQuery allows you to create a new table that captures the state of another table at a specific point in time. This is useful for preserving historical data, creating backups, or auditing changes. Snapshot tables provide a way to analyze data as it existed at a particular moment, without affecting the original table.
Syntax of CREATE SNAPSHOT TABLE Statement
CREATE SNAPSHOT TABLE [ IF NOT EXISTS ]
table_snapshot_name
CLONE source_table_name
[FOR SYSTEM TIME AS OF time_expression]
[OPTIONS(snapshot_option_list)]
- CREATE SNAPSHOT TABLE [ IF NOT EXISTS ]: Command to create a new snapshot table, optionally avoiding errors if the table already exists.
- table_snapshot_name: The name of the new snapshot table.
- CLONE source_table_name: Specifies the source table to clone.
- FOR SYSTEM TIME AS OF time_expression: Optional clause to specify the exact point in time for the snapshot.
- OPTIONS: Optional clause to specify additional options for the snapshot table.
- snapshot_option_list: A list of options to customize the snapshot table.
Examples of CREATE SNAPSHOT TABLE Statement
There are 2 ways to use the CREATE SNAPSHOT TABLE statement, following are the two examples demonstrating them.
Example 1:
The following example creates a table snapshot of the table owox-analytics.myDataset.transactions. The table snapshot is created in the dataset myDataset and is named transactions_snapshot:
CREATE SNAPSHOT TABLE `owox-analytics.myDataset.transactions_snapshot`
CLONE `owox-analytics.myDataset.transactions`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 72 HOUR),
friendly_name="transactions_snapshot",
description="A table snapshot that expires in 3 days",
labels=[("department", "finance"), ("purpose", "archive")]
);
Here:
- CREATE SNAPSHOT TABLE owox-analytics.myDataset.transactions_snapshot: Creates a new snapshot table named transactions_snapshot in the archive dataset.
- CLONE owox-analytics.myDataset.transactions: Indicates that the new snapshot table should clone the transactions table.
- OPTIONS: Specifies additional settings for the new table.
- expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 72 HOUR): Sets the snapshot table to expire in 3 days.
- friendly_name="transactions_snapshot": Provides a user-friendly name for the snapshot table.
- description="A table snapshot that expires in 3 days": Provides a description of the snapshot table.
- labels=[("department", "finance"), ("purpose", "archive")]: Adds labels to the table for better identification and management.
Example 2 :
The following example creates a table snapshot of the table owox-analytics.myDataset.sales_data. The table snapshot is created in the dataset archive and is named sales_data_snapshot:
CREATE SNAPSHOT TABLE IF NOT EXISTS `owox-analytics.myDataset.sales_data_snapshot`
CLONE `owox-analytics.myDataset.sales_data`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 72 HOUR),
friendly_name="transactions_snapshot",
description="A table snapshot that expires in 3 days",
labels=[("department", "finance"), ("purpose", "archive")]
);
Here:
- CREATE SNAPSHOT TABLE IF NOT EXISTS owox-analytics.myDataset.sales_data_snapshot: Creates a new snapshot table named transactions_snapshot in the archive dataset if it does not already exist.
- CLONE owox-analytics.myDataset.sales_data: Indicates that the new snapshot table should clone the transactions table.
- OPTIONS: Specifies additional settings for the new table.
- expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 72 HOUR): Sets the snapshot table to expire in 3 days.
- friendly_name="transactions_snapshot": Provides a user-friendly name for the snapshot table.
- description="A table snapshot that expires in 3 days": Provides a description of the snapshot table.
- labels=[("department", "finance"), ("purpose", "archive")]: Adds labels to the table for better identification and management.
CREATE TABLE CLONE Statement
The CREATE TABLE CLONE statement in BigQuery allows you to create a new table that is a clone of an existing table, meaning the new table has the same schema and data as the original table at the time of the clone. This is useful for scenarios such as testing, backup, or creating a new environment based on existing data without duplicating the storage costs.
Syntax of CREATE TABLE CLONE Statement
CREATE TABLE [ IF NOT EXISTS ]
destination_table_name
CLONE source_table_name [
FOR SYSTEM TIME AS OF time_expression]
...
[OPTIONS(table_option_list)]
- CREATE TABLE [ IF NOT EXISTS ]: Command to create a new table, optionally avoiding errors if the table already exists.
- destination_table_name: The name of the new table to be created.
- CLONE source_table_name: Specifies the source table to clone.
- FOR SYSTEM TIME AS OF time_expression: Optional clause to specify the exact point in time for the snapshot.
- OPTIONS: Optional clause to specify additional options for the cloned table.
- table_option_list: A list of options to customize the cloned table.
Examples of CREATE TABLE CLONE Statement
Following are the examples that demonstrate how the function can be used in 2 ways.
Example 1:
The following example creates the table myDataset.sales_data_clone from the table snapshot myDataset.sales_data_snapshot.
CREATE TABLE
`owox-analytics.myDataset.sales_data_clone`
CLONE
`owox-analytics.myDataset.sales_data_snapshot`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
friendly_name="budget_table",
description="A table that expires in 1 year",
labels=[("department", "finance"), ("purpose", "backup")]
);
Here:
- CREATE TABLE owox-analytics.myDataset.sales_data_clone: Creates a new table named sales_data_clone in the myDataset dataset.
- CLONE owox-analytics.myDataset.sales_data: Indicates that the new table should clone the sale table.
- OPTIONS: Specifies additional settings for the new table.
- expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY): Sets the cloned table to expire in 1 year.
- friendly_name="budget_table": Provides a user-friendly name for the cloned table.
- description="A table that expires in 1 year": Provides a description of the cloned table.
- labels=[("department", "finance"), ("purpose", "backup")]: Adds labels to the table for better identification and management.
Example 2:
The following example creates the table clone employee_clone based on the table employee_records.
CREATE TABLE IF NOT EXISTS `owox-analytics.myDataset.employee_clone`
CLONE
`owox-analytics.myDataset.employee_records`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
friendly_name="employee_clone_table",
description="A table that expires in 1 year",
labels=[("department", "HR"), ("purpose", "development")]
);
Here:
- CREATE TABLE IF NOT EXISTS owox-analytics.myDataset.employee_clone: Creates a new table named employee_clone in the hr_data dataset if it does not already exist.
- CLONE owox-analytics.myDataset.employee_records: Indicates that the new table should clone the employee table.
- OPTIONS: Specifies additional settings for the new table.
- expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY): Sets the cloned table to expire in 1 year.
- friendly_name="employee_clone_table": Provides a user-friendly name for the cloned table.
- description="A table that expires in 1 year": Provides a description of the cloned table.
- labels=[("department", "HR"), ("purpose", "development")]: Adds labels to the table for better identification and management.
CREATE VIEW Statement
The CREATE VIEW statement in BigQuery allows you to create a virtual table based on a SQL query. Views simplify complex queries by encapsulating them into a single object, making data easier to work with. They provide a way to abstract and reuse queries, improving query readability and maintainability. Views also enhance security by restricting access to specific columns or rows, thus protecting sensitive data while allowing users to perform necessary analyses.
Syntax of CREATE VIEW Statement
CREATE [ OR REPLACE ]
VIEW [ IF NOT EXISTS ] view_name
[(view_column_name_list)]
[OPTIONS(view_option_list)]
AS query_expression
view_column_name_list :=
view_column[, ...]
view_column :=
column_name [OPTIONS(view_column_option_list)]
- CREATE [ OR REPLACE ] VIEW: The command to create a new view or replace an existing view.
- IF NOT EXISTS: Optional clause to avoid errors if the view already exists.
- view_name: The name of the new view.
- view_column_name_list: Optional list of column names for the view.
- OPTIONS: Optional clause to specify view options like descriptions, labels, and expiration times.
- view_option_list: A list of options to customize the view.
- AS query_expression: The SQL query that defines the view's content.
Example of CREATE VIEW Statement
This example demonstrates how to create a temporary view in BigQuery, named sales_summary, which aggregates sales data from the orders table, equipped with an expiration setting, descriptive metadata, and labels for effective data management and organizational accessibility.
CREATE VIEW
`owox-analytics.myDataset.sales_summary`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="sales_summary",
description="A view that summarizes sales and expires in 2 days",
labels=[("department", "sales")]
)
AS SELECT transaction_id, transactionDate, saleAmount
FROM `owox-analytics.myDataset.orders`
Here:
- CREATE VIEW owox-analytics.myDataset.sales_summary: Creates a view named sales_summary in the specified project and dataset.
- OPTIONS: Specifies additional settings for the view.
- expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR): Sets the view to expire 48 hours after its creation.
- friendly_name="sales_summary": Provides a user-friendly name for the view.
- labels=[("department", "sales")]: Adds a label to the view for better identification and management.
- AS SELECT transaction_id, transactionDate, saleAmount
- FROM owox-analytics.myDataset.orders: Defines the SQL query that the view is based on, which selects transaction IDs, transaction dates, and sale amounts from the orders table.
CREATE MATERIALIZED VIEW Statement
The CREATE MATERIALIZED VIEW statement in BigQuery allows you to create a materialized view, which is a precomputed result set stored for later use. Materialized views can significantly improve query performance by storing the results of a query so they can be quickly accessed without having to re-execute the original query.
Syntax of CREATE MATERIALIZED VIEW Statement
CREATE [ OR REPLACE ]
MATERIALIZED VIEW [ IF NOT EXISTS ]
materialized_view_name
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(materialized_view_option_list)]
AS query_expression
Here:
- CREATE [ OR REPLACE ] MATERIALIZED VIEW [ IF NOT EXISTS ]: Command to create a new materialized view, optionally replacing an existing one or avoiding errors if the view already exists.
- materialized_view_name: The name of the new materialized view.
- PARTITION BY partition_expression: Optional clause to specify the partitioning expression.
- CLUSTER BY clustering_column_list: Optional clause to specify the clustering columns.
- OPTIONS: Optional clause to specify additional options for the materialized view.
- materialized_view_option_list: A list of options to customize the materialized view.
- AS query_expression: The query that defines the materialized view.
Examples of CREATE MATERIALIZED VIEW Statement
Following are examples that demonstrate the usage of CREATE MATERIALIZED VIEW Statement in multiple ways.
Example 1:
The following example creates a new materialized view named sales_data_mv in myDataset:
CREATE MATERIALIZED VIEW `owox-analytics.myDataset.sales_data_mv`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="sales_mv",
description="A materialized view that expires in 2 days",
labels=[("department", "sales")],
enable_refresh=true,
refresh_interval_minutes=20
)
AS SELECT product_id, SUM(sales_amount) AS total_sales
FROM `owox-analytics.myDataset.sales_data`
GROUP BY product_id
Here:
- CREATE MATERIALIZED VIEW owox-analytics.myDataset.sales_data_mv: Creates a new materialized view named sales_data_mv in the myDataset dataset.
- OPTIONS: Specifies additional settings for the materialized view.
- expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR): Sets the materialized view to expire in 2 days.
- friendly_name="sales_mv": Provides a user-friendly name for the materialized view.
- description="A materialized view that expires in 2 days": Provides a description of the materialized view.
- labels=[("department", "sales")]: Adds labels to the materialized view for better identification and management.
- enable_refresh=true: Enables automatic refresh for the materialized view.
- refresh_interval_minutes=20: Sets the refresh interval to 20 minutes.
- AS SELECT product_id, SUM(sales_amount) AS total_sales FROM owox-analytics.myDataset.sales_data GROUP BY product_id: Defines the query that the materialized view will store, which summarizes sales data by product.
Example 2:
The following example creates a materialized view named inventory_mv in myDataset only if it does not already exist.
CREATE MATERIALIZED VIEW IF NOT EXISTS `owox-analytics.myDataset.inventory_mv`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="inventory_mv",
description="A view that expires in 2 days",
labels=[("department", "inventory")],
enable_refresh=false
)
AS SELECT product_id, is_in_stock FROM `owox-analytics.myDataset.inventory`
Here:
- CREATE MATERIALIZED VIEW IF NOT EXISTS owox-analytics.myDataset.inventory_mv: Creates a new materialized view named inventory_mv in the myDataset dataset if it does not already exist.
- OPTIONS: Specifies additional settings for the materialized view.
- expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR): Sets the materialized view to expire in 2 days.
- friendly_name="inventory_mv": Provides a user-friendly name for the materialized view.
- description="A view that expires in 2 days": Provides a description of the materialized view.
- labels=[("department", "inventory")]: Adds labels to the materialized view for better identification and management.
- enable_refresh=false: Disables automatic refresh for the materialized view.
- AS SELECT product_id, is_in_stock FROM owox-analytics.myDataset.inventory: Defines the query that the materialized view will store, which selects item information from the inventory table.
Example 3:
The following example creates a materialized view named orders_mv in order_data, partitioned by the order_date column and clustered by the customer_id column.
CREATE MATERIALIZED VIEW
`owox-analytics.myDataset.orders_mv`
PARTITION BY DATE(order_date)
CLUSTER BY customer_id
AS SELECT customer_id, order_date,
COUNT(1) as order_count
FROM
`owox-analytics.myDataset.orders_partitioned_table`
GROUP BY customer_id, order_date
Here:
- CREATE MATERIALIZED VIEW owox-analytics.myDataset.orders_mv: Creates a new materialized view named orders_mv in the myDataset dataset.
- PARTITION BY DATE(order_date): Partitions the materialized view by the order_date column.
- CLUSTER BY customer_id: Clusters the materialized view by the customer_id column.
- AS SELECT customer_id, order_date, COUNT(1) as order_count FROM owox-analytics.myDataset.orders_partitioned_table GROUP BY customer_id, order_date: Defines the query that the materialized view will store, which summarizes order data by customer and order date.
CREATE MATERIALIZED VIEW AS REPLICA OF Statement
The CREATE MATERIALIZED VIEW AS REPLICA OF statement in BigQuery allows you to create a materialized view that replicates another materialized view. This is useful for scenarios where you need to create a read-only copy of a materialized view in a different location, or for disaster recovery purposes. By replicating a materialized view, you ensure consistency, providing a reliable way to access the precomputed results without directly querying the original view.
Syntax of CREATE MATERIALIZED VIEW AS REPLICA OF Statement
CREATE MATERIALIZED VIEW replica_name
[OPTIONS(materialized_view_replica_option_list)]
AS REPLICA OF source_materialized_view_name
- CREATE MATERIALIZED VIEW replica_name: Command to create a new materialized view replica.
- OPTIONS: Optional clause to specify additional options for the materialized view replica.
- materialized_view_replica_option_list: A list of options to customize the materialized view replica.
- AS REPLICA OF source_materialized_view_name: Specifies the source materialized view to replicate.
Example of CREATE MATERIALIZED VIEW AS REPLICA OF Statement
The following example creates a materialized view replica named orders_mv_replica in myDataset.
CREATE MATERIALIZED VIEW `owox-analytics.myDataset.orders_mv_replica`
OPTIONS(
replication_interval_seconds=1200
)
AS REPLICA OF
`owox-analytics.myDataset.orders_mv`
Here:
- CREATE MATERIALIZED VIEW owox-analytics.myDataset.orders_mv_replica: Creates a new materialized view named orders_mv_replica in the myDataset.
- OPTIONS: Specifies additional settings for the new materialized view.
- replication_interval_seconds=1200: Sets the replication interval to 1200 seconds (20 minutes).
- AS REPLICA OF owox-analytics.myDataset.orders_mv: Specifies that the new materialized view should replicate the orders_mv view from the myDataset.
Get BigQuery Reports in Seconds
Seamlessly generate and update reports in Google Sheets—no complex setup needed
Mastering DROP Statements to Manage BigQuery Assets
DROP statements in BigQuery provide a powerful way to remove database objects that are no longer needed, freeing up storage and simplifying your data environment. These statements can delete tables, views, functions, and other objects, ensuring your database remains clean and efficient.
DROP SNAPSHOT TABLE Statement
The DROP SNAPSHOT TABLE statement deletes a snapshot table, a static, read-only table capturing the data at a specific time. This action is irreversible, and all data within the snapshot table will be lost. It is typically used in database maintenance and management tasks to free up space or remove outdated snapshots.
Syntax of DROP SNAPSHOT TABLE Statement
DROP SNAPSHOT TABLE [IF EXISTS] [project_name.]dataset_name.]table_snapshot_name
- DROP SNAPSHOT TABLE: Command to delete a snapshot table.
- [IF EXISTS]: Optional clause to avoid errors if the table doesn't exist.
- [project_name.]dataset_name.]table_snapshot_name: The name of the snapshot table to be deleted.
Example of DROP SNAPSHOT TABLE Statement
Here's an example of using the DROP SNAPSHOT TABLE statement to delete a snapshot table named transactions_snapshot if it exists.
DROP SNAPSHOT TABLE IF EXISTS `owox-analytics.myDataset.transactions_snapshot`
Here:
- DROP SNAPSHOT TABLE: Command to delete a snapshot table.
- IF EXISTS: Ensures no error if monthly_sales_snapshot doesn't exist.
- owox-analytics.myDataset.transactions_snapshot: Name of the snapshot table to be deleted.
DROP TABLE Statement
The DROP TABLE statement deletes an existing table from the database, freeing up storage space. This action is irreversible, and all the data within the table will be lost. It is typically used in database maintenance and management tasks to remove outdated or unnecessary tables.
Syntax of DROP TABLE Statement
DROP TABLE [IF EXISTS] table_name
- DROP TABLE: Command to delete a table.
- [IF EXISTS]: Optional clause to avoid errors if the table doesn't exist.
- table_name: The name of the table to be deleted.
Example of DROP TABLE Statement
Here's an example of how to use the DROP TABLE statement to delete a table named employee_contacts_copy if it exists.
DROP TABLE IF EXISTS `owox-analytics.myDataset.employee_contacts_copy`
Here:
- DROP TABLE: Command to delete a table.
- IF EXISTS: Ensures no error if employee_contacts_copy doesn't exist.
- owox-analytics.myDataset.employee_contacts_copy: Name of the table to be deleted.
Advanced Strategies for Table Management in BigQuery
Advanced table management strategies in BigQuery involve combining DDL statements to optimize data workflows and ensure efficient schema evolution. Let's explore these techniques.
CREATE TABLE Statement of an Existing Table with INFORMATION SCHEMA
Using INFORMATION SCHEMA to generate CREATE TABLE statements for existing tables simplifies replication and helps in maintaining data integrity. This approach allows database administrators and developers to easily capture the exact structure of existing tables, including column definitions, data types, and other metadata.
Example: Creating CREATE TABLE Statement
A data engineer needs to document the schema of the employees table from the myDataset dataset in BigQuery and prepare to recreate this table in another environment or for documentation purposes.
Step 1: Extract Schema Metadata
Use the INFORMATION_SCHEMA.COLUMNS view in BigQuery to fetch metadata about each column in the product_categories table.
SELECT
table_name,
column_name,
data_type,
is_nullable,
column_default,
ordinal_position
FROM
`company_db.INFORMATION_SCHEMA.COLUMNS`
WHERE
table_name = product_categories
ORDER BY
ordinal_position;
Step 2: Manual Construction of the CREATE TABLE Statement
Based on the output from the above query, you would manually construct the CREATE TABLE statement. Suppose the query results were as follows (simplified for example purposes):
- product_id: INT64, not nullable
- product_name: STRING, nullable
- product_category_id: INT64, nullable
- Product_category_name: STRING, nullable
- department: STRING, nullable, default 'Sales'
- date: DATE, nullable
You could then manually write a CREATE TABLE statement like this:
CREATE TABLE test.product_categories (
product_id INT64 NOT NULL,
product_name STRING,
product_category_id INT64,
product_category_name STRING,
department STRING DEFAULT 'Sales',
date DATE
);
Here:
- product_id INT64 NOT NULL: An integer column that cannot be null.
- product_name STRING: A string column that can be null.
- product_category_id INT64: An integer column that can be null.
- product_category_name STRING: A string column that can be null.
- department STRING DEFAULT 'Sales': A string column that defaults to 'Sales' if no value is provided, can be null.
- date DATE: A date column that can be null.
Employing Temporary Tables for Schema Evolution
BigQuery Temporary Tables provide a lot of functionality and can be created using the TEMP or TEMPORARY parameters. These tables are useful for session-based data manipulation and analytics.
Method 1: Example of Using the TEMP Parameter
The example below demonstrates how to create a temporary table to find employees from Sales department who have sales.
CREATE TEMP TABLE sales (employee_id STRING)
AS
SELECT employee_id
FROM `owox-analytics.myDataset.employee_data`
WHERE department= 'Sales';
SELECT
employee_id
FROM sales
WHERE employee_id IN (
SELECT salesperson_id
FROM
`owox-analytics.myDataset.salesperson_data`
);
Here:
- CREATE TEMP TABLE sales (employee_id STRING): Creates a temporary table named sales with a column employee_id of type STRING.
- AS SELECT employee_id FROM owox-analytics.myDataset.employee_data` WHERE department='Sales': The employee table from Sales department.
- SELECT employee_id FROM sales WHERE employee_id IN ( SELECT salesperson_id FROM owox-analytics.myDataset.salesperson_data): Queries the sales table to find employees who have sales.
Method 2: Example of Using the TEMPORARY Parameter
The example below demonstrates how to create a global temporary table to find the top 10 expensive products, and then use this table to find products that were bought.
CREATE TEMPORARY TABLE top_products (product_id STRING)
AS
SELECT product_id
FROM `owox-analytics.myDataset.product_catalog`
ORDER BY price DESC
LIMIT 10;
SELECT
product_id
FROM top_products
WHERE product_id IN (
SELECT product_id
FROM `owox-analytics.myDataset.sales_data`
);
Here:
- CREATE TEMP TABLE top_products(product_id STRING): Creates a temporary table named top_products with a column product_id of type STRING.
- AS SELECT product_id FROM owox-analytics.myDataset.product_catalog` ORDER BY price DESC LIMIT 10: The most expensive products table with the top 10 products.
- SELECT product_id FROM top_products WHERE product_id IN ( SELECT product_id FROM owox-analytics.myDataset.sales_data): Queries the top_products table to find products that were bought.
Replacing Tables with CREATE OR REPLACE TABLE
The CREATE OR REPLACE TABLE statement in Google BigQuery allows you to update table structures seamlessly. This command is particularly useful when you need to modify a table’s schema or redefine its data without manually dropping and recreating the table.
Example of Using CREATE OR REPLACE TABLE
The following example demonstrates how to use the CREATE OR REPLACE TABLE statement to update the structure of a table named orders_history in the myDataset dataset. This example includes adding a new computed column for total sales amount.
CREATE OR REPLACE TABLE myDataset.orders_history AS
SELECT
order_id,
product_id,
quantity,
price_per_unit,
quantity * price_per_unit AS total_sales
FROM
myDataset.orders_history;
Here:
- CREATE OR REPLACE TABLE myDataset.orders_history AS: This command replaces the existing sales table in the myDataset dataset with a new definition.
- SELECT order_id, product_id, quantity, price_per_unit: Selects the order_id, product_id, quantity, and price_per_unit columns from the existing table.
- quantity * price_per_unit AS total_sales: Adds a new computed column total_sales that multiplies quantity by price_per_unit.
- FROM myDataset.orders_history: Specifies the source table from which the data is being selected and restructured.
Get BigQuery Reports in Seconds
Seamlessly generate and update reports in Google Sheets—no complex setup needed
Best Practices for Using DDL Statements in BigQuery
Data Definition Language (DDL) statements are essential for managing and defining the schema of your database tables in BigQuery. Following best practices when using DDL statements ensures that your database operations are smooth and your schema changes do not disrupt ongoing processes.
Optimize Cost with Partitioning and Clustering
Use DDL statements to implement partitioning and clustering in your tables to reduce query costs and improve performance significantly. Partitioning divides your data into smaller, more manageable pieces based on date or other logical divisions, while clustering sorts data within each partition. This optimization ensures that queries only scan relevant data, reducing the amount of processed data and associated costs.
Regularly Clean Up Unused Tables
Use DDL statements to drop tables that are no longer needed. Regularly cleaning up your data environment helps manage storage costs, reduce clutter, and maintain organizational efficiency. This practice ensures your BigQuery environment remains organized and operates optimally by preventing outdated or irrelevant data from impacting performance.
Uncover in-depth insights
Modern Data Management Guide
Download nowBonus for readers
Troubleshooting Common DDL Errors When Working with Tables
Working with DDL statements in BigQuery can sometimes result in errors that disrupt your workflow. Understanding common DDL errors and their solutions can help maintain smooth operations and prevent data management issues.
Addressing "Already Exists: project_id .table" Error
This error occurs when you attempt to create a table that already exists within the specified dataset.
⚠️ Error: "Already Exists: project_id.table"
The error message "Already Exists: project_id.table" indicates a conflict with the existing table's name, which prevents the creation of a new table with the same name.
✅ Solution:
Before executing a CREATE TABLE statement, ensure the table does not already exist or use the CREATE TABLE IF NOT EXISTS statement to avoid the error. Alternatively, if intending to replace an existing table, use the CREATE OR REPLACE TABLE statement. This approach helps manage potential naming conflicts without manual checks.
CREATE TABLE IF NOT EXISTS
mydataset.mytable (
id INT64,
name STRING
);
Handling "NOT NULL" Constraints in BigQuery Table Creation
⚠️ Error: "NULL value in column violates NOT NULL constraint"
This error can occur when attempting to create a table with columns defined as NOT NULL without proper handling of existing data constraints. The NOT NULL modifier specifies that a column must always have a value thus it cannot contain nulls.
✅ Solution:
When defining a table schema with NOT NULL constraints in BigQuery, ensure all incoming data adheres to these constraints to avoid insertion errors. Use the CREATE TABLE statement to define NOT NULL fields properly and check data consistency before loading data into the table to ensure compliance with these constraints.
If you are transforming or migrating data, include validation steps to fill or correct any null values that would violate the NOT NULL requirement.
CREATE TABLE mydataset.mytable (
id INT64 NOT NULL,
name STRING NOT NULL
);
INSERT INTO mydataset.mytable (id, name)
VALUES (1, 'Alice'), (2, 'Bob');
Addressing "Already Exists" Error When Creating Table Snapshots in BigQuery
⚠️ Error: "Already Exists: project_id.dataset_name.snapshot_name"
This error occurs when attempting to create a table snapshot with a name that already exists within the dataset. It indicates a naming conflict that prevents the new snapshot from being created because a snapshot or table with the same name is already present in the dataset.
✅ Solution:
To prevent this error, verify that the intended name for the table snapshot is not already in use within the dataset before executing the CREATE SNAPSHOT TABLE command. You can do this by checking the dataset manually in the BigQuery UI or by querying the INFORMATION_SCHEMA.TABLES view to list all tables and snapshots in the dataset.
SELECT table_name
FROM
`mydataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = 'snapshot_table';
Error Due to Missing Partition Filter in Queries on Partitioned Tables
⚠️Error: "Partition filter required but not provided"
Setting timePartitioning.requirePartitionFilter to true on a partitioned table requires a partition filter in queries. If omitted, BigQuery will reject the query, preventing unintended full table scans and excess costs.
✅ Solution:
To prevent errors when querying partitioned tables with timePartitioning.requirePartitionFilter set to true, always use partition filters. This involves including the partition column in your query’s WHERE clause to ensure the query is compliant. Educate users about the necessity of partition filters to prevent unintended full table scans and excessive costs.
SELECT * FROM
mydataset.my_partitioned_table
WHERE _PARTITIONDATE = '2023-05-01';
Learn More About Data Definition Language
In this article, we have established a foundation with our detailed overview of the CREATE and DROP statements. To further explore the modification of existing database structures within BigQuery, we invite you to dive deeper into the practical applications and nuanced capabilities with our article on ALTER commands, ensuring you can understand the vital role of refining and optimizing database schemas.
Expand Your Knowledge with Additional BigQuery Functions
Explore other advanced BigQuery functions to enhance your data analysis and query capabilities.
- String Functions: Enable text data manipulation and formatting, facilitating operations like substring extraction and pattern matching.
- Conditional Expressions: Execute different actions based on specified conditions, utilizing statements such as CASE and IF.
- Navigation Functions: Access data from subsequent or preceding rows efficiently without self-joins, with functions like LEAD and LAG.
- Numbering Functions: Assign unique numbers to rows within a result set partition, using functions like ROW_NUMBER, RANK, and DENSE_RANK.
- Conversion Functions: Convert data between various types to ensure compatibility and accurate formatting.
- Datetime Functions: Manage operations involving both dates and times with datetime data types, using functions like DATETIME_ADD, DATETIME_SUB, DATETIME_DIFF, and FORMAT_DATETIME.
Enhance Your Data Analysis with OWOX BI BigQuery Reports Extension
Boost your data analytics capabilities with the OWOX BI BigQuery Reports Extension. This essential tool seamlessly integrates BigQuery with Google Sheets, offering an easy-to-use platform for your team to pull and analyze data effortlessly.
Get BigQuery Reports in Seconds
Seamlessly generate and update reports in Google Sheets—no complex setup needed
Even non-technical users can engage with complex datasets, create automated reports, and extract actionable insights directly within Google Sheets. Simplify your approach to big data and empower your team to make informed business decisions more effectively. Start using the OWOX BI BigQuery Reports Extension today and revolutionize your data interaction.
FAQ
-
What is a DDL Statement in BigQuery?
A Data Definition Language (DDL) statement in BigQuery is used to define, modify, or delete database structures. This includes creating, altering, and dropping tables, views, and schemas, allowing for dynamic schema evolution and management within the BigQuery environment.
-
How can I create a table in BigQuery only if it doesn't already exist?
To create a table only if it doesn't already exist, use the CREATE TABLE IF NOT EXISTS statement. This ensures that the table is created only if it does not already exist in the specified dataset, preventing errors related to duplicate table creation.
-
How do I handle errors when a table or column I'm trying to create already exists?
To handle errors when a table or column already exists, use the IF NOT EXISTS clause in your DDL statements. For example, CREATE TABLE IF NOT EXISTS or ALTER TABLE ADD COLUMN IF NOT EXISTS prevents errors by checking for existence before attempting creation.
-
What best practices should I follow when using partitioning and clustering in DDL statements?
When using partitioning and clustering, ensure that you choose appropriate partition keys and clustering columns based on query patterns. Partition by date or logical divisions and cluster by columns, frequently used in filtering and sorting.
-
How can I use DDL statements to manage and optimize BigQuery table schemas over time?
Use DDL statements to incrementally evolve table schemas by adding or modifying columns without downtime. Implement partitioning and clustering to improve query performance.