Data Definition Language (DDL) statements in BigQuery are essential for creating, deleting and modifying tables within a data warehouse environment. These statements allow users to efficiently manage database schema without affecting the data itself.
Using DDL, users can create new tables, alter existing table structures, and define table schemas, enabling precise control over how data is stored, accessed, and managed.
This article exclusively covers the CREATE and DROP DDL statements in BigQuery, detailing the processes involved in creating new tables and deleting existing ones.
If you're interested in learning how to modify existing table structures to adapt to evolving data needs, we invite you to explore our in-depth article that dives into modifying tables using ALTER statements.
Data Definition Language (DDL) statements help in defining and managing the structure of your database objects in BigQuery.
These statements allow you to create, modify, and delete schemas, tables, views, and more.
Using DDL statements efficiently can help you maintain organized, scalable, and high-performing databases. Whether you are a data engineer, SQL developer, or data analyst, mastering DDL statements is crucial for optimizing data workflows and ensuring data integrity.
CREATE statements in BigQuery’s Data Definition Language (DDL) are pivotal for establishing various database objects that form the backbone of your data architecture.
These statements allow you to create schemas, tables, views, and other essential objects, each serving a unique purpose in organizing and managing your data.
The CREATE SCHEMA statement in BigQuery is used to create a new schema — that is, a new dataset — within your project.
A schema organizes your tables and other database objects, making your data easier to manage and access. It helps group related data and improves organization and security by allowing access controls to be set at the schema level.
💡 IMPORTANT: In this SQL statement, the term "schema" is used to describe a logical grouping of tables, views, and other resources. Within BigQuery, the equivalent concept is known as a dataset. Here, "schema" does not refer to BigQuery's table schemas.
CREATE SCHEMA [ IF NOT EXISTS ]
[project_name].dataset_name
[DEFAULT COLLATE collate_specification]
[OPTIONS(schema_option_list)]
This example showcases the creation of a new schema named mydataset in BigQuery, configured with specific geographical location, default table expiration settings, and descriptive labels to enhance data management and accessibility within the organization.
CREATE SCHEMA mydataset
OPTIONS(
location="us",
default_table_expiration_days=10,
labels=[("label1","value1"),("label2","value2")]
);
The CREATE TABLE statement in BigQuery is used to create a new table within a dataset. This statement allows you to define the table's structure by specifying column names, data types, and other properties. Creating tables is fundamental for organizing and storing your data in a structured format. Tables can be used to store raw data, processed data, and any other type of structured information.
CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ]
table_name
[(
column | constraint_definition[, ...]
)]
[DEFAULT COLLATE collate_specification]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(table_option_list)]
[AS query_statement]
column:=
column_definition
constraint_definition:=
[primary_key]
| [[CONSTRAINT constraint_name] foreign_key, ...]
primary_key :=
PRIMARY KEY (column_name[, ...]) NOT ENFORCED
foreign_key :=
FOREIGN KEY (column_name[, ...]) foreign_reference
foreign_reference :=
REFERENCES primary_key_table(column_name[, ...]) NOT ENFORCED
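As a sketch of the constraint syntax above (the table and column names below are illustrative, not from the article's dataset), the following creates two tables linked by an unenforced primary-key/foreign-key pair. Note that BigQuery does not enforce these constraints; they serve as metadata that the query optimizer can use:

```sql
-- Hypothetical tables illustrating NOT ENFORCED key constraints
CREATE TABLE myDataset.customers (
  customer_id INT64,
  name STRING,
  PRIMARY KEY (customer_id) NOT ENFORCED
);

CREATE TABLE myDataset.customer_orders (
  order_id INT64,
  customer_id INT64,
  PRIMARY KEY (order_id) NOT ENFORCED,
  CONSTRAINT fk_customer FOREIGN KEY (customer_id)
    REFERENCES myDataset.customers (customer_id) NOT ENFORCED
);
```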
This example illustrates the creation of a partitioned table in BigQuery. Detailed inventory data, including product IDs and categories, is stored with descriptive metadata, and partitioning and expiration settings are used to optimize data management and query performance.
CREATE TABLE myDataset.product_categories
(
product_id INT64 OPTIONS(
description="An optional INTEGER field"),
details STRUCT<
categories ARRAY<STRING> OPTIONS(
description="A repeated STRING field"),
in_stock BOOL
>
)
PARTITION BY _PARTITIONDATE
OPTIONS(
expiration_timestamp=
TIMESTAMP "2025-12-31 00:00:00 UTC",
partition_expiration_days=1,
description="A table that expires at the end of 2025, with each partition living for 24 hours",
labels=[("product", "category_management")]
)
💡 Delve into the intricacies of Timestamp Functions in our comprehensive guide! This resource unpacks how to master timestamp data types for exact time calculations and formatting in BigQuery. Discover the functionalities of key functions, complete with practical examples and customizable templates to boost your command over time-related data analysis.
The CREATE TABLE LIKE statement in BigQuery is used to create a new table that has the same schema as an existing table. This statement is useful when you want to replicate the structure of a table without copying its data. It simplifies the process of creating tables with identical schemas, saving time and ensuring consistency.
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ]
table_name
LIKE [[project_name.]dataset_name.]source_table_name
...
[OPTIONS(table_option_list)]
The following example creates a new table named sales_report in business_data with the same schema as sales_data and the data from the SELECT statement:
CREATE TABLE myDataset.sales_report
LIKE myDataset.sales_data
AS SELECT * FROM `owox-analytics.myDataset.sales_data`
The CREATE TABLE COPY statement in BigQuery is used to create a new table by copying the schema and data from an existing table. This is useful when you need to duplicate data for backup, testing, or analysis purposes. By copying the table, you can ensure that the new table has the same structure and data as the original, making it a straightforward and efficient way to manage and replicate data.
CREATE [ OR REPLACE ]
TABLE [ IF NOT EXISTS ] table_name
COPY source_table_name
...
[OPTIONS(table_option_list)]
This example demonstrates how to create a new table by copying both the schema and the data from an existing table.
CREATE TABLE myDataset.employee_contacts_copy
COPY `owox-analytics.myDataset.employee_contacts`
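The COPY clause can also be combined with OPTIONS to set metadata on the new table. A sketch (the expiration, description, and label values below are illustrative):

```sql
CREATE TABLE myDataset.employee_contacts_backup
COPY `owox-analytics.myDataset.employee_contacts`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY),
  description="A 30-day backup copy of employee_contacts",
  labels=[("purpose", "backup")]
);
```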
The CREATE EXTERNAL TABLE statement in BigQuery is used to create a table that references data stored outside of BigQuery, such as in Google Cloud Storage or another external data source. This is useful when you need to query data without importing it into BigQuery, saving on storage costs and allowing for more flexible data management.
CREATE [ OR REPLACE ]
EXTERNAL TABLE [ IF NOT EXISTS ] table_name
[(
column_name column_schema,
...
)]
[WITH CONNECTION connection_name]
[WITH PARTITION COLUMNS
[(
partition_column_name partition_column_type,
...
)]
]
OPTIONS (
external_table_option_list,
...
);
This example illustrates how to create an external table named product_inventory within the myDataset dataset in BigQuery, referencing CSV files in Cloud Storage and configuring options such as the data format, the field delimiter, and the maximum number of bad records allowed.
CREATE EXTERNAL TABLE
myDataset.product_inventory (
product_id INT64,
product_name STRING,
in_stock BOOL
)
OPTIONS(
format = "CSV",
uris = ['gs://analytics/test'],
field_delimiter = '|',
max_bad_records = 5);
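The WITH PARTITION COLUMNS clause from the syntax above can be used with hive-style partitioned paths in Cloud Storage. A sketch, assuming files laid out under paths like gs://analytics/inventory/dt=2024-01-01/ (the bucket path and column name are illustrative):

```sql
-- Externally partitioned table over hive-style paths
CREATE EXTERNAL TABLE myDataset.partitioned_inventory
WITH PARTITION COLUMNS (
  dt DATE
)
OPTIONS(
  format = "CSV",
  uris = ['gs://analytics/inventory/*'],
  hive_partition_uri_prefix = 'gs://analytics/inventory'
);
```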
The CREATE SNAPSHOT TABLE statement in BigQuery allows you to create a new table that captures the state of another table at a specific point in time. This is useful for preserving historical data, creating backups, or auditing changes. Snapshot tables provide a way to analyze data as it existed at a particular moment, without affecting the original table.
CREATE SNAPSHOT TABLE [ IF NOT EXISTS ]
table_snapshot_name
CLONE source_table_name
[FOR SYSTEM_TIME AS OF time_expression]
[OPTIONS(snapshot_option_list)]
The CREATE SNAPSHOT TABLE statement can be used in two ways; the following examples demonstrate both.
Example 1:
The following example creates a table snapshot of the table owox-analytics.myDataset.transactions. The table snapshot is created in the dataset myDataset and is named transactions_snapshot:
CREATE SNAPSHOT TABLE `owox-analytics.myDataset.transactions_snapshot`
CLONE `owox-analytics.myDataset.transactions`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 72 HOUR),
friendly_name="transactions_snapshot",
description="A table snapshot that expires in 3 days",
labels=[("department", "finance"), ("purpose", "archive")]
);
Example 2:
The following example creates a table snapshot of the table owox-analytics.myDataset.sales_data. The table snapshot is created in the dataset myDataset and is named sales_data_snapshot:
CREATE SNAPSHOT TABLE IF NOT EXISTS `owox-analytics.myDataset.sales_data_snapshot`
CLONE `owox-analytics.myDataset.sales_data`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 72 HOUR),
friendly_name="sales_data_snapshot",
description="A table snapshot that expires in 3 days",
labels=[("department", "finance"), ("purpose", "archive")]
);
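The FOR SYSTEM_TIME AS OF clause from the syntax can capture the table as it existed at an earlier point in time. A sketch (the snapshot name is illustrative; time travel is limited to BigQuery's time-travel window):

```sql
-- Snapshot of the table as it existed one hour ago
CREATE SNAPSHOT TABLE `owox-analytics.myDataset.transactions_snapshot_1h`
CLONE `owox-analytics.myDataset.transactions`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```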
The CREATE TABLE CLONE statement in BigQuery allows you to create a new table that is a clone of an existing table, meaning the new table has the same schema and data as the original table at the time of the clone. This is useful for scenarios such as testing, backup, or creating a new environment based on existing data without duplicating the storage costs.
CREATE TABLE [ IF NOT EXISTS ]
destination_table_name
CLONE source_table_name [
FOR SYSTEM_TIME AS OF time_expression]
...
[OPTIONS(table_option_list)]
The following examples demonstrate two ways the statement can be used.
Example 1:
The following example creates the table myDataset.sales_data_clone from the table snapshot myDataset.sales_data_snapshot.
CREATE TABLE
`owox-analytics.myDataset.sales_data_clone`
CLONE
`owox-analytics.myDataset.sales_data_snapshot`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
friendly_name="sales_data_clone",
description="A table that expires in 1 year",
labels=[("department", "finance"), ("purpose", "backup")]
);
Example 2:
The following example creates the table clone employee_clone based on the table employee_records.
CREATE TABLE IF NOT EXISTS `owox-analytics.myDataset.employee_clone`
CLONE
`owox-analytics.myDataset.employee_records`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
friendly_name="employee_clone_table",
description="A table that expires in 1 year",
labels=[("department", "HR"), ("purpose", "development")]
);
The CREATE VIEW statement in BigQuery allows you to create a virtual table based on a SQL query. Views simplify complex queries by encapsulating them into a single object, making data easier to work with. They provide a way to abstract and reuse queries, improving query readability and maintainability. Views also enhance security by restricting access to specific columns or rows, thus protecting sensitive data while allowing users to perform necessary analyses.
CREATE [ OR REPLACE ]
VIEW [ IF NOT EXISTS ] view_name
[(view_column_name_list)]
[OPTIONS(view_option_list)]
AS query_expression
view_column_name_list :=
view_column[, ...]
view_column :=
column_name [OPTIONS(view_column_option_list)]
This example demonstrates how to create a view in BigQuery, named sales_summary, which presents sales data from the orders table, equipped with an expiration setting, descriptive metadata, and labels for effective data management and organizational accessibility.
CREATE VIEW
`owox-analytics.myDataset.sales_summary`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="sales_summary",
description="A view that summarizes sales and expires in 2 days",
labels=[("department", "sales")]
)
AS SELECT transaction_id, transactionDate, saleAmount
FROM `owox-analytics.myDataset.orders`
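The optional view column name list from the syntax can rename the query's output columns and attach per-column options. A sketch (the view name and descriptions are illustrative):

```sql
-- View with an explicit column list and a column-level description
CREATE VIEW `owox-analytics.myDataset.sales_summary_named` (
  id OPTIONS(description="Transaction identifier"),
  sale_date,
  amount
)
AS SELECT transaction_id, transactionDate, saleAmount
FROM `owox-analytics.myDataset.orders`;
```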
The CREATE MATERIALIZED VIEW statement in BigQuery allows you to create a materialized view, which is a precomputed result set stored for later use. Materialized views can significantly improve query performance by storing the results of a query so they can be quickly accessed without having to re-execute the original query.
CREATE [ OR REPLACE ]
MATERIALIZED VIEW [ IF NOT EXISTS ]
materialized_view_name
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(materialized_view_option_list)]
AS query_expression
Following are examples that demonstrate the usage of CREATE MATERIALIZED VIEW Statement in multiple ways.
Example 1:
The following example creates a new materialized view named sales_data_mv in myDataset:
CREATE MATERIALIZED VIEW `owox-analytics.myDataset.sales_data_mv`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(
CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="sales_mv",
description="A materialized view that expires in 2 days",
labels=[("department", "sales")],
enable_refresh=true,
refresh_interval_minutes=20
)
AS SELECT product_id, SUM(sales_amount) AS total_sales
FROM `owox-analytics.myDataset.sales_data`
GROUP BY product_id
Example 2:
The following example creates a materialized view named inventory_mv in myDataset only if it does not already exist.
CREATE MATERIALIZED VIEW IF NOT EXISTS `owox-analytics.myDataset.inventory_mv`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="inventory_mv",
description="A view that expires in 2 days",
labels=[("department", "inventory")],
enable_refresh=false
)
AS SELECT product_id, is_in_stock FROM `owox-analytics.myDataset.inventory`
Example 3:
The following example creates a materialized view named orders_mv in myDataset, partitioned by the order_date column and clustered by the customer_id column.
CREATE MATERIALIZED VIEW
`owox-analytics.myDataset.orders_mv`
PARTITION BY DATE(order_date)
CLUSTER BY customer_id
AS SELECT customer_id, order_date,
COUNT(1) as order_count
FROM
`owox-analytics.myDataset.orders_partitioned_table`
GROUP BY customer_id, order_date
The CREATE MATERIALIZED VIEW AS REPLICA OF statement in BigQuery allows you to create a materialized view that replicates another materialized view. This is useful for scenarios where you need to create a read-only copy of a materialized view in a different location, or for disaster recovery purposes. By replicating a materialized view, you ensure consistency, providing a reliable way to access the precomputed results without directly querying the original view.
CREATE MATERIALIZED VIEW replica_name
[OPTIONS(materialized_view_replica_option_list)]
AS REPLICA OF source_materialized_view_name
The following example creates a materialized view replica named orders_mv_replica in myDataset.
CREATE MATERIALIZED VIEW `owox-analytics.myDataset.orders_mv_replica`
OPTIONS(
replication_interval_seconds=1200
)
AS REPLICA OF
`owox-analytics.myDataset.orders_mv`
DROP statements in BigQuery provide a powerful way to remove database objects that are no longer needed, freeing up storage and simplifying your data environment. These statements can delete tables, views, functions, and other objects, ensuring your database remains clean and efficient.
The DROP SNAPSHOT TABLE statement deletes a snapshot table, a static, read-only table capturing the data at a specific time. This action is irreversible, and all data within the snapshot table will be lost. It is typically used in database maintenance and management tasks to free up space or remove outdated snapshots.
DROP SNAPSHOT TABLE [IF EXISTS] [[project_name.]dataset_name.]table_snapshot_name
Here's an example of using the DROP SNAPSHOT TABLE statement to delete a snapshot table named transactions_snapshot if it exists.
DROP SNAPSHOT TABLE IF EXISTS `owox-analytics.myDataset.transactions_snapshot`
The DROP TABLE statement deletes an existing table from the database, freeing up storage space. This action is irreversible, and all the data within the table will be lost. It is typically used in database maintenance and management tasks to remove outdated or unnecessary tables.
DROP TABLE [IF EXISTS] table_name
Here's an example of how to use the DROP TABLE statement to delete a table named employee_contacts_copy if it exists.
DROP TABLE IF EXISTS `owox-analytics.myDataset.employee_contacts_copy`
Advanced table management strategies in BigQuery involve combining DDL statements to optimize data workflows and ensure efficient schema evolution. Let's explore these techniques.
Using INFORMATION SCHEMA to generate CREATE TABLE statements for existing tables simplifies replication and helps in maintaining data integrity. This approach allows database administrators and developers to easily capture the exact structure of existing tables, including column definitions, data types, and other metadata.
Example: Creating CREATE TABLE Statement
A data engineer needs to document the schema of the product_categories table from the myDataset dataset in BigQuery and prepare to recreate this table in another environment or for documentation purposes.
Use the INFORMATION_SCHEMA.COLUMNS view in BigQuery to fetch metadata about each column in the product_categories table.
SELECT
table_name,
column_name,
data_type,
is_nullable,
column_default,
ordinal_position
FROM
`myDataset.INFORMATION_SCHEMA.COLUMNS`
WHERE
table_name = 'product_categories'
ORDER BY
ordinal_position;
Based on the output from the above query, you can then manually write a CREATE TABLE statement like this:
CREATE TABLE test.product_categories (
product_id INT64 NOT NULL,
product_name STRING,
product_category_id INT64,
product_category_name STRING,
department STRING DEFAULT 'Sales',
date DATE
);
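Alternatively, BigQuery can generate the DDL for you: the INFORMATION_SCHEMA.TABLES view exposes a ddl column containing a ready-made CREATE TABLE statement for each table, which avoids constructing the statement by hand:

```sql
SELECT ddl
FROM `myDataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = 'product_categories';
```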
BigQuery temporary tables are created using the TEMP or TEMPORARY keyword. These tables are useful for session-scoped data manipulation and analytics.
Method 1: Example of Using the TEMP Parameter
The example below demonstrates how to create a temporary table of Sales-department employees and then use it to find those who have recorded sales.
CREATE TEMP TABLE sales (employee_id STRING)
AS
SELECT employee_id
FROM `owox-analytics.myDataset.employee_data`
WHERE department = 'Sales';
SELECT
employee_id
FROM sales
WHERE employee_id IN (
SELECT salesperson_id
FROM
`owox-analytics.myDataset.salesperson_data`
);
Method 2: Example of Using the TEMPORARY Parameter
The example below demonstrates how to create a temporary table holding the 10 most expensive products, and then use it to find which of those products were actually sold. (TEMP and TEMPORARY are synonyms; both create session-scoped tables.)
CREATE TEMPORARY TABLE top_products (product_id STRING)
AS
SELECT product_id
FROM `owox-analytics.myDataset.product_catalog`
ORDER BY price DESC
LIMIT 10;
SELECT
product_id
FROM top_products
WHERE product_id IN (
SELECT product_id
FROM `owox-analytics.myDataset.sales_data`
);
The CREATE OR REPLACE TABLE statement in Google BigQuery allows you to update table structures seamlessly. This command is particularly useful when you need to modify a table’s schema or redefine its data without manually dropping and recreating the table.
Example of Using CREATE OR REPLACE TABLE
The following example demonstrates how to use the CREATE OR REPLACE TABLE statement to update the structure of a table named orders_history in the myDataset dataset. This example includes adding a new computed column for total sales amount.
CREATE OR REPLACE TABLE myDataset.orders_history AS
SELECT
order_id,
product_id,
quantity,
price_per_unit,
quantity * price_per_unit AS total_sales
FROM
myDataset.orders_history;
Data Definition Language (DDL) statements are essential for managing and defining the schema of your database tables in BigQuery. Following best practices when using DDL statements ensures that your database operations are smooth and your schema changes do not disrupt ongoing processes.
Use DDL statements to implement partitioning and clustering in your tables to reduce query costs and improve performance significantly. Partitioning divides your data into smaller, more manageable pieces based on date or other logical divisions, while clustering sorts data within each partition. This optimization ensures that queries only scan relevant data, reducing the amount of processed data and associated costs.
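A minimal sketch of this practice, assuming a hypothetical events table with an event_date column: partition by date, cluster by a frequently filtered column, and optionally require a partition filter so queries cannot accidentally scan every partition:

```sql
-- Date-partitioned, clustered table (names illustrative)
CREATE TABLE myDataset.events (
  event_id STRING,
  user_id STRING,
  event_date DATE
)
PARTITION BY event_date
CLUSTER BY user_id
OPTIONS(
  require_partition_filter = true
);
```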
Use DDL statements to drop tables that are no longer needed. Regularly cleaning up your data environment helps manage storage costs, reduce clutter, and maintain organizational efficiency. This practice ensures your BigQuery environment remains organized and operates optimally by preventing outdated or irrelevant data from impacting performance.
Working with DDL statements in BigQuery can sometimes result in errors that disrupt your workflow. Understanding common DDL errors and their solutions can help maintain smooth operations and prevent data management issues.
This error occurs when you attempt to create a table that already exists within the specified dataset.
⚠️ Error: "Already Exists: project_id.table"
The error message "Already Exists: project_id.table" indicates a conflict with the existing table's name, which prevents the creation of a new table with the same name.
✅ Solution:
Before executing a CREATE TABLE statement, ensure the table does not already exist or use the CREATE TABLE IF NOT EXISTS statement to avoid the error. Alternatively, if intending to replace an existing table, use the CREATE OR REPLACE TABLE statement. This approach helps manage potential naming conflicts without manual checks.
CREATE TABLE IF NOT EXISTS
mydataset.mytable (
id INT64,
name STRING
);
⚠️ Error: "NULL value in column violates NOT NULL constraint"
This error can occur when loading or inserting data into a table whose columns are defined as NOT NULL. The NOT NULL modifier specifies that a column must always have a value, so it cannot contain NULLs.
✅ Solution:
When defining a table schema with NOT NULL constraints in BigQuery, ensure all incoming data adheres to these constraints to avoid insertion errors. Use the CREATE TABLE statement to define NOT NULL fields properly and check data consistency before loading data into the table to ensure compliance with these constraints.
If you are transforming or migrating data, include validation steps to fill or correct any null values that would violate the NOT NULL requirement.
CREATE TABLE mydataset.mytable (
id INT64 NOT NULL,
name STRING NOT NULL
);
INSERT INTO mydataset.mytable (id, name)
VALUES (1, 'Alice'), (2, 'Bob');
⚠️ Error: "Already Exists: project_id.dataset_name.snapshot_name"
This error occurs when attempting to create a table snapshot with a name that already exists within the dataset. It indicates a naming conflict that prevents the new snapshot from being created because a snapshot or table with the same name is already present in the dataset.
✅ Solution:
To prevent this error, verify that the intended name for the table snapshot is not already in use within the dataset before executing the CREATE SNAPSHOT TABLE command. You can do this by checking the dataset manually in the BigQuery UI or by querying the INFORMATION_SCHEMA.TABLES view to list all tables and snapshots in the dataset.
SELECT table_name
FROM
`mydataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = 'snapshot_table';
⚠️Error: "Partition filter required but not provided"
Setting timePartitioning.requirePartitionFilter to true on a partitioned table requires a partition filter in queries. If omitted, BigQuery will reject the query, preventing unintended full table scans and excess costs.
✅ Solution:
To prevent errors when querying partitioned tables with timePartitioning.requirePartitionFilter set to true, always use partition filters. This involves including the partition column in your query’s WHERE clause to ensure the query is compliant. Educate users about the necessity of partition filters to prevent unintended full table scans and excessive costs.
SELECT * FROM
mydataset.my_partitioned_table
WHERE _PARTITIONDATE = '2023-05-01';
In this article, we have established a foundation with a detailed overview of the CREATE and DROP statements. To explore how to modify existing database structures in BigQuery, dive into our article on ALTER commands, which covers the practical applications and nuances of refining and optimizing database schemas.
Explore other advanced BigQuery functions to enhance your data analysis and query capabilities.
Boost your data analytics capabilities with the OWOX BI BigQuery Reports Extension. This essential tool seamlessly integrates BigQuery with Google Sheets, offering an easy-to-use platform for your team to pull and analyze data effortlessly.
Even non-technical users can engage with complex datasets, create automated reports, and extract actionable insights directly within Google Sheets. Simplify your approach to big data and empower your team to make informed business decisions more effectively. Start using the OWOX BI BigQuery Reports Extension today and revolutionize your data interaction.
A Data Definition Language (DDL) statement in BigQuery is used to define, modify, or delete database structures. This includes creating, altering, and dropping tables, views, and schemas, allowing for dynamic schema evolution and management within the BigQuery environment.
To create a table only if it doesn't already exist, use the CREATE TABLE IF NOT EXISTS statement. This ensures that the table is created only if it does not already exist in the specified dataset, preventing errors related to duplicate table creation.
To handle errors when a table or column already exists, use the IF NOT EXISTS clause in your DDL statements. For example, CREATE TABLE IF NOT EXISTS or ALTER TABLE ADD COLUMN IF NOT EXISTS prevents errors by checking for existence before attempting creation.
When using partitioning and clustering, ensure that you choose appropriate partition keys and clustering columns based on query patterns. Partition by date or logical divisions and cluster by columns, frequently used in filtering and sorting.
Use DDL statements to incrementally evolve table schemas by adding or modifying columns without downtime. Implement partitioning and clustering to improve query performance.