Google BigQuery data structure: How to get started with cloud storage
Anna Soloninko, Digital Analyst at OWOX BI
Liubov Zhovtonizhko, Creative Writer @ OWOX
Google BigQuery is a cloud storage service that allows you to collect all your data in one system and easily analyze it using SQL queries. For data to be convenient to work with, it should be structured correctly. In this article, we’ll explain how to create tables and datasets for uploading to Google BigQuery.
Google BigQuery is a versatile cloud data warehouse that excels in storing and analyzing large datasets. It combines the functionalities of a database and a spreadsheet, enabling complex analytical queries beyond typical spreadsheet capabilities.
BigQuery simplifies setup as a cloud service, allowing users to create projects, datasets, and tables effortlessly. BigQuery supports a range of data formats for import, including CSV, JSON, and Google Sheets. Its robust querying capabilities, using standard SQL dialect, position it as the preferred solution for businesses in search of effective cloud-based big data analysis and management.
Note: This post was originally published in May 2019 and was completely updated in January 2024 for accuracy and comprehensiveness on Google Cloud and BigQuery.
Why should you use Google BigQuery?
✅ Serverless Big Data Storage: Enables on-demand data export and analysis without managing servers.
✅ Handling Massive Datasets: Efficiently processes data at a petabyte scale, suitable for large datasets.
✅ Real-time Analytics: Provides near-instant data insights through high-speed streaming and BigQuery API.
✅ ETL Data Integration: Supports data import from various sources, compatible with external ETL tools.
✅ Flexible Pricing Model: Charges based on storage and query processing, with complementary data visualization.
✅ Robust Data Security: Ensures data protection with advanced encryption and detailed access controls.
✅ Standard SQL Compatibility: Allows familiar SQL, reducing the need for extensive code rewriting.
✅ Built-in AI and ML Capabilities: Facilitates machine learning and AI projects directly within the platform.
✅ Foundation for Business Intelligence: Ideal for comprehensive data tasks, including integration, analysis, visualization, and reporting.
✅ Access to Public Datasets: Provides a vast array of over 100 public datasets for extensive analysis.
✅ Free Sandbox Account: Offers a no-cost tier for experimenting and exploring BigQuery's features.
Uncover in-depth insights
How to control and optimize costs for Google BigQuery
Download nowBonus for readers
How to set up BiqQuery
To use Google BigQuery, you need to create a project in Google Cloud Platform (GCP). Upon registration, you’ll receive access to all Cloud Platform products during a free trial period and $300 to spend on these products within the next 12 months.
After creating a project in the Google Cloud Platform, enable the BigQuery API to start using the service. BigQuery integrates seamlessly with Google Cloud Storage, providing a place to store your raw data before importing it into BigQuery.
Start by setting up a project in Google Cloud Platform (GCP). This will be your workspace for using BigQuery. Then, you need to add at least one dataset to Google BigQuery.
Datasets: What they are and how to create one
A dataset is a top-level container that’s used to organize and control access to your data. In simple terms, it’s a kind of folder in which your information is stored in the form of tables and views.
Open your project in GCP, go to the BigQuery tab, and click Create Dataset by clicking the three dots near your project name:
In the window that opens, specify a name for the dataset and the shelf life of a table. If you want tables with data to be deleted automatically, specify when exactly. Or leave the default Perpetual option so that tables can only be deleted manually.
The Processing site field is optional. By default, it’s set to US multi-region. You can find more information about regions for storing data in the help section.
How to add a table to load data into Google BigQuery
After creating a dataset, you need to add a table to which data will be collected. A table is a set of rows. Each row consists of columns, which are also called fields. There are several ways to create a table in BigQuery, depending on the data source:
Manually create an empty table and set up a data schema for it
Create a table using the result of a previously calculated SQL query
Upload a file from your computer (in CSV, AVRO, JSON, Parquet, ORC, or Google Sheets format)
Instead of downloading or streaming data, you can create a table that refers to an external source: Cloud Bigtable, Cloud Storage, or Google Drive.
In this article, we’ll take a closer look at the first method: creating a table in a big query manually.
Step 1. Select the dataset to which you want to add the table, then click Create Table by clicking three dots near your data set name:
Step 2. In the Source field, select Empty Table, and in the Table Type field, select Table in the native format of the target object. Come up with a name for the table.
Important: The names of datasets, tables, and fields must be in Latin characters and contain only letters, numbers, and underscores.
Step 3. Specify the table schema. The schema consists of four components: two mandatory (column name and data type) and two optional (column mode and description). Properly selected types and field modes will facilitate work with the data.
Big Query schema Example:
Column names
In the column name, you need to specify the parameter for which each column is responsible: date, user_id, products, etc. Titles can contain only Latin letters, numbers, and underscores (maximum 128 characters). Identical field names are not allowed, even if their case is different.
Data type
When creating a table in BigQuery, you can use the following field types:
Data type | Meaning |
Integer | A whole number. For example: 7, 1515236, 100 |
Float | A number with decimal places. For example 1,25 |
Boolean | The field accepts only TRUE or FALSE value |
String | A value consisting of characters. For example: product, 1323456 |
Bytes | Binary data |
Date | Logical calendar date. For example: 20190302 |
Datetime | Year, month, day, hour, minute, seconds. For example: 20190302 17:05:45 |
Time | Time that does not depend on a specific date. For example: 17:05:45 |
Timestamp | A date with time with microsecond precision. For example: 20190302 17:05:45 |
Record | A container of ordered nested fields |
Numeric | Precise numeric values with fractional components. For example: 99999.9999999 |
Geography | A point on the surface of the earth |
Modes
BigQuery supports the following modes for table columns:
Mode | Meaning |
Nullable | A column is nullable (by default) |
Required | NULL values are not allowed |
Repeated | A column contains an array of values of the specified type |
Note: It isn’t necessary to fill in the Mode field. If no mode is specified, the default column is NULLABLE.
Column descriptions
If you wish, you can add a short description (no more than 1024 characters) for each column in the table in order to explain what a particular parameter means.
When you create an empty table in BigQuery, you need to set the big query schema manually. This can be done in two ways:
1. Click the Add field button and fill in each column's name, type, and mode.
2. Enter the table schema as a JSON array using the Edit as text switch.
In addition, Google BigQuery can use automatic schema detection when loading data from CSV and JSON files.
This option works on the following principle: BigQuery selects a random file from the source you specify, scans up to 100 rows of data in it, and uses the results as a representative sample. It then checks each field in the uploaded file and tries to assign a data type to it based on the values in the sample.
When loading Google files, BigQuery can change the name of a column to make it compatible with its own SQL syntax. Therefore, we recommend that you upload tables with English field names; if the names are in Russian, for instance, the system will rename them automatically. For example:
If, when loading data, the names of the columns were entered incorrectly or you want to change the names and types of the columns in an existing table, you can do this manually. We’ll tell you how.
How to make changes to the table schema
After loading data into Google BigQuery, the table layout may be slightly different from the original. For example, a field name may have changed because of a character that’s not supported in BigQuery, or the field type may be INTEGER instead of STRING. In this case, you can manually adjust the schema in Bigquery.
How to change a column name
Using a SQL query, select all the columns in the table and specify a new name for the column that you want to rename. In this case, you can overwrite the existing table or create a new one. Request examples:
#legacySQL
Select
date,
order_id,
order___________ as order_type, -- new field name
product_id
from [project_name:dataset_name.owoxbi_sessions_20190314]
#standardSQL
Select
* EXCEPT (orotp, ddat),
orotp as order_id,
ddat as date
from `project_name.dataset_name.owoxbi_sessions_20190314`
How to make changes to the data type in the schema
Using an SQL query, select all data from a table and convert the corresponding column to a different data type. You can use query results to overwrite an existing table or create a new one. Request example:
#standardSQL
Select
CAST (order_id as STRING) as order_id,
CAST (date as TIMESTAMP) as date
from `project_name.dataset_name.owoxbi_sessions_20190314`
How to change the column mode
You can change the column mode from REQUIRED to NULLABLE as described in the help documentation. The second option is to export the data to Cloud Storage and, from there, return it to BigQuery with the correct mode for all columns.
How to remove a column from the data schema
Use the SELECT * EXCEPT query to exclude a column (or columns), then write the query results to the old table or create a new one. Request example:
#standardSQL
Select
* EXCEPT (order_id)
from `project_name.dataset_name.owoxbi_sessions_20190314`
In addition, there’s a second way to change the schema that’s suitable for all tasks described above: export data and load it into a new table. To rename a column, you can upload data from BigQuery to Cloud Storage, then export it from Cloud Storage to BigQuery in a new table or overwrite the data in the old table using the Advanced Parameters.
You can read about other ways to change the table structure in the Google Cloud Platform help documentation.
Export and import data from/to Google BigQuery
You can download data from and upload data to BigQuery without the help of developers via the interface or a special OWOX BI BigQuery Reports Extension. Let’s consider each method in detail.
Get BigQuery Reports in Seconds
Seamlessly generate and update reports in Google Sheets—no complex setup needed
Import data via the Google BigQuery interface
To upload the necessary information to storage – for example, data about users and offline orders – open your dataset, click Create Table, and select the data source: Cloud Storage, your computer, Google Drive, or Cloud Bigtable. Specify the path to the file, its format, and the name of the table where the data will be loaded:
After you click Create Table, a table will appear in your dataset.
Export data via the Google BigQuery interface
It’s also possible to upload processed data from BigQuery – for example, to create a report through the system interface. To do this, open the desired table with data and click the Export button.
The system will offer two options: view data in Google Data Studio or upload it to Google Cloud Storage. If you select the first option, you’ll immediately go to Data Studio, where you can save the report.
Choosing to export to Google Cloud Storage will open a new window. In it, you need to specify where to save the data and in what format.
Export and import data using an OWOX BI BigQuery Reports Extension
The Free OWOX BI BigQuery Reports extension allows you to quickly and conveniently transfer data directly from Google BigQuery to Google Sheets and vice versa. Thus, you don’t need to prepare CSV files or use paid third-party services.
Simplify BigQuery Reporting in Sheets
Easily analyze corporate data directly into Google Sheets. Query, run, and automatically update reports aligned with your business needs
For example, say you want to upload offline order data to BigQuery to build a ROPO report. For this, you’ll need to:
Install Free BigQuery Reports Extension in your browser.
Create a new file in Google Sheets, and in the tab Extensions, select OWOX BI BigQuery Reports → Upload data to BigQuery.
In the window that opens, select your project and dataset in BigQuery and enter the desired name for the table. Also, select the fields whose values you want to load. By default, the type of all fields is STRING, but we recommend choosing the data type according to the context (for example, for fields with numeric identifiers, choose INTEGER; for prices, FLOAT):
Click the Start Upload button and your data will be loaded into Google BigQuery.
You can also use this add-on to export data from BigQuery to Google Sheets – for example, to visualize data or share it with colleagues who don’t have access to BigQuery. For this:
Open Google Sheets. In the Extensions tab, select OWOX BI BigQuery Reports → Add a new report:
Then, enter your project in Google BigQuery and select Add new query.
In the new window, insert your SQL query. This could be a query that uploads data from a table to BigQuery or a query that pulls and calculates the necessary data.
Rename the query to make it easy to find and launch it by clicking the Save & Run button.
To upload data from BigQuery to Google Sheets on a regular basis, you can enable scheduled data updates:
In the Extensions tab, select OWOX BI BigQuery Reports → Schedule report.
In the window that opens, set the time and frequency for report updates and click Save:
Why collect data in Google BigQuery?
If you haven’t yet appreciated the benefits of Google BigQuery cloud storage, we recommend trying it. With the help of OWOX BI, you can combine data from your website, advertising sources, and internal CRM systems into BigQuery, so you can:
Build End-to-End Analytics: Set up full-funnel analytics and find out the real return on your marketing inv, taking into account orders, returns, and all user funnel steps on the way from interest to purchase;
Comprehensive Reporting: Create reports on complete unsampled data with any parameters and indicators;
Cohort Analysis for Customer Acquisition: Evaluate customer acquisition channels using cohort analysis;
Linking Online Advertising to Offline Sales: If you are working in both retail and e-commerce niches, you can find out how your online advertising impacts offline sales;
Optimizing Advertising Spend: Reduce the advertising costs waste, extend the life cycle of customers, and increase the LTV of your customer base as a whole;
Customer Segmentation and Personalization: Segment customers depending on their activity and personalize communication with them.
Make your data work for you with OWOX. Dive into the simplicity of BigQuery, no coding is needed. Organize all your data in one place, making insights clearer and decisions smarter. Start your journey towards hassle-free data management with our FREE trial today!
Close the Loop between Marketing & Revenue
Merge advertising, web analytics, and internal data in one report for a comprehensive overview of your performance
4.9
FAQ
-
How to use Google BigQuery?
Google BigQuery serves as a cloud-centric platform for conducting big data analytics. To use it, start by setting up a Google Cloud project and enable the BigQuery API. Create datasets and tables in BigQuery, then import your data from Google Cloud Storage or other sources. Use standard SQL queries to analyze and manipulate your data, leveraging BigQuery's powerful data processing capabilities. -
How to create a table in a big query?
To create a table in BigQuery:
- Open the BigQuery web UI in the Google Cloud Console.
- n the Explorer panel, select your project and dataset.
- Click "+ Create Table".
- Specify the table's source, schema, and destination.
- Configure table settings (e.g., expiration, encryption).
- Click "Create table".
-
How does Google BigQuery handle cloud storage?
Google BigQuery manages cloud storage by offering a fully managed, serverless data warehouse that enables scalable storage and analysis of large datasets. It integrates seamlessly with Google Cloud Storage, allowing users to import and export vast amounts of data efficiently. BigQuery's architecture supports high-speed querying and real-time analytics, making it ideal for handling big data in the cloud. -
How to edit Schema in Bigquery?
To edit a schema in BigQuery, use the Cloud Console, the bq command-line tool, or the BigQuery API. Add new fields, change field modes, or use a SQL query with ALTER TABLE. Direct editing of existing fields isn't possible; instead, create a new table with the desired schema. -
What is the importance of having a well-defined schema in BigQuery?
- A well-defined schema in BigQuery is important because it defines the structure and organization of your data. It allows BigQuery to optimize query processing, improve overall query performance, and ensure accurate and consistent results. Additionally, a schema helps in maintaining data integrity and enables efficient data exploration and analysis. -
How can I create a schema in BigQuery?
- In BigQuery, schemas are automatically created when you load data into a table using one of the supported file formats (CSV, JSON, Avro, etc.). During the loading process, BigQuery infers the schema based on the data's structure and assigns appropriate field types. Alternatively, you can also create a schema manually by specifying the field names and data types when creating a new table. -
Can I modify or update an existing schema in BigQuery?
- Yes, you can modify an existing schema in BigQuery, but with some limitations. You can add new fields to a table's schema using the 'ALTER TABLE' statement or the BigQuery web UI. However, you cannot directly modify or change the data type or order of existing fields. To modify existing fields, you would need to create a new table with the desired schema, load data into it, and then delete the old table if necessary. It's recommended to plan and design your schema carefully from the beginning to avoid significant schema changes later on.