Data Types in Google BigQuery (Standard SQL)

What is a data type in Google BigQuery?

In this article, we discover what data types are, and how they are used in Google BigQuery. We also look at commonly used data types and show some examples.

We also encounter some data types in BigQuery that are very similar.

Should we use a FLOAT vs NUMERIC? Is there a difference if we choose one over the other?

We also touch on common topics, such a knowing the largest or smallest number we can assign, or the longest chunk of text we can store.

Dates are tricky, though we know that your date last week was also a tricky one. In this article, we refer to calendar dates and time information in SQL. We also go over these briefly.

Finally, there are scenarios when we need to convert from one data type to another. We have examples of how to use CAST and CONVERT for this purpose.

What is a Data Type in BigQuery?

BigQuery data types to choose from through the upload feature

If you are familiar with spreadsheets such as Google Sheets or Microsoft Excel, you will notice that you can specify each column, row, or cell to follow a particular format.

For example, a number can be formatted either as a two-decimal currency, or just as a plain text.

Data types are important. If we know that a collection of data refers to test scores, we know what we can do with it. We can get the highest score, lowest, average, and so on.

When we assign a data type, we know what to expect for a value, and we know the kind of data that we are dealing with.

We can have numbers, characters, dates, fractions, you name it.

Google BigQuery supports a number of data types, namely:

  • Numbers and fractions
  • Characters
  • Boolean (True or False)
  • Bytes (raw data)
  • Date and Time
  • Arrays (a group of data)
  • Struct (customized data)
  • Geography

We go over these in the following sections.

List of All Data Types in BigQuery

We have the table below to show us the different data types in BigQuery, along with their names, descriptions, and keywords.

For example, if we are dealing with the number of children, we strictly need numbers without decimal parts. In this case, we use Integers represented by the keyword INT64.

Numbers and Fractions

Integer (INT64)Whole numbers only, including negative numbers
Numeric (NUMERIC)Includes decimal numbers with up to 38 digits (including decimal values)
Float (FLOAT64)Includes decimal numbers, but with a certain loss of accuracy

Characters

String (STRING)Characters that are stored and displayed in UTF-8 encoding

Boolean

Boolean (BOOL)True or False only, and is case-insensitive

Bytes

Bytes (BYTES)Characters that may or may not be in UTF-8.

Date and Time

Date (DATE)Includes year, month, date
Time (TIME)Includes time (with up to 6 decimal places for seconds)
Datetime (DATETIME)Includes both date and time
Timestamp (TIMESTAMP)Includes date, time, and the time zone

Arrays

Array (ARRAY)A list of data with the same type.  

The order is important. An array can also be empty.  

For example, [-100, 0, 1, 5, 20]

Struct

Struct (STRUCT)A set of customized data.  

We can represent each point on the map with a pair of numbers, their latitude, and longitude.  

Since any point will always have the longitude-latitude pair, we can create customized data that has two numbers (X, Y) for this purpose.

Geography

Geography (GEOGRAPHY)Since spatial data is widely used, this data type can readily support location information.

BigQuery Float vs Numeric Data Type

What is the difference between a FLOAT and NUMERIC data type?

Numeric types can handle a higher level of precision, at the cost of extra storage. Float types save storage and querying costs, at the cost of precision.

RELATED: What you need to know about Google BigQuery Pricing

Let’s see an example using the BigQuery platform. Here, we execute the following query, and get the following results.

SELECT 
  4.1 * 100 default_float,
  CAST(4.1 AS NUMERIC) * 100 use_numeric
;

This is how it should look like in the BigQuery Console:

Results:

Looking at the two columns, we see a certain loss of precision if we use FLOAT. This does not happen all the time, and in most cases, the offset is negligible.

However, there are cases when accuracy is the top priority, e.g. dealing with financial information, or statistical analysis with very small numbers. In these cases, we need to use the NUMERIC type.

In summary,

  • Use FLOAT to save storage and query costs, with a manageable level of precision
  • Use NUMERIC for accuracy in the case of financial data, with higher storage and query costs

BigQuery String Max Length

With this, I tried an experiment.

I created sample text files and added them into a table in GBQ as a new table.

I experimented with strings of length 1, 2, 4, 8, and so on.

The maximum that I tried is 2^21, or 2,097,152 characters. That’s exactly 2 MB worth of data into a single cell in BigQuery! Insane!

But you know what’s more surprising? BigQuery still accepted these strings.

I don’t think a normal business user will need more than 2 million characters for a single string, so I stopped at that.

BigQuery accepts strings with a length of more than 2 million characters.

Warning: Querying with More Than 1024 Characters

As you work with very large strings above 1024 characters, you may start to get warnings after running a query, as shown below:

The warning can be similar to the following:

Some cell values are very long and the display is truncated to the first 1024 characters to improve browser performance. If full values are necessary, try lowering the number of rows per page before clicking "Show full values".

Since BigQuery is accessed through a browser, the platform limits the display of each data to at most 1024.

Solution: This does not affect the actual data in your database. To get the full information, export your data instead.

Querying with Large Strings

Additional Information: By the time I am querying strings with a length of 2 MB each, BigQuery creates a temporary table as part of the querying process. This can be a handling by the platform and did not affect my experience as an end user.

BigQuery Timestamp and Datetime

Do you need to store information about time, date, and time zone?

These are handled in BigQuery. We use the TIME and DATE data types to handle only the time and date components, respectively.

In most cases, we want to store both the date and time component. We can use the DATETIME data type to store information such as events. In this data type, we assume that all dates follow the same local time zone.

However, for data that is handled through various time zones, it is best to use a TIMESTAMP data type. This keeps information on the date, time, and time zone.

DATETIME Data TypeTIMESTAMP Data Type
05-31-2020 23:59:59.999999Using supported time zones
05-31-2020 23:59:59.999999 America/Los_Angeles  

Using time zone offsetting
05-31-2020 23:59:59.999999-8:00
Does not store information on time zoneStores information on time zone. If not specified, defaults to UTC.

BigQuery Cast and Convert Data Types

If we need to convert from one data type to another, we can use the CAST function.

CAST(expression AS output_data_type)

For example, if we want to form a sentence, we need to convert all data (including numbers) into a STRING data type first. Mixing different data types is not recommended and can cause unexpected behavior.

Funnily, I attempted to convert my name into an integer, which is clearly wrong.

BigQuery replied by sending me 5 error messages telling me it’s not such a good idea.

Spot the 5 Error Messages!

Handling Errors in Conversion Using the SAFE_CAST Function

Sometimes, we process conversion for an entire column. In cases where some rows can cause an error, but we still want to proceed with the conversion, we use SAFE_CAST.

SAFE_CAST(expression AS output_data_type)

If a cast is not possible, that specific data will return NULL, while the others are converted successfully.

Conclusion

Data types in BigQuery work very similarly with other applications such as Microsoft Excel and other SQL Platforms.

It is always best to know our purpose for each data, so we can select the correct data type for that purpose.

In cases where we need to convert from one data type to another, we can use the CAST and SAFE_CAST functions.

You can also look at the Google BigQuery page for their official documentation on data types.

1 thought on “Data Types in Google BigQuery (Standard SQL)”

  1. Pingback: 8 Google BigQuery Data Types: A Comprehensive Guide

Leave a Reply

Your email address will not be published.