What is the UNNEST function in BigQuery, and how does this relate to arrays?
In this article, we explain how arrays work, and how we can use the UNNEST function to flatten arrays. After processing data in these arrays, we can then treat them like normal table records.
However, not all arrays are created the same.
We also show examples of handling multiple arrays and JSON arrays. We also perform other operations on arrays such as ARRAY_CONCAT
and ARRAY_TO_STRING
.
If we can break down an array using UNNEST, we can also combine data into an array using the ARRAY_AGG
function.
What is a BigQuery Array?
Think of an array in SQL as a special container. Like our integers, numbers, and dates, an array is a data type in BigQuery.
RELATED: Data Types in BigQuery
This container (i.e. array) has the following properties:
- The elements inside each array must all have the same data type.
If an array contains all numbers, then we can only store numbers. If it contains strings, then all elements (including numbers) will be treated as strings. - An array may have zero, one, or more elements inside. An array can be empty.
- The order matters inside an array.
- An array cannot have another array inside. No array-ception this time around.
In the BigQuery Console, we can see an array as a multi-row entry. Throughout this guide, we include actual screenshots from the BigQuery console.
Creating a Sample Query with Arrays
An array can be created using the brackets, such as [1, 2, 3]
, or ['red','yellow','blue']
.
Throughout this guide, we can add this WITH
clause (also known as CTE
) on top of all our queries. This way, we can reference the colors
table.
Here, we create a sample query using colors.
WITH colors AS ( SELECT 'Primary Colors' AS category, ['red','yellow','blue'] AS samples_array UNION ALL SELECT 'Secondary Colors', ['orange','violet','green'] UNION ALL SELECT 'Black and White', ['black','white'] UNION ALL SELECT 'The Void', NULL ) SELECT category, samples_array FROM colors ;
We have two columns: category
and samples_array
. The first column is just a normal string, but the second column is an array of strings, containing the colors.
For Primary Colors and Secondary Colors, we see that each array contains three elements each.
For Black and White, the array only contains two elements. This means that it’s okay to have different numbers of elements in each array in the same column.
For The Void, we will see an empty array, which is indicated by a grayed-out area.
Reduced Storage with Arrays
With arrays, since we do not have to repeat some information, this results in reduced storage costs for BigQuery.
We can simply ungroup the arrays at the time that we need to access the data for analysis.
What is UNNEST in BigQuery?
While arrays are compact and are good for storage, they are not directly usable in many forms of analysis and database operations.
For example, if you attempt to order a table using an array column, you will get an error as follows:
ORDER BY does not support expressions of type ARRAY<…> at …
You are also not allowed to use an array as a JOIN condition, getting an error similar to the following message:
Equality is not defined for arguments of type ARRAY<…> at …
How Do You Unnest an Array in SQL?
We need to use the BigQuery UNNEST function to flatten an array into its components. The following is a syntax to use this function:
SELECT column(s), new_column_name FROM table_name, UNNEST(array_column_name) AS new_column_name
There are two important parts in the syntax.
We need to use the UNNEST in the FROM clause, and the new column in the SELECT clause.
In this sample table, we will perform an UNNEST on the column samples_array
, and we will unnest it into a column named samples_individual
.
So the UNNEST(samples_array) will be in the FROM clause. After that, we can now add the samples_column in the SELECT clause.
We can improve our previous query and use UNNEST to convert the arrays into individual rows.
SELECT category, samples_individual FROM colors, UNNEST(samples_array) AS samples_individual ;
We get the following results, which looks more like a natural table.
Functions such as sorting and joining on the new column is now possible after using the UNNEST.
However, you will notice in the results that the empty array was not included (The Void). This is the default behavior of UNNEST, which is to remove rows with empty arrays.
Not to worry! We can adjust our query to still include these rows in the following section.
Include Rows of Empty Arrays in UNNEST
To include the rows of empty arrays, we add a LEFT JOIN
to our query. Our query is now changed as follows:
SELECT category, samples_individual FROM colors LEFT JOIN UNNEST(samples_array) AS samples_individual ;
By adding the keyword LEFT JOIN
, we get the following results.
We now include the last row, with a NULL
value for the column samples_individual.
Can We Unnest Multiple Arrays?
When we use the UNNEST function on a column in BigQuery, all the rows under that column is flattened all at once.
Currently, the UNNEST function does not accept multiple arrays as parameters. We need to merge the arrays into a single array before flattening it.
To do this, we can use other functions such as ARRAY_CONCAT
, which we also explain in the succeeding sections.
How to Unnest a JSON Array
BigQuery also supports flattening a JSON into an array using JSON_EXTRACT_ARRAY
.
As long as the string is in a valid JSON format, we can explore different sections of the JSON using valid JSONpath formats.
JSON_EXTRACT_ARRAY ( Json_string [, json_path] -- optional, defaults to $ )
We need to do a combination of UNNEST
and JSON_EXTRACT_ARRAY
to completely flatten a JSON array.
Let’s say our table, named json_table
, is as follows:
Scenario 1: Using JSON_EXTRACT_ARRAY Only
Let’s run the following query.
SELECT category, JSON_EXTRACT_ARRAY(samples_json) AS samples_array FROM json_table ;
We then get the following results.
This tells us that non-quoted strings are not read in BigQuery, resulting in an empty array. Second, quotes are preserved when reading strings from JSON, as seen in Row 1.
We also see that the JSON was read as an array. We illustrate in the scenario below, to further break down the array into individual rows.
Scenario 2: Using JSON_EXTRACT_ARRAY and UNNEST
Let’s run the following query. Notice that the JSON_EXTRACT_ARRAY
was moved from the SELECT
clause into the FROM
clause, since we used it with an UNNEST
function.
SELECT category, samples_individual FROM json_table, UNNEST(JSON_EXTRACT_ARRAY(samples_json)) AS samples_individual ;
We then get the following results.
In the results, the empty arrays were not included (non-quoted strings and empty JSON). If we add the LEFT JOIN
, we get the following results.
Now, even though the samples_individual column has a mix of characters and numbers, these are all considered as strings.
How to Check if an Array Contains an Element in BigQuery?
Sometimes, we need to check if an array contains a specific element. To do this in BigQuery, we can use the UNNEST function to inspect each row of an array.
Let’s use a table used earlier, named colors. This has an array of colors named samples_array
.
Let’s say we want to filter out the rows which have a color of either red or black. This query shows us how we can filter the results to achieve this.
SELECT category, samples_array FROM colors WHERE ( SELECT COUNT(1) FROM UNNEST(samples_array) AS x WHERE x IN ('red', 'black') ) >= 1 ;
We can see that we added a combination of UNNEST
and COUNT
to identify if the array has at least one occurrence of either a red of black color. We then added this to the WHERE
clause.
The result? We get the rows for Primary Colors (since this contains red), and then category Black and White (since this row contains black).
Using BigQuery ARRAY_AGG
Now, if we can flatten arrays using the UNNEST function in BigQuery, there must be a way to do the opposite, right?
This is where the ARRAY_AGG
function comes in handy.
With this function, we can specify groupings and combine them into arrays. The syntax for ARRAY_AGG
is as follows:
ARRAY_AGG ( [DISTINCT] column/expression [IGNORE NULLS] [ORDER BY column(s)] [LIMIT N] )
ARRAY_AGG BigQuery Example
Let’s see how we can use this through an example. Let’s use a table named colors as our initial table.
Using the BigQuery ARRAY_AGG
function, we can combine the individual colors into an array for each category.
SELECT category, ARRAY_AGG( samples_individual IGNORE NULLS ORDER BY samples_individual ASC ) AS samples_array FROM colors GROUP BY category ;
Here’s the result:
Error Message: array cannot have a null element; error in writing field
Let’s have a look at this table.
If we try to create an array for each category, this is what BigQuery will attempt to do:
- Primary Colors:
{red, yellow, blue}
- Secondary Colors:
{orange, violet, green}
- Black and White:
{black, white}
- The Void:
???
You will notice that the samples_individual
column is null for The Void.
If we are using ARRAY_AGG
on a column with at least one NULL
category, BigQuery does not handle it by default and returns an error.
Array cannot have a null element; error in writing field …
We need to specify the IGNORE NULLS
keyword, to handle categories with NULL
values in them. This will exclude these rows and return an empty array for them.
SELECT category, ARRAY_AGG( samples_individual IGNORE NULLS ORDER BY samples_individual ASC ) AS samples_array FROM colors GROUP BY category ;
With the UNNEST
and ARRAY_AGG
functions, we now have much more control over our data in array format.
Using BigQuery ARRAY_CONCAT and ARRAY_TO_STRING
There are other functions that I personally use less often, such as the ARRAY_CONCAT
and ARRAY_TO_STRING
.
If we have two or more arrays, we can use the ARRAY_CONCAT
function in BigQuery to merge these and have a single array as our output.
ARRAY_CONCAT(array_1, array_2, ...)
If we have an array, and we are interested to convert it into a string, we use the ARRAY_TO_STRING
function.
This is helpful if we want to show the elements of an array, separated by a comma.
ARRAY_TO_STRING( array, character_for_separator, character_for_nulls )
The ARRAY_TO_STRING
function accepts two additional parameters. The first one allows us to specify what kind of separator we want to use. The second one tells us what to show if there are NULL
values in the array.
Conclusion
Arrays are an effective way to reduce storage for a cloud data warehouse such as Google BigQuery.
We used the BigQuery functions UNNEST and ARRAY_AGG to flatten and build arrays, respectively.
We also worked on different styles of arrays, including JSON arrays.
It can be a little tricky when working with arrays for the first time. Knowing and differentiating from each scenario is the key to write our queries correctly.
Is there a specific query that you need to create using UNNEST or ARRAY_AGG? Let me know in the comments!
Hi Ichi, thank you for your explanation. I am looking for a query which allows me to MERGE rows into ARRAY fields.
Let’s say I want to add ‘blue’ colour to samples_array for every category that does not have ‘blue’.
Hi Victor, thanks for asking! If you want to add the ‘blue’ color to those that do not have blue, you need to do two things:
1. Identify if an array contains blue (you can use WHERE EXISTS + UNNEST)
2. If it does not exist, you can add the ‘blue’ by merging the samples_array with a new array [blue]
I hope this helps!
https://waterfallmagazine.com
You are so interesting! I do not suppose I’ve truly read something like that before.
So nice to find another person with unique thoughts on this topic.
Really.. thank you for starting this up. This web site is one
thing that is required on the internet, someone with
a little originality!
I’m not certain the place you are getting your info, however great topic.
I must spend some time learning more or figuring out more.
Thank you for excellent info I used to be on the lookout forr this info for my mission.
https://gumroad.com/mbaessaywritingservi/p/5-simple-steps-for-getting-a-analysis-paper-on-line
buy essay
R (Ladonna) https://gumroad.com/mbaessaywritingservi/p/5-simple-steps-for-getting-a-analysis-paper-on-line
There is definately a great deal to find out about this topic.
I really like all of the points you have made.
Hi there mates, its great paragraph about educationand fully defined, keep it
up all the time.
This is the best explanation and tutorial I’ve found for Arrays and the Unnest function! Well done.
Thanks! Glad you found it useful.
Hi,
If I have multiple column like “sample array” & I am using unnest for all those column, then the result is completely wrong. How could we use unness for multiple column ?
Hi Swetosree!
Unnest for multiple columns will give us all the possible combinations for each column. For example, if you have Column A with 2 elements in array, and Column B with 3 elements in the array, your UNNEST will give us 6 rows (or 2 X 3).
Hi – thanks very much for this simple guide to using arrays, array_agg etc. in Google Big Query. I think this is the first time that I read something about these functions that actually made sense (as I try to pick apart 20 years of using SQL!)
It would be incredibly helpful to me if you would write something similar about creating and destroying (?) STRUCTs – again, a new area to me, and one that I’m struggling to make work for me. I’m working with people who really love nested STRUCTs, and although I can sometimes get my code to work, I cannot honestly say that I understand it properly!
Hi Mike! I just read your message, and you can think of STRUCT as tables within a table, with multiple values that should follow a fixed set of data types. Would plan this out in the future, and I salute you for 20 years of doing SQL! I hope you have gained a better grasp of the STRUCTs in your workplace.
Comments are closed.