BigQuery UNNEST and Working with Arrays

What is the UNNEST function in BigQuery, and how does this relate to arrays?

In this article, we explain how arrays work, and how we can use the UNNEST function to flatten arrays. After processing data in these arrays, we can then treat them like normal table records.

However, not all arrays are created the same.

We also show examples of handling multiple arrays and JSON arrays. We also perform other operations on arrays such as ARRAY_CONCAT and ARRAY_TO_STRING.

If we can break down an array using UNNEST, we can also combine data into an array using the ARRAY_AGG function.

What is a BigQuery Array?

Think of an array in SQL as a special container. Like our integers, numbers, and dates, an array is a data type in BigQuery.

RELATED: Data Types in BigQuery

This container (i.e. array) has the following properties:

  • The elements inside each array must all have the same data type.
    If an array contains all numbers, then we can only store numbers. If it contains strings, then all elements (including numbers) will be treated as strings.
  • An array may have zero, one, or more elements inside. An array can be empty.
  • The order matters inside an array.
  • An array cannot have another array inside. No array-ception this time around.

In the BigQuery Console, we can see an array as a multi-row entry. Throughout this guide, we include actual screenshots from the BigQuery console.

Creating a Sample Query with Arrays

An array can be created using the brackets, such as [1, 2, 3], or ['red','yellow','blue'].

Throughout this guide, we can add this WITH clause (also known as CTE) on top of all our queries. This way, we can reference the colors table.

Here, we create a sample query using colors.

WITH colors AS (
  SELECT 'Primary Colors' AS category, ['red','yellow','blue'] AS samples_array
  UNION ALL SELECT 'Secondary Colors', ['orange','violet','green']
  UNION ALL SELECT 'Black and White', ['black','white']
  UNION ALL SELECT 'The Void', NULL
)
SELECT
  category,
  samples_array
FROM
  colors
;
Sample table with arrays

We have two columns: category and samples_array. The first column is just a normal string, but the second column is an array of strings, containing the colors.

For Primary Colors and Secondary Colors, we see that each array contains three elements each.

For Black and White, the array only contains two elements. This means that it’s okay to have different numbers of elements in each array in the same column.

For The Void, we will see an empty array, which is indicated by a grayed-out area.

Reduced Storage with Arrays

With arrays, since we do not have to repeat some information, this results in reduced storage costs for BigQuery.

We can simply ungroup the arrays at the time that we need to access the data for analysis.

What is UNNEST in BigQuery?

While arrays are compact and are good for storage, they are not directly usable in many forms of analysis and database operations.

For example, if you attempt to order a table using an array column, you will get an error as follows:

ORDER BY does not support expressions of type ARRAY<…> at …

You are also not allowed to use an array as a JOIN condition, getting an error similar to the following message:

Equality is not defined for arguments of type ARRAY<…> at …

How Do You Unnest an Array in SQL?

We need to use the BigQuery UNNEST function to flatten an array into its components. The following is a syntax to use this function:

SELECT 
  column(s),
  new_column_name
FROM 
  table_name,
  UNNEST(array_column_name) AS new_column_name

There are two important parts in the syntax.

We need to use the UNNEST in the FROM clause, and the new column in the SELECT clause.

In this sample table, we will perform an UNNEST on the column samples_array, and we will unnest it into a column named samples_individual.

So the UNNEST(samples_array) will be in the FROM clause. After that, we can now add the samples_column in the SELECT clause.

We can improve our previous query and use UNNEST to convert the arrays into individual rows.

SELECT
  category,
  samples_individual
FROM
  colors,
  UNNEST(samples_array) AS samples_individual
;

We get the following results, which looks more like a natural table.

Functions such as sorting and joining on the new column is now possible after using the UNNEST.

However, you will notice in the results that the empty array was not included (The Void). This is the default behavior of UNNEST, which is to remove rows with empty arrays.

Not to worry! We can adjust our query to still include these rows in the following section.

Include Rows of Empty Arrays in UNNEST

To include the rows of empty arrays, we add a LEFT JOIN to our query. Our query is now changed as follows:

SELECT
  category,
  samples_individual
FROM
  colors
LEFT JOIN 
  UNNEST(samples_array) AS samples_individual
;

By adding the keyword LEFT JOIN, we get the following results.

We now include the last row, with a NULL value for the column samples_individual.

Can We Unnest Multiple Arrays?

When we use the UNNEST function on a column in BigQuery, all the rows under that column is flattened all at once.

Currently, the UNNEST function does not accept multiple arrays as parameters. We need to merge the arrays into a single array before flattening it.

To do this, we can use other functions such as ARRAY_CONCAT, which we also explain in the succeeding sections.

How to Unnest a JSON Array

BigQuery also supports flattening a JSON into an array using JSON_EXTRACT_ARRAY.

As long as the string is in a valid JSON format, we can explore different sections of the JSON using valid JSONpath formats.

JSON_EXTRACT_ARRAY (
  Json_string
  [, json_path]  -- optional, defaults to $
)

We need to do a combination of UNNEST and JSON_EXTRACT_ARRAY to completely flatten a JSON array.

Let’s say our table, named json_table, is as follows:

Scenario 1: Using JSON_EXTRACT_ARRAY Only

Let’s run the following query.

SELECT 
  category, 
  JSON_EXTRACT_ARRAY(samples_json) AS samples_array
FROM 
  json_table
;

We then get the following results.

This tells us that non-quoted strings are not read in BigQuery, resulting in an empty array. Second, quotes are preserved when reading strings from JSON, as seen in Row 1.

We also see that the JSON was read as an array. We illustrate in the scenario below, to further break down the array into individual rows.

Scenario 2: Using JSON_EXTRACT_ARRAY and UNNEST

Let’s run the following query. Notice that the JSON_EXTRACT_ARRAY was moved from the SELECT clause into the FROM clause, since we used it with an UNNEST function.

SELECT 
  category, 
  samples_individual
FROM 
  json_table,
  UNNEST(JSON_EXTRACT_ARRAY(samples_json)) AS samples_individual
;

We then get the following results.

In the results, the empty arrays were not included (non-quoted strings and empty JSON). If we add the LEFT JOIN, we get the following results.

Now, even though the samples_individual column has a mix of characters and numbers, these are all considered as strings.

How to Check if an Array Contains an Element in BigQuery?

Sometimes, we need to check if an array contains a specific element. To do this in BigQuery, we can use the UNNEST function to inspect each row of an array.

Let’s use a table used earlier, named colors. This has an array of colors named samples_array.

Let’s say we want to filter out the rows which have a color of either red or black. This query shows us how we can filter the results to achieve this.

SELECT
  category,
  samples_array
FROM
  colors
WHERE (
  SELECT COUNT(1) 
  FROM UNNEST(samples_array) AS x 
  WHERE x IN ('red', 'black')
) >= 1
;

We can see that we added a combination of UNNEST and COUNT to identify if the array has at least one occurrence of either a red of black color. We then added this to the WHERE clause.

The result? We get the rows for Primary Colors (since this contains red), and then category Black and White (since this row contains black).

Using BigQuery ARRAY_AGG

Now, if we can flatten arrays using the UNNEST function in BigQuery, there must be a way to do the opposite, right?

This is where the ARRAY_AGG function comes in handy.

With this function, we can specify groupings and combine them into arrays. The syntax for ARRAY_AGG is as follows:

ARRAY_AGG (
  [DISTINCT]
  column/expression
  [IGNORE NULLS]  
  [ORDER BY column(s)]
  [LIMIT N]
)

ARRAY_AGG BigQuery Example

Let’s see how we can use this through an example. Let’s use a table named colors as our initial table.

Using the BigQuery ARRAY_AGG function, we can combine the individual colors into an array for each category.

SELECT
  category,
  ARRAY_AGG(
    samples_individual 
    IGNORE NULLS 
    ORDER BY samples_individual ASC
  ) AS samples_array
FROM
  colors
GROUP BY
  category
;

Here’s the result:

Error Message: array cannot have a null element; error in writing field

Let’s have a look at this table.

If we try to create an array for each category, this is what BigQuery will attempt to do:

  • Primary Colors: {red, yellow, blue}
  • Secondary Colors: {orange, violet, green}
  • Black and White: {black, white}
  • The Void: ???

You will notice that the samples_individual column is null for The Void.

If we are using ARRAY_AGG on a column with at least one NULL category, BigQuery does not handle it by default and returns an error.

Array cannot have a null element; error in writing field …

We need to specify the IGNORE NULLS keyword, to handle categories with NULL values in them. This will exclude these rows and return an empty array for them.

SELECT
  category,
  ARRAY_AGG(
    samples_individual 
    IGNORE NULLS 
    ORDER BY samples_individual ASC
  ) AS samples_array
FROM
  colors
GROUP BY
  category
;

With the UNNEST and ARRAY_AGG functions, we now have much more control over our data in array format.

Using BigQuery ARRAY_CONCAT and ARRAY_TO_STRING

There are other functions that I personally use less often, such as the ARRAY_CONCAT and ARRAY_TO_STRING.

If we have two or more arrays, we can use the ARRAY_CONCAT function in BigQuery to merge these and have a single array as our output.

ARRAY_CONCAT(array_1, array_2, ...)

If we have an array, and we are interested to convert it into a string, we use the ARRAY_TO_STRING function.

This is helpful if we want to show the elements of an array, separated by a comma.

ARRAY_TO_STRING(
  array, 
  character_for_separator, 
  character_for_nulls
)

The ARRAY_TO_STRING function accepts two additional parameters. The first one allows us to specify what kind of separator we want to use. The second one tells us what to show if there are NULL values in the array.

Conclusion

Arrays are an effective way to reduce storage for a cloud data warehouse such as Google BigQuery.

We used the BigQuery functions UNNEST and ARRAY_AGG to flatten and build arrays, respectively.

We also worked on different styles of arrays, including JSON arrays.

It can be a little tricky when working with arrays for the first time. Knowing and differentiating from each scenario is the key to write our queries correctly.

13 thoughts on “BigQuery UNNEST and Working with Arrays”

    1. Hi Ichi, thank you for your explanation. I am looking for a query which allows me to MERGE rows into ARRAY fields.
      Let’s say I want to add ‘blue’ colour to samples_array for every category that does not have ‘blue’.

      1. Hi Victor, thanks for asking! If you want to add the ‘blue’ color to those that do not have blue, you need to do two things:
        1. Identify if an array contains blue (you can use WHERE EXISTS + UNNEST)
        2. If it does not exist, you can add the ‘blue’ by merging the samples_array with a new array [blue]

        I hope this helps!

  1. Hi,
    If I have multiple column like “sample array” & I am using unnest for all those column, then the result is completely wrong. How could we use unness for multiple column ?

    1. Hi Swetosree!

      Unnest for multiple columns will give us all the possible combinations for each column. For example, if you have Column A with 2 elements in array, and Column B with 3 elements in the array, your UNNEST will give us 6 rows (or 2 X 3).

  2. Hi – thanks very much for this simple guide to using arrays, array_agg etc. in Google Big Query. I think this is the first time that I read something about these functions that actually made sense (as I try to pick apart 20 years of using SQL!)
    It would be incredibly helpful to me if you would write something similar about creating and destroying (?) STRUCTs – again, a new area to me, and one that I’m struggling to make work for me. I’m working with people who really love nested STRUCTs, and although I can sometimes get my code to work, I cannot honestly say that I understand it properly!

    1. Hi Mike! I just read your message, and you can think of STRUCT as tables within a table, with multiple values that should follow a fixed set of data types. Would plan this out in the future, and I salute you for 20 years of doing SQL! I hope you have gained a better grasp of the STRUCTs in your workplace.

Leave a Reply

Your email address will not be published. Required fields are marked *