BigQuery Substring, Left, Right, and Other String Functions

Here’s a truth about strings:

Most of the time, we do not use them as is.

We always do something with it. We ensure that business rules are being followed.

Let’s change the capitalization, or let’s always check that there no excess spaces.

When we’re dealing with thousands, or even millions of rows, we do not change them one by one.

We will always process them in one go, or at least that’s the most logical way!

I wrote this article as a collection of the most common substring and string manipulations in Google BigQuery.

I also added functions that we may have used in other databases and adjusted them for Google BigQuery.

This article is full of images, practical examples, and explanations for each function, so you get the most value out of it, whether you are a beginner in SQL, or you are transitioning into Google BigQuery.

Related: Data Types in Google BigQuery

What is a Substring?

When it comes to analyzing a text, sometimes we only want a portion of it.

For example, if we have a full name that includes the first and last names, we can get the first names only.

If we have an email address, we can get the email domain only (the text after the @ symbol).

In SQL or Data Analysis, a getting the substring is the process of extracting, or getting a part of a string.

There are many ways to get a substring, depending on what we want to get.

Do we want the first 15 characters only, or the text after the dollar sign?

We explain how to achieve all of these through examples in this article.

BigQuery Substring Function

In BigQuery, we use the function SUBSTR to get the substring.

SUBSTR (text, start_point)
SUBSTR (text, start_point, length_of_substring)

There are two ways to extract a substring as we see above.

When counting characters in substrings. the starting point is always 1.

BigQuery Substring Examples

Substring Formula #1

In the first formula, we can specify a starting point, and the substring function will get the text from that starting point all the way to end.

For example, this query tells us to get the substring from position 9 onwards.

SUBSTR('[email protected]', 9)

Result: yuichiotsuka.com

This is best used for strings which have prefixes that we want to remove.

Example: If all your products have a fixed-length code at the beginning, and you want to remove all of them, you can use this function.

Product CodeRemoving the Parent Code
JCKT–12910–BLUE12910–BLUE
JCKT–12911–RED12911–RED
JCKT–12912–WHITE12912–WHITE
JCKT–12913–PINK12913–PINK
JCKT–12913–GREEN12913–GREEN

Substring Formula #2

Alternatively, the second formula allows us to provide a length for the substring.

This query tells us to get the substring from position 9 onwards, but only for a maximum of 6 characters.

SUBSTR('[email protected]', 9, 6)

Result: yuichi

This works best for strings with fixed formats.

If we have customer orders with a 123–XXXXXX–0000 format as follows:

12345678–335329–0000
12345678–771833–0000
12345678–128293–0000

We can use this substring format to get the 6 characters, from the 9th character onwards.

BigQuery STRPOS to Find Character in String

The STRPOS function tells us which point in the string can we find a specific character (or text).

If there are multiple occurrences of the text, BigQuery will give the position (or index) of the first occurrence.

If we cannot find the text, the result is 0.

Let’s have a look at some sample texts, where we try to find the position (or index) or the period sign.

TextSTRPOS (Text, ‘.’)
Mr. Yuichi3
Hello, everyone.16 (we always include spaces)
www.yuichiotsuka.com4 (the first time a period is seen)
Hello World!0 (there is no period)

We can try the examples above using this query:

SELECT 
  STRPOS('Mr. Yuichi', '.') AS Example_1,
  STRPOS('Hello, everyone.', '.') AS Example_2,
  STRPOS('www.yuichiotsuka.com', '.') AS Example_3,
  STRPOS('Hello World!', '.') AS Example_4
;

Result: 3, 16, 4, and 0

The STRPOS is not limited to single characters. We can also use words. The result we get is the position at the start of the word.

SELECT 
  STRPOS('www.yuichiotsuka.com', 'yuichi') AS Example_5
;

Result: 5

Case-insensitive STRPOS

STRPOS is case sensitive. We will get a different result if the cases do not match. Let’s look at this example.

SELECT 
  STRPOS('www.yuichiotsuka.com', 'YUICHI') AS Example_6
;

Result: 0

To make a case-sensitive version, we add the UPPER function twice. Alternatively, we can also use LOWER, and the results are the same.

SELECT 
  STRPOS(UPPER('www.yuichiotsuka.com'), UPPER('YUICHI')) AS Example_7,
  STRPOS(LOWER('www.yuichiotsuka.com'), LOWER('YUICHI')) AS Example_8
;

Result: 5 and 5

BigQuery Substring LEFT and RIGHT

In other databases, the LEFT and RIGHT functions give us the first few characters from the left or right, respectively.

Here is a sample of getting the three (3) leftmost characters.

Now, here is a sample of getting the four (4) rightmost characters.

In BigQuery, however, the LEFT and RIGHT functions are not supported.

Do not worry! We can update our queries to work on BigQuery by making these little changes:

Original FunctionBigQuery FunctionExplanation
LEFT (text, N)SUBSTR (text, 1, N)The 1 is always fixed, and we add the original length N as the third parameter.
RIGHT (text, M)SUBSTR (text, –M, M)The second parameter is the negative of M.   The third parameter is M.

Here are some examples of getting the left and right values from my name. Feel free to try it out with your name as well.

Original FunctionBigQuery FunctionResult
LEFT (‘yuichi’, 3)SUBSTR (‘yuichi’, 1, 3)yui
RIGHT (‘yuichi’, 4)SUBSTR (‘yuichi’, –4, 4)ichi

BigQuery Substring Not Found Error

If you run a SUBSTRING function in BigQuery, you may an error that tells you something like this:

Function not found: SUBSTRING at […]

This simply means that we need to change from SUBSTRING into SUBSTR, which is the accepted function in BigQuery.

BigQuery Substring After Character

In some cases, we do not know the exact positions of the substrings.

Sometimes, we just want the email domains after the at sign (@), right?

In these cases, we can use a combination of STRPOS and SUBSTR to get the substring right after a character.

Here is the general formula:

SUBSTR(original, STRPOS(original, character)+1)

Let’s look at this example, where we consider a few other email addresses with different lengths. Note that some of these are sample email addresses only.

SELECT 
  SUBSTR('[email protected]', STRPOS('[email protected]', '@')+1) AS Example_9,
  SUBSTR('[email protected]', STRPOS('[email protected]', '@')+1) AS Example_10,
  SUBSTR('[email protected]', STRPOS('[email protected]', '@')+1) AS Example_11
;

Results: yuichiotsuka.com, mail.com, and gmail.com

Why does it work?

First, let’s look at the STRPOS function, STRPOS(original, character)+1. This finds the correct starting point of the substring, with a +1 to exclude the character.

We then use that number as our starting point in the SUBSTR function.

REVERSE in BigQuery

The REVERSE function reads the text from the end to the start.

SELECT 
  REVERSE('ABCDE') AS Example_12
;

Result: EDCBA

BigQuery Add Leading Zeroes Using LPAD

If we have some strings of numbers, and we want to add leading zeroes, we can use the BigQuery LPAD Function.

LPAD(text, maximum_length, '0')

The function automatically computes for the exact number of zeroes to add.

Let’s say we have a number, 123456, and we want to add leading zeroes to ensure we have exactly 9 characters.

SELECT 
  LPAD('123456', 9, '0') AS Example_13
;

Result: 000123456

The function adds 3 leading zeroes, just enough to make a 9-digit text.

BigQuery SPLIT

Sometimes we want to split a text with separators into multiple parts, and then work on each part individually.

For example, if we have a product code 12345678–335329–0000, we may want to split it into three parts by the hyphen. We can do this using the BigQuery SPLIT function.

SPLIT(text, separator)[SAFE_OFFSET(position)]

Warning: The position in SAFE_OFFSET is zero-based, which means that the first part corresponds to 0, second is to 1, and so on.

As an example, let’s try out this sample query:

SELECT 
  SPLIT('12345678-335329-0000', '-')[SAFE_OFFSET(0)] AS Example_14,
  SPLIT('12345678-335329-0000', '-')[SAFE_OFFSET(1)] AS Example_15,
  SPLIT('12345678-335329-0000', '-')[SAFE_OFFSET(2)] AS Example_16,
  SPLIT('12345678-335329-0000', '-')[SAFE_OFFSET(3)] AS Example_17
;

Result: 12345678, 335329, 0000, and null

Note that since there are only three parts, and since we use a zero-based index, the last example returned NULL.

Conclusion

There’s a ton of ways to process strings!

We used a visual and example-based approach to learn common BigQuery String functions.

To become better in SQL, we do not need to memorize each function.

What is more important is to know what we need and find the right solution for it.

For the full documentation of String Functions, you can visit the official Google BigQuery page.

Do you have a problem with your strings? Leave a comment down below!

4 thoughts on “BigQuery Substring, Left, Right, and Other String Functions”

  1. Have you ever considered about adding a little bit more than just your articles?

    I mean, what you say is fundamental and all. Nevertheless imagine if you added some great photos
    or videos to give your posts more, “pop”! Your content is excellent but
    with pics and videos, this site could undeniably be one of the
    best in its field. Superb blog!

Leave a Reply

Your email address will not be published. Required fields are marked *