Let’s discover how you can use Google BigQuery as a platform for data analysis.
We look at pricing, the API, and some samples to get us started with Google BigQuery.
You can use BigQuery to analyze public and private datasets.
I have also used the platform in my past projects. It’s powerful, easy to learn, and most of all, it works really well for remote setups and freelance projects.
With the very rich public datasets available on Google BigQuery, you’re also sure to have a lot of data to work on as you familiarize yourself with this platform.
I still have more to learn on the architecture side, but we will touch on the areas that help a data analyst run successful analyses in Google BigQuery.
Is Google BigQuery a Database?
From their website (https://cloud.google.com/bigquery), Google tells us that GBQ is a serverless, highly scalable, and cost-effective option for a Cloud Data Warehouse.
Yes, Google BigQuery is a database, and has similarities and differences with other databases such as Oracle, Microsoft SQL Server, MySQL, and PostgreSQL, to name a few.
A serverless setup means that we don’t have to worry about installation, maintenance, and server troubleshooting. Google does that for us.
In some cases, this can limit the level of control we have over our databases (hello, control-freaks like me). However, for the most part, this means that we can dive right into analysis and get results with less turnaround time.
The main advantage of cloud systems is their ability to scale. We see this in practice with cloud servers, where people only need to make a few clicks (or none at all!) to increase a server’s capacity during a Black Friday Sale. With on-premises setups, however, this can be a nightmare.
GBQ is cost-effective, and I outline some general considerations below to help manage and control our expenses.
More importantly, you can access the database using your web browser, such as Chrome. This is what I like the most: no extra installations necessary!
Great, right? Well, there’s always a con to every pro.
What is NOT Google BigQuery?
Before you throw all your data into BigQuery, you might need to ask yourself a few other questions.
Most likely, this is NOT going to be your transactional database. GBQ has optimized, scalable storage, but it’s meant to be used for analytics and data science use cases.
Thus, if you need to do a lot of inserts, updates, or DML (Data Manipulation Language) statements in general, this might not be for you. A lot of transactions can result in higher overall costs.
For high-volume inserts, you can consider streaming data, which can be useful for event tracking scenarios. Streaming has separate pricing, and GBQ can be a viable option there.
BigQuery Pricing: Is Google BigQuery Free?
Google BigQuery has a free tier, and this allocation is refreshed monthly.
If you decide to pay for the services, Google uses metered pricing, charging you only for the volume of data you store and the amount of data your queries process.
The free monthly allocation is also given to paying customers.
For small to medium-sized projects, Google BigQuery can often be used entirely within the free tier.
Google has an extensive pricing page (https://cloud.google.com/bigquery/pricing). Google BigQuery Pricing comes in two major categories: Storage Costs and Querying Costs.
It costs $0.02 per month to store 1 GB of data in BigQuery. That’s two cents for each gigabyte of Active Storage used.
Moreover, the first 10 GB of storage each month is free. Therefore, you’re free to work on smaller datasets in BigQuery.
Once a table has not been modified for 90 consecutive days, it is considered Long-term Storage and is charged at half the rate, $0.01 per GB per month.
In short: storage in BigQuery costs $0.02 per GB per month for anything over the free 10 GB, and the rate drops to half for any table that has not been modified in 90 days.
What this means is that it’s always a good idea to insert data in bulk, and just let it sit for some time. Table partitioning is the key here: each partition is assessed for long-term pricing on its own, so older partitions (last month’s data, or a certain department’s) can age into the cheaper rate as long as we don’t update them.
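To make the storage arithmetic concrete, here is a minimal sketch using the rates quoted above. The function name is made up for illustration, and it assumes the free 10 GB is deducted once per month, from active storage first; how Google actually applies the allowance should be checked on the pricing page.

```python
ACTIVE_USD_PER_GB = 0.02     # active storage rate quoted in the text
LONG_TERM_USD_PER_GB = 0.01  # long-term rate (not modified for 90 days)
FREE_GB = 10                 # free allowance; assumed to be applied once per month


def monthly_storage_cost(active_gb: float, long_term_gb: float) -> float:
    """Estimate one month's storage bill in USD (hypothetical helper)."""
    # Assumption: the free allowance is used up by active storage first,
    # and whatever remains is applied to long-term storage.
    billable_active = max(0.0, active_gb - FREE_GB)
    free_left = max(0.0, FREE_GB - active_gb)
    billable_long_term = max(0.0, long_term_gb - free_left)
    total = (billable_active * ACTIVE_USD_PER_GB
             + billable_long_term * LONG_TERM_USD_PER_GB)
    return round(total, 2)


# A 300 GB table that is rewritten every month stays fully active.
all_active = monthly_storage_cost(active_gb=300, long_term_gb=0)

# The same data partitioned so that only 30 GB was touched in the last 90 days.
partitioned = monthly_storage_cost(active_gb=30, long_term_gb=270)
```

Under these assumptions the partitioned layout comes out at roughly half the cost of the always-active one, which is the point of letting old data sit untouched.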
For querying, we will default to on-demand pricing of $5 per TB of bytes read, and the first 1 TB per month is free.
More importantly, this is where best practices come into play. We should no longer reach for SELECT *. BigQuery bills by the bytes read from each column a query references, so pulling in more data costs more, and we do not want to read 50 columns if we are only after 1 or 2.
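A rough sketch of the on-demand arithmetic, using the $5 per TB rate and the 1 TB free allowance quoted above. The table size, column split, and query count are invented for illustration, and treating a TB as 2^40 bytes is an assumption.

```python
GIB = 2 ** 30
TIB = 2 ** 40                # assumption: a "TB" is billed as 2**40 bytes
PRICE_PER_TIB_USD = 5.00     # on-demand rate quoted in the text
FREE_TIB_PER_MONTH = 1       # free monthly query allowance quoted in the text


def on_demand_cost(bytes_read_this_month: int) -> float:
    """Estimate a month's on-demand query bill in USD (hypothetical helper)."""
    billable = max(0, bytes_read_this_month - FREE_TIB_PER_MONTH * TIB)
    return round(billable / TIB * PRICE_PER_TIB_USD, 2)


# Hypothetical 500 GiB table with 50 equally sized columns, queried 100 times a month.
select_star = on_demand_cost(100 * 500 * GIB)   # every column is read each time
two_columns = on_demand_cost(100 * 20 * GIB)    # only 2 of the 50 columns are read

print(f"SELECT *: ${select_star:.2f}   two columns: ${two_columns:.2f}")
```

With these made-up numbers, naming the two columns instead of using SELECT * shrinks the monthly bill from a few hundred dollars to a few dollars.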
Here are some queries, and how much they can potentially consume in bytes read.
If a query does not read from a table, it processes 0 bytes, no matter how complex it is. Below is a sample query with a UNION ALL and some artificial tables.
Notice how the lower right tells us 0 B.
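The same estimate can be requested outside the console with a dry run, which validates a query and reports the bytes it would process without running or billing it. The sketch below assumes the google-cloud-bigquery Python package is installed and default credentials and a project are already configured. The query text and the helper names are made up for illustration.

```python
# Hypothetical query: every row is a literal, so no table is read.
NO_TABLE_SQL = """
SELECT 1 AS id, 'alpha' AS label
UNION ALL
SELECT 2 AS id, 'beta' AS label
"""


def estimate_bytes(sql: str) -> int:
    """Dry-run a query and return the bytes BigQuery expects to process."""
    # Imported here so the rest of the sketch loads without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)  # validated only, not executed
    return job.total_bytes_processed


def human_readable(num_bytes: int) -> str:
    """Format a byte count with binary units, similar to the console's estimate."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:g} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


# Usage, once credentials are in place:
# print(human_readable(estimate_bytes(NO_TABLE_SQL)))
```

Running the same helper on a query that reads a real table is a cheap way to compare a SELECT * against a narrower column list before paying for either.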
Now, if we need to get actual data, the cost depends on how much data the query has to process.
Here’s a sample blockchain table with 540,584 rows of data. We tested out some queries and collected the corresponding costs.
Did you notice how costs can be greatly reduced if we only retrieve what we need?
This costing design by BigQuery is actually a good way to improve how you write queries.
How Do I Access Google BigQuery?
Here are the steps to start working in GBQ.
1. Log in using your Google account.
2. Go to the Google Cloud Platform page (https://console.cloud.google.com/).
3. If this is your first time, depending on your region, you will be asked to agree to some Terms of Service.
4. After accepting the terms, you can go to the BigQuery console. There are three ways to do this.
4.a. In the panel on the left side, you can scroll to BIG DATA > BigQuery
4.b. You can also access the Search bar, and look for BigQuery
4.c. Or you can simply access the URL https://console.cloud.google.com/bigquery
5. If this is your first time, you will be asked to select a project. Click the CREATE PROJECT button to get started on a project.
6. Name your project. In my case, I used Ichi’s First Project.
7. You will now be taken to your project workspace.
8. At this point, you have the freedom to work on public datasets, add your own private datasets, or both. Yes!
Google BigQuery API
BigQuery also allows operations to be performed through an API call.
You can find the official documentation for Google BigQuery API at https://cloud.google.com/bigquery/docs/reference/rest.
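As a rough illustration of what such a call looks like, the sketch below only builds the URL and JSON body for the REST API's query endpoint and does not send anything. The project ID and function name are placeholders. A real call would also need an OAuth 2.0 access token sent in an Authorization: Bearer header, which is left out here.

```python
import json

BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, dry_run: bool = True):
    """Return the (url, body) pair for a jobs.query POST request (hypothetical helper)."""
    url = f"{BASE_URL}/projects/{project_id}/queries"
    body = {
        "query": sql,
        "useLegacySql": False,  # use standard SQL rather than legacy SQL
        "dryRun": dry_run,      # True: only estimate bytes, do not run the query
    }
    return url, body


# "my-first-project" is a placeholder project ID, not a real one.
url, body = build_query_request("my-first-project", "SELECT 1 AS answer")
print("POST", url)
print(json.dumps(body, indent=2))
```

Keeping the request construction separate from the sending step makes it easy to inspect the payload before wiring in authentication.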
You may be learning about Google BigQuery out of curiosity.
Or maybe, your company decided to focus on this as a new technology.
Or you simply want to broaden your skills. Data literacy is becoming more important.
I first started working with Google BigQuery from a project that started out with MySQL.
As a data analyst, you will need to learn the business and work with technology.
However, you’re not going to have a hard time learning this platform. The documentation for Google BigQuery is very thorough. It can be daunting at times, but it has a lot of examples, and I believe we learn best by doing.
BigQuery has a lot of data, and that just blows my mind!
Democratizing data analysis is always a good thing. Take a look at the public project bigquery-public-data, which alone holds financial, COVID-19, marketing, and scientific datasets, among others.
I hope you learned something. If you want to read more topics in detail, feel free to send me a message at firstname.lastname@example.org. I appreciate it!