Inside Google BigQuery

By Tyler Keenan


As organizations rush to gather and process as much data as possible, database administrators can struggle to keep pace. Though a number of cloud-based services have sprung up to help organizations quickly scale their computing and storage resources, the challenge of managing data warehouses remains. Google’s BigQuery is a novel data warehouse system that abstracts away many of the technical issues associated with setting up and managing a data warehouse.

To understand what this could mean for your organization, we’ll take a look at some of the challenges associated with conventional data warehouses and how BigQuery solves them.

Why Do You Need Two Databases?

First off, what is a data warehouse? Unlike a transactional database, whose purpose is (unsurprisingly) to facilitate transactions, a data warehouse is meant to collect data for later analysis. This makes it a business intelligence tool more than an operational one.

These different roles also mean that transactional databases and data warehouses have different technical requirements. A transactional database is typically optimized for read-write operations from a single data source. By contrast, a data warehouse is optimized to facilitate complex queries on large data sets that may come from a variety of sources.
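To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only: the orders table, its columns, and the helper functions place_order and revenue_by_region are hypothetical names, and a single in-memory SQLite database stands in for what would normally be two separate systems.

```python
import sqlite3

# Assumption: one in-memory SQLite database plays both roles for illustration.
# In practice the transactional database and the warehouse are separate systems.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)


def place_order(customer, region, amount):
    """Transactional-style work: a small write that touches a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid


def revenue_by_region():
    """Warehouse-style work: scan every row and aggregate for analysis."""
    return conn.execute(
        "SELECT region, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


# Hypothetical sample data.
for order in [
    ("ana", "east", 20.0),
    ("bo", "west", 35.5),
    ("ana", "east", 12.5),
    ("cy", "west", 4.5),
]:
    place_order(*order)

print(revenue_by_region())  # [('east', 2, 32.5), ('west', 2, 40.0)]
```

The first function reads or writes one row per call, which is the access pattern a transactional database is tuned for. The second has to read the whole table to answer a single question, which is the pattern a data warehouse is built to handle at much larger scale.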

Setting Up a Data Warehouse

Setting up a successful data warehouse is a complex task. Since Facebook introduced Hive, running some kind of SQL-like layer of abstraction over a typical MapReduce operation has been a common way for organizations to handle big data analytics. Unfortunately, setting up, managing, and scaling a Hadoop cluster is an ongoing challenge. What’s more, MapReduce isn’t suited to real-time analysis and ad hoc querying.

It’s no surprise, then, that a number of cloud-based competitors have appeared to tackle these problems. Major players in this area include Amazon’s Redshift and Microsoft’s Azure SQL Data Warehouse. These services are essentially virtual data warehouses: they let database admins ingest data, provision storage and computing resources, and integrate with other BI tools. As with physical data centers, however, admins occasionally have to perform maintenance and cleaning operations. To be clear, these are powerful but complex tools. A cottage industry of startups has emerged solely to help organizations set up and manage these kinds of services.

What Makes BigQuery Different?

A significant part of managing most data warehouses is formatting data and provisioning resources. Even cloud solutions require you to spin up (and wind down) clusters of machines for given tasks. BigQuery dispenses with both of these concepts. It’s all about ad hoc queries.

BigQuery may be the first major cloud-based data warehouse to emphasize querying over administration. Functionally, that means Google handles all provisioning and maintenance operations; all you have to worry about is connecting data sources and executing queries. This can be a game-changer for certain organizations, allowing data teams (or any team that needs to run SQL queries regularly) to potentially set up and run their own super-fast analytics operations without needing a database admin.

How Does It Work?

There are two major components to BigQuery. Essentially, it combines a couple of other Google …

Source: Business 2 Community
