Google launches BigLake preview: Helping companies analyze data more easily
At its Cloud Data Summit, Google announced a preview of BigLake. This new data lake storage engine will help enterprises more easily analyze the data in their data warehouses and data lakes.
At its core, BigLake leverages Google's experience in running and managing its BigQuery data warehouse and extends it to a data lake on Google Cloud Storage, combining the benefits of the data lake and warehouse into a single service that abstracts the underlying storage format and system.
Notably, this data can be housed in BigQuery or reside on AWS S3 and Azure Data Lake Storage Gen2. With BigLake, developers get a unified storage engine and can query the underlying data stores through a single system, without moving or duplicating data.
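To make the "query in place" idea concrete, here is a rough sketch based on BigQuery's external-table DDL, which is how BigLake tables over Cloud Storage are defined. The project, dataset, connection, and bucket names below are placeholders, not part of the announcement:

```sql
-- Hypothetical names throughout. The WITH CONNECTION clause is what
-- distinguishes a BigLake table from a plain external table.
CREATE EXTERNAL TABLE demo_dataset.sales_orders
WITH CONNECTION `us.my-gcs-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-data-lake/sales/*.parquet']
);

-- The table can then be queried like a native BigQuery table,
-- without copying the Parquet files out of Cloud Storage.
SELECT region, SUM(amount) AS total
FROM demo_dataset.sales_orders
GROUP BY region;
```

The point of the design is that the query above is indifferent to where the bytes actually live; the storage format and system are abstracted behind the table definition.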
In today's announcement, Gerrit Kazmaier, vice president and general manager of database, data analytics and business intelligence for Google Cloud, said:
Managing data in disparate data lakes and data warehouses creates silos and increases risk and cost, especially when data needs to be moved. BigLake allows companies to unify their data warehouses and lakes to analyze data without worrying about the underlying storage format or system, which eliminates the need to duplicate or move data from the source, reducing costs and inefficiencies.
Using policy tags, BigLake allows administrators to configure their security policies at the table, row and column levels. This covers data stored in Google Cloud Storage, as well as the two supported third-party stores (AWS S3 and Azure Data Lake Storage Gen2) where Google's multi-cloud analytics service, BigQuery Omni, has these security controls enabled. These controls then also ensure that only the right data flows into tools like Spark, Presto, Trino, and TensorFlow. The service additionally integrates with Google's Dataplex tool to provide further data management capabilities.
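As an illustration of what row-level control looks like in practice, BigQuery expresses it as row access policies attached to a table; BigLake extends these controls to tables backed by object storage. A minimal sketch, with a hypothetical table, group, and filter column:

```sql
-- Hypothetical names. After this policy exists, members of the group
-- only see rows where region = 'EU' when querying the table.
CREATE ROW ACCESS POLICY eu_only
ON demo_dataset.sales_orders
GRANT TO ('group:eu-analysts@example.com')
FILTER USING (region = 'EU');
```

Column-level restrictions work differently: policy tags from a Data Catalog taxonomy are attached to individual columns in the table schema, and access is granted on the tags rather than per table.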
Google noted that BigLake will offer fine-grained access control and its APIs will span Google Cloud, as well as open column-oriented file formats such as Apache Parquet and open-source processing engines such as Apache Spark.
Justin Levandoski, software engineer at Google Cloud, and Gaurav Saxena, product manager, explained in today's announcement:
The amount of valuable data that enterprises need to manage and analyze is growing at an alarming rate. This data is increasingly spread across many locations, including data warehouses, data lakes and NoSQL stores. As enterprises' data becomes more complex and proliferates across disparate data environments, silos emerge, creating increased risk and cost, especially when that data needs to be moved. Our customers have made it clear; they need help!