Querying Google BigLake with ClickHouse

Mark Needham

BigLake is Google Cloud's managed data lake service, similar in spirit to Unity Catalog or AWS Glue. This short video walks through connecting ClickHouse to BigLake, browsing Google's public warehouse, and running queries against a 1.3-billion-row dataset.

  • Step-by-step credentials setup using a permissions script (BigLake API, GCS roles, quota project)
  • ClickHouse database config pointing at the BigLake Iceberg REST catalog
  • Exploring the NYC TaxiCab data: schema, row counts, and Iceberg snapshot history
  • Trade-off in action: queries against the remote tables are slow over the network, but copying the data into a local MergeTree table brings query times down to seconds
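
As a rough illustration of that last point, the local copy could be made with a single CREATE TABLE ... AS SELECT statement. This is a minimal sketch, not the exact commands from the video: the database name `biglake`, the table name `public_data.nyc_taxicab`, and the local table name `nyc_taxicab_local` are all hypothetical placeholders, and `ORDER BY tuple()` is used only because the real schema is not shown here.

```sql
-- Hypothetical names throughout: `biglake` stands for the ClickHouse database
-- that points at the BigLake Iceberg REST catalog, and `public_data.nyc_taxicab`
-- stands in for the real namespace.table (quoted with backticks because the
-- catalog exposes tables under a dotted name).
CREATE TABLE nyc_taxicab_local
ENGINE = MergeTree
ORDER BY tuple()  -- no sorting key assumed; choose real columns once the schema is known
AS SELECT * FROM biglake.`public_data.nyc_taxicab`;

-- The same query can then be run against the remote table and the local copy
-- to compare timings.
SELECT count() FROM biglake.`public_data.nyc_taxicab`;
SELECT count() FROM nyc_taxicab_local;
```

In practice a sorting key that matches the most common filters (a pickup timestamp, for example) would likely matter more for query speed than the copy itself, but that depends on the actual columns.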