Tutorials and Example Datasets

We have a lot of resources to help you get started and learn how ClickHouse works.

In addition, the sample datasets offer hands-on experience with ClickHouse: you can learn important techniques and tricks, and see how to take advantage of its many powerful functions. The sample datasets include:

| Page | Description |
|------|-------------|
| Amazon Customer Review | Over 150M customer reviews of Amazon products. |
| AMPLab Big Data Benchmark | A benchmark dataset used for comparing the performance of data warehousing solutions. |
| Brown University Benchmark | A new analytical benchmark for machine-generated log data. |
| Geo Data using the Cell Tower Dataset | Learn how to load OpenCelliD data into ClickHouse, connect Apache Superset to ClickHouse, and build a dashboard based on the data. |
| COVID-19 Open-Data | COVID-19 Open-Data is a large, open-source database of COVID-19 epidemiological data and related factors like demographics, economics, and government responses. |
| Terabyte Click Logs from Criteo | A terabyte of click logs from Criteo. |
| Environmental Sensors Data | Over 20 billion records of data from Sensor.Community, a contributor-driven global sensor network that creates Open Environmental Data. |
| GitHub Events Dataset | Dataset containing all events on GitHub from 2011 to Dec 6 2020, with a size of 3.1 billion records. |
| Writing Queries in ClickHouse using GitHub Data | Dataset containing all of the commits and changes for the ClickHouse repository. |
| Laion-400M dataset | Dataset containing 400 million images with English image captions. |
| New York Public Library "What's on the Menu?" Dataset | Dataset containing 1.3 million records of historical data on the menus of hotels, restaurants, and cafes, with the dishes and their prices. |
| Anonymized Web Analytics | Dataset consisting of two tables containing anonymized web analytics data with hits and visits. |
| NOAA Global Historical Climatology Network | 2.5 billion rows of climate data for the last 120 years. |
| New York Taxi Data | Data for billions of taxi and for-hire vehicle (Uber, Lyft, etc.) trips originating in New York City since 2009. |
| NYPD Complaint Data | Ingest and query Tab Separated Value data in 5 steps. |
| OnTime | Dataset containing the on-time performance of airline flights. |
| Crowdsourced air traffic data from The OpenSky Network 2020 | The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. |
| Recipes Dataset | The RecipeNLG dataset, containing 2.2 million recipes. |
| Reddit comments dataset | Dataset containing publicly available comments on Reddit from December 2005 to March 2023, with over 14B rows of data in JSON format. |
| Analyzing Stack Overflow data with ClickHouse | Analyzing Stack Overflow data with ClickHouse. |
| Star Schema Benchmark (SSB, 2009) | The Star Schema Benchmark (SSB) data set and queries. |
| TPC-DS (2012) | The TPC-DS benchmark data set and queries. |
| TPC-H (1999) | The TPC-H benchmark data set and queries. |
| Taiwan Historical Weather Datasets | 131 million rows of weather observation data for the last 128 years. |
| The UK property prices dataset | Learn how to use projections to improve the performance of queries that you run frequently, using the UK property dataset, which contains data about prices paid for real-estate property in England and Wales. |
| WikiStat | Explore the WikiStat dataset containing 0.5 trillion records. |
| YouTube dataset of dislikes | A collection of dislikes of YouTube videos. |