On January 18th, 2023, Microsoft hosted a ClickHouse community meetup at their office in Redmond. The WebXT team presented two of their analytics products that use ClickHouse: Microsoft Clarity and Titan. Microsoft Clarity is a free tool that provides website owners with insights to help them make better business decisions, without the need for a data science team or an instrumentation pipeline.
Narendra Rana, Principal Data Scientist on the Clarity Insights team at Microsoft, explained the architecture of Clarity, the challenges faced by the team when integrating ClickHouse, and how they overcame these challenges to create an efficient, robust, and GDPR-compliant analytics platform.
Powering Website Analytics at Petabyte-Scale
The presentation included a live demo, in which the audience was shown how a webmaster could select a session and see how their end-users interacted with the site. Microsoft Clarity provides heat maps that aggregate and visualize these end-user interactions. Rana explained that creating a heat map is a complex operation that involves a big query, which makes it well suited to ClickHouse. “A simple operation, like looking at a heat map is basically a big query. And as this is happening lots of data is being ingested at the same time. We looked at a lot of the options out there…we realized that yes, ClickHouse is the stuff that we are going to bet on,” said Rana.
Heat Maps in Clarity, a complex operation involving a big query in ClickHouse.
ClickHouse Backup and Recovery on Azure
The team worked with ClickHouse creator Alexey Milovidov to overcome challenges, primarily with backup and recovery on Azure, which is where they invested a significant amount of time. Rana shared the team's experience in building this system, which other teams, both internal and external, can leverage to get efficient, robust, and consistent backups without any downtime.
Millions of rows of data are being ingested, while thousands of queries are being executed, at the same time.
Incremental Snapshots is a feature of Azure that enables incremental backups while data ingestion is still occurring. This is a more cost-effective solution than the original snapshots that Azure offered. Freeze and Sync commands are used to synchronize cache and disk, and automated validation is used to test the backup and restore processes periodically.
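To make the sequence above more concrete, here is a minimal Python sketch of one backup cycle. It is not the Clarity team's implementation. It assumes that "Freeze" refers to ClickHouse's `ALTER TABLE ... FREEZE` statement, that "Sync" refers to the Linux `sync` command, and that the snapshot is taken with the Azure CLI's `az snapshot create --incremental true`. The table, disk, and resource group names are hypothetical placeholders, and the validation step is left out.

```python
import subprocess
from typing import Callable, List


def build_backup_plan(
    table: str, backup_name: str, resource_group: str, disk_name: str
) -> List[List[str]]:
    """Return the ordered commands for one backup cycle. Nothing is executed here."""
    return [
        # 1. Freeze (assumed: ClickHouse ALTER TABLE ... FREEZE), which
        #    hard-links the table's current data parts into the shadow directory.
        [
            "clickhouse-client",
            "--query",
            f"ALTER TABLE {table} FREEZE WITH NAME '{backup_name}'",
        ],
        # 2. Sync (assumed: Linux sync), which flushes cached writes to disk.
        ["sync"],
        # 3. Incremental snapshot of the managed disk via the Azure CLI.
        [
            "az", "snapshot", "create",
            "--resource-group", resource_group,
            "--name", backup_name,
            "--source", disk_name,
            "--incremental", "true",
        ],
    ]


def run_plan(plan: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        runner(argv, check=True)


# Hypothetical names, for illustration only.
plan = build_backup_plan(
    table="analytics.events",
    backup_name="nightly",
    resource_group="example-rg",
    disk_name="example-data-disk",
)
```

Building the plan separately from running it keeps the sketch easy to inspect and dry-run; calling `run_plan(plan)` would execute the commands on a host that has `clickhouse-client` and the Azure CLI installed.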
Satish Manivannan, Senior Director of Data and Analytics at Microsoft, expressed his enthusiasm for ClickHouse, stating, "The key point is, we really love ClickHouse and we hope to have more continued collaboration for years to come." Overall, the meetup was an excellent opportunity to learn about Microsoft's use of ClickHouse and its benefits for website analytics.