On December 6, 2022, Roni Lazimi, a software engineer on the Observability team within Disney+, participated in the ClickHouse Meetup at the Rokt office in NYC. His presentation covered how the team uses ClickHouse to process access logs and provide analytics for Disney's content distribution system. Since moving to ClickHouse, the team writes 3 million rows per second and reads 2 billion rows per second of CDN access logs.
Using ClickHouse for Access Log Processing and Analytics
Disney's content distribution system follows a client-server model in which videos are served to clients through multiple cache layers to prevent server overload. The Observability team at Disney+ works on the peripheral systems that support this content distribution system, using access logs for video data to identify issues such as latency. To make sense of this data, the team uses ClickHouse as their data pipeline, which helps them process and analyze the logs.
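To give a concrete flavor of what the storage layer of such a pipeline might look like, here is a minimal sketch using the clickhouse-connect Python client. The table, columns, and queries are hypothetical illustrations, not the team's actual schema.

```python
# A minimal sketch of a CDN access-log table in ClickHouse, created with the
# clickhouse-connect Python client. Every name below (table, columns, sorting
# key) is a hypothetical illustration, not Disney's actual schema.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

client.command("""
    CREATE TABLE IF NOT EXISTS cdn_access_logs (
        timestamp    DateTime,
        edge_pop     LowCardinality(String),  -- which cache layer/PoP answered
        cache_status LowCardinality(String),  -- e.g. HIT or MISS
        url          String,
        status_code  UInt16,
        latency_ms   UInt32
    )
    ENGINE = MergeTree
    ORDER BY (edge_pop, timestamp)
""")

# Latency problems of the kind the team looks for surface with one aggregate.
result = client.query("""
    SELECT edge_pop, quantile(0.99)(latency_ms) AS p99_latency_ms
    FROM cdn_access_logs
    WHERE timestamp >= now() - INTERVAL 1 HOUR
    GROUP BY edge_pop
    ORDER BY p99_latency_ms DESC
""")
for pop, p99 in result.result_rows:
    print(pop, p99)
```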
Why ClickHouse Was Chosen over Other Options
Lazimi explained that ClickHouse was chosen over other options like Hadoop, Flink, and Elasticsearch due to its simplicity and single-binary setup.
“We tried using Elasticsearch because that was the stack that we had originally before we found ClickHouse,” said Lazimi. “But the cons of Elasticsearch is that it requires a lot of rebalancing. It uses a Java virtual machine, which is a whole virtualization layer that I guess you don't really need if you're trying to get the most speed. And same thing with Hadoop and Flink.”
They found that one of ClickHouse's significant advantages was its lightweight architecture.
"It was kind of hopeless with Elasticsearch. We were really not doing well with ingesting all the logs that we have because it's big data, it's all the users of Disney+ generating that data. Which is of course a really big distributed system. So we have to make an equally distributed and highly scaled database system,” said Lazimi. “Ever since we chose ClickHouse, it's been going well."
Lightweight Architecture of ClickHouse
Lazimi shared details about their current cluster, which consists of 20 nodes with 2X replication. This setup provides them with 160 terabytes of storage and 2.5 terabytes of RAM, and it has delivered impressive performance. “It’s really been helpful for query performance. We write 3 million rows a second. Last time I checked, we read 2 billion rows a second of CDN access logs,” said Lazimi.
However, the team is planning to upgrade to a larger cluster, which will allow them to store a significantly larger volume of logs and improve their overall data management capabilities.
“We’re going to move to a much bigger cluster soon, which blows this one out of the water. It's going to be 32 nodes each with 154 terabytes and 112 cores. Each is going to be massive. The biggest cluster you've ever seen,” added Lazimi.
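For context on what a replicated setup like the 2X replication Lazimi describes typically looks like, here is a minimal sketch of the standard ClickHouse pattern: per-shard ReplicatedMergeTree tables, kept in sync via ClickHouse Keeper/ZooKeeper, with a Distributed table fanning queries out across shards. The cluster name, macros, and schema below are assumptions for illustration, not Disney's configuration.

```python
# A minimal sketch of the usual ClickHouse replication pattern behind a setup
# like "20 nodes with 2X replication". The cluster name 'access_logs' and the
# schema are hypothetical, not Disney's configuration.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# Each shard stores its data in a ReplicatedMergeTree, replicated via Keeper.
client.command("""
    CREATE TABLE IF NOT EXISTS cdn_access_logs_local ON CLUSTER access_logs (
        timestamp  DateTime,
        edge_pop   LowCardinality(String),
        latency_ms UInt32
    )
    ENGINE = ReplicatedMergeTree(
        '/clickhouse/tables/{shard}/cdn_access_logs',  -- Keeper path, per shard
        '{replica}'                                    -- unique per replica
    )
    ORDER BY (edge_pop, timestamp)
""")

# A Distributed table fans reads and writes out across all shards/replicas.
client.command("""
    CREATE TABLE IF NOT EXISTS cdn_access_logs_dist ON CLUSTER access_logs
    AS cdn_access_logs_local
    ENGINE = Distributed(access_logs, currentDatabase(),
                         cdn_access_logs_local, rand())
""")
```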
Customers can easily monitor various aspects of the system by creating pipelines through automation. This can replace Influx in some cases and supports both ELT and ETL pipelines. ClickHouse was set up like an HTTP server, making it flexible and easy to use.
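Because ClickHouse exposes an HTTP interface (port 8123 by default), slotting it into ETL/ELT pipelines can be as simple as posting SQL over a socket. Below is a minimal sketch using only Python's standard library; the host and the cdn_access_logs table are hypothetical illustrations.

```python
# A minimal sketch of talking to ClickHouse over its HTTP interface (port
# 8123 by default) with nothing but the Python standard library. The host
# and the cdn_access_logs table are hypothetical.
import urllib.request

CLICKHOUSE_URL = "http://localhost:8123/"

def run(sql: str) -> str:
    """POST a SQL statement to ClickHouse and return the raw response body."""
    req = urllib.request.Request(CLICKHOUSE_URL, data=sql.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Any tool that can speak HTTP can feed the pipeline...
run("INSERT INTO cdn_access_logs "
    "(timestamp, edge_pop, cache_status, url, status_code, latency_ms) "
    "VALUES (now(), 'nyc1', 'HIT', '/video/segment-001.ts', 200, 42)")

# ...and read results back in any format ClickHouse supports.
print(run("SELECT count() FROM cdn_access_logs FORMAT TabSeparated"))
```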
To make the system more flexible, Lazimi recommended keeping logs unstructured, in formats like JSON or key=value pairs. This abstraction started with CDN logs and then expanded to other data sources.
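One way this unstructured-log approach can be realized, sketched below, is to land each raw JSON line in a single String column and extract typed fields at query time with ClickHouse's built-in JSON functions. The table and field names are hypothetical, not the team's actual setup.

```python
# A minimal sketch of the unstructured-log approach: store each raw JSON log
# line in a String column, then extract typed fields at query time with
# ClickHouse's JSON functions. Table and field names are hypothetical.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

client.command("""
    CREATE TABLE IF NOT EXISTS raw_logs (
        ingested_at DateTime DEFAULT now(),
        raw         String            -- one unparsed JSON log line per row
    )
    ENGINE = MergeTree
    ORDER BY ingested_at
""")

client.insert(
    "raw_logs",
    [['{"edge_pop": "nyc1", "latency_ms": 87, "status": 200}']],
    column_names=["raw"],
)

# Fields stay flexible: new keys in the JSON need no schema migration.
result = client.query("""
    SELECT
        JSONExtractString(raw, 'edge_pop') AS edge_pop,
        JSONExtractUInt(raw, 'latency_ms') AS latency_ms
    FROM raw_logs
""")
print(result.result_rows)
```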
ClickHouse is now the Observability team's preferred option for processing access logs and providing analytics for Disney+'s content distribution system. Its flexibility and simplicity make it a powerful tool for content distribution analytics.
More Details
- This talk was given at the ClickHouse Community Meetup in NYC on December 6, 2022
- The presentation materials are available on GitHub