Blog / User stories

One step ahead: How Harvey uses ClickHouse for proactive threat detection

author avatar
Mike Parowski
Mar 24, 2025 - 7 minutes read

The legal industry is notorious for highly manual, labor-intensive tasks: contract analysis, due diligence, litigation support, regulatory compliance. Harvey is a generative AI platform that builds custom LLMs to help law firms automate these workflows, improving efficiency and productivity while allowing lawyers to focus on more complex, high-value work.

But revolutionizing the legal profession isn’t just about streamlining processes - it’s also about protecting sensitive client information. When Mike Parowski joined Harvey as the company’s first detection and response (D&R) hire in April of 2024, he saw the need for a proactive approach to threat detection that could keep client data secure from network threats.

At a recent meetup in San Francisco, Mike broke down how Harvey’s security team is using network logs for scalable, efficient threat detection - and how ClickHouse Cloud is helping Harvey and its customers stay ahead of potential security risks.

Data overload

In the world of D&R, there’s what Mike calls a "standard paradigm," where network logs are viewed as secondary tools — used retrospectively to investigate security incidents. Usually, after an alert from an endpoint detection and response (EDR) system, security teams dig into network logs to uncover the source of the problem.

"I challenge that notion" Mike says. "Not only can network logs be effective primary indicators of compromise, but they may actually beat host EDRs to the punch."

But harnessing network logs for proactive threat detection isn’t always easy. Upon joining Harvey, Mike’s first challenge was managing its massive volume of network data. The platform, powered by Azure NSG flow logs, produces 5.4 TB of data daily, spanning 1.5 million unique IP addresses and up to 40 million rows. As Mike notes, "We expect this data to grow proportional to our customer base - so as Harvey gets more customers, this data will only go up."

Processing this volume in real time was challenging because the logs were stored in "block blob" format, which forced the team to traverse huge chunks of raw data before they could run meaningful queries. Each minute generated hundreds of thousands of log entries for just one service, resulting in a complex web of file paths that had to be navigated.

"At this volume, doing any sort of aggregate statistics and proactive detection becomes nearly impossible," Mike explains. Queries designed to identify potential threats, such as top talkers or traffic anomalies, were difficult to execute. "We just consistently met with timeouts or out-of-memory errors, no matter how hard we tried" he says.

A more scalable solution

To overcome these challenges, the Harvey AI team turned to ClickHouse Cloud and partnered with RunReveal, a security analytics platform built on ClickHouse. The combination of ClickHouse’s high-performance columnar database and RunReveal’s expertise allowed Harvey to completely restructure their approach to processing network logs.

As Mike explains, their optimization strategy involved selecting primary indexes for key data points - source IPs, destination IPs, and destination ports - and using materialized views to pre-compute aggregations. "By really understanding how we would use these logs, and by understanding the domain itself, we could carefully select the indexes we’d use in the materialized views to drive those efficiency gains," he says.

This approach has allowed the team to better track and compare network activity over time. "Combining these fields lets us effectively answer the question of who is doing what, and gives us quantifiable metrics to compare today to yesterday to the day before" Mike adds.

Along with improving query efficiency, ClickHouse's “compression capabilities (which Mike calls "absolutely nuts") played a key role. The 5.4 TB of daily log data was compressed down to just 4.55 GB, reducing the data footprint and enabling faster, real-time processing. "I don’t know if anyone else besides ClickHouse could get us those numbers" Mike says.

With this new architecture in place, Harvey’s security team can run complex queries on network traffic patterns, detect anomalies, and proactively detect threats - capabilities that would have been impossible before.

Enrichment for better insights

Beyond simply optimizing data storage and querying, the Harvey team has taken further steps to enrich their network logs with more meaningful, human-readable labels. Tapping into the flexibility of ClickHouse’s query capabilities, they’ve mapped raw IP addresses to recognizable names, making it easier to understand network activity in real time.

"I was actually surprised by how seamlessly this worked with ClickHouse," Mike says. "I’ve tried doing similar enrichment in other systems, but they couldn’t handle the scale or complexity of our data without major slowdowns."

He says the enriched data gives the team a "really clear, high-level snapshot of what our network looks like and the visibility across Harvey’s environment." By enriching the data, the team can quickly spot suspicious activity, such as unexpected traffic spikes or unusual port activity. The higher level of visibility has allowed the security team to not only monitor their network but also act on anomalies with greater confidence and precision.

This enrichment process has further simplified the task of detecting threats, helping Harvey’s security team move away from sifting through raw, indecipherable logs and focus on key patterns that point to genuine security concerns.

Real-time threat detection

In cryptography and cybersecurity, Alice and Bob are traditionally used as the names of the two parties communicating, while Eve - shorthand for "evil" or "eavesdropper" - is the third-party adversary or hacker trying to listen in or disrupt their communications.

"On the security team at Harvey, we like to say that if we do our jobs well, and if we protect this perimeter from Evil Eve, we let all the brilliant people at Harvey do their jobs, and at the end of the day, we all get to drink Mai Tais on the beach. ClickHouse helps us do just that."

Mike Parowski

With the help of ClickHouse Cloud, Harvey has successfully transformed their network threat detection. What once seemed like an overwhelming flood of data is now a manageable, streamlined system capable of running real-time detections on Harvey’s rapidly expanding network. The improvements in compression and query efficiency mean the team can stay ahead of security threats while maintaining the highest level of data protection.

The impact of these changes goes beyond simply managing data effectively. By optimizing their security operations, Harvey has ensured that their platform can scale securely, giving peace of mind to their internal teams and customers. As Harvey grows, Mike and the team are confident they have the tools to stay ahead of evolving threats, allowing them to focus on their core mission: revolutionizing the legal industry with cutting-edge AI solutions.

To learn more about ClickHouse and see how it can improve the efficiency and scalability of your team’s data operations, try ClickHouse Cloud free for 30 days.

Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...
Follow us
X imageBluesky imageSlack image
GitHub imageTelegram imageMeetup image
Rss image