ClickHouse Meetup San Jose: RunReveal
Hey everybody. I'm sure everybody in this room is no different from me in that they're very excited about the things that are happening around AI and how things are progressing. I myself, when I started to write this talk, I was thinking about all the things that I wanted to be able to do to interact with a computer. I grew up watching sci-fi. I'm a huge fan of just being able to talk to your computer and have it do things for you. Seeing how every company and every model is progressing has meant a lot to me. But still, after 20 years of using computers, working in security, breaking them, and knowing how they work under the hood, I've got a healthy distrust of computers at this point.
I work at RunReveal. I'm a co-founder and CTO there. We build software to help companies monitor their security posture — like a SIEM. We are a security data platform that helps companies ingest logs: audit logs, Google Workspace logs, CloudTrail logs, and that sort of thing. But we also run our own agents for investigations and such. So, just like every company out there, we want to know what our agents are doing. And since we are effectively an observability company, we wanted to be able to monitor them in our own way.
This talk is going to cover how we monitor those agents, share a bit about RunReveal's use case for using agents, and from a personal perspective, how I'm thinking about the evolution of agents and AGI — because it's really coming to a head fast. The root of this is: I want to trust agents, so let's talk about how we get there.
Once again, I'm Alan. I used to work at Cloudflare and then Segment, and now I'm building RunReveal. I've been a ClickHouse user for 10 years now, which is pretty insane — it went open source 10 years ago, and Cloudflare picked it up pretty quickly. At RunReveal, there was no question which database we were going to use to store logs when we got started. We just started using ClickHouse, realized the power of it, and wanted to build something that's a great product for security teams — built by security engineers for security engineers.
What is RunReveal? We are a security data platform. We collect and store logs, and we index them so that agents can conduct investigations effectively without much human input. We help responders actually respond to events by building an investigation tool that surfaces all the data you need. We have AI auto-investigators that help you triage an incident when you get an alert. It all works extremely well because the data has already been normalized and stored in an indexed format.
But today is not about those things in particular. Today's talk is about what happens when you want to run an agent in an environment and know what it's doing.
A little context about how we got here and how I'm thinking about this. Last year was pretty insane — I think everybody can agree. MCP is still just a little over a year old; it started in November 2024 but really took off in January 2025. MCP is pretty good — it was the first standard protocol that people used. But there wasn't a lot of thought put towards security and context management at that point. Anthropic was really just looking for a way to standardize getting context into these models, and it worked really well. Everybody hopped on it, everybody got excited. But context windows got really polluted, latency increased, and the attack surface for prompt injections just exploded overnight.
So come late 2025, things started moving towards skills and using CLIs instead of MCP. That has benefits — it's better at managing context — but drawbacks in that you now have to worry about everything that the CLI tools can do on your machine. And now that seems to be almost the standard for how to get more productive and run general agents. That's again exploding the security concerns and increasing the attack surface area.
And with that, it's got the inklings of truly general intelligence, which is why I'm excited. But on the flip side, hardly any security controls, and it's still a bit of an afterthought. I haven't seen many deployments where there's really good monitoring of what's happening with agents, so I set out to build one.
Fundamentally, anything that's more generalized is going to be at odds with security. That's the fundamental tension we're trying to solve. But the way you can solve it is by breaking things down into least-privileged agents. We can all agree that your scheduler doesn't need access to your bank accounts, and your email assistant doesn't need access to your AWS deploy keys. Decomposing these into purpose-built agents — where each only has access to what it needs — I think is the way to go. And in addition, they should all be fully auditable. Every single action those agents are taking needs to be visible so that you can have confidence they aren't transferring your entire bank balance somewhere.
So how do we get there? It's a lot of data.
First, you want to make sure you're collecting audit logs — your MCP calls, your API calls, every command they're running. These are necessary, but not sufficient. Because when you're running a security agent, you also want to know everything it's doing on partner APIs, you want to know which DNS hostnames it's resolving, and when you're giving agents a lot of agency and letting them run arbitrary commands, you really need to know at a very fine-grained level what they're doing.
What we built is a sandbox that collects every DNS query, every TCP connection, and every process execution inside a container and makes it available inside RunReveal, so you can actually see everything that's happening inside the box. We're not just logging the API calls — we're logging everything we can about what that agent is doing so that we can build fingerprints and profiles on individual agents, report on what those agents are doing, and then create detections and controls to prevent the agent from doing malicious activities.
All of this is built on top of ClickHouse, of course. It's really — as far as I know — the only database that can stand up to this sort of challenge, at least the only open source one I can think of.
The architecture: we have per-agent sandboxes on Kubernetes. Every pod gets its own DNS sidecar so we can see every DNS query it's making. We've also got eBPF kernel-level TCP connection tracing, so we can go from the DNS query to the process to the actual request being made on the network. We collect all of that through the RunReveal DaemonSet, which ships the logs to RunReveal and eventually into ClickHouse. One nice property of this is that it generates ephemeral credentials injected into the agent container at runtime, so you don't have to worry about long-lived credentials being compromised.
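The ephemeral-credential piece of that setup can be pictured in a few lines. This is a hypothetical sketch, not RunReveal's implementation: `mint_ephemeral_credential` and `is_valid` are made-up names, and in a real deployment the token would be minted at pod startup and injected into the agent container's environment.

```python
import secrets
import time

def mint_ephemeral_credential(ttl_seconds=900):
    """Issue a short-lived credential for a single sandbox run.

    Nothing long-lived exists inside the container, so a compromised
    agent can only abuse the token until it expires.
    """
    return {
        "token": secrets.token_urlsafe(32),
        "expires_at": time.time() + ttl_seconds,
    }

def is_valid(cred):
    """Check whether a credential is still within its TTL."""
    return time.time() < cred["expires_at"]

# Minted at pod startup, injected into the agent container at runtime.
cred = mint_ephemeral_credential(ttl_seconds=60)
assert is_valid(cred)
```

The design point is simply that credential lifetime is bounded by the sandbox's lifetime, so there is no standing secret to steal.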
We're using Tetragon — a project out of Cilium — which is a great framework for doing network controls on Kubernetes and other overlay networks. We put in kprobes that hook into the Linux kernel and tell you about the syscalls you want to know about. In this case, we care about TCP connects and UDP sends — the kernel's tcp_connect and udp_sendmsg functions. We can also track process executions. By doing that, we're able to create a fingerprint of every process running inside these containers, analyze what behaviors are expected for a given agent, and once we have that fingerprint, we can start enforcing an allowlist of what they can connect to and what they can do.
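To make the fingerprint-then-allowlist idea concrete, here is a minimal sketch. The event shapes and function names are illustrative assumptions — real Tetragon events carry much richer process and socket metadata — but the core loop is the same: learn (process, destination) pairs during a trusted baseline, then flag anything outside that set.

```python
from collections import defaultdict

def build_fingerprint(baseline_events):
    """Learn which process -> destination pairs an agent normally
    makes during a trusted baseline period."""
    fingerprint = defaultdict(set)
    for ev in baseline_events:
        fingerprint[ev["process"]].add(ev["dest"])
    return fingerprint

def violations(fingerprint, new_events):
    """Return events that fall outside the learned allowlist."""
    return [
        ev for ev in new_events
        if ev["dest"] not in fingerprint.get(ev["process"], set())
    ]

baseline = [
    {"process": "curl", "dest": "api.partner.example"},
    {"process": "python3", "dest": "internal-db.example"},
]
fp = build_fingerprint(baseline)

# A connection the agent has never made before gets flagged.
bad = violations(fp, [{"process": "curl", "dest": "attacker.example"}])
print(bad)  # → [{'process': 'curl', 'dest': 'attacker.example'}]
```

Once the fingerprint is stable, the same set can be compiled into an enforcing policy rather than just a detection.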
So why ClickHouse? Agent telemetry is very high volume. ClickHouse gives you all the tools you need to collect terabytes of data and analyze it very quickly. Agents can't drink from the firehose — they have a limited context window — so you need to control which data is actually getting into the agent's context. Building something on top of ClickHouse that the agent can query and analyze asynchronously is a really good model for anything requiring a long context window. It lets you manage context so you're not polluting it.
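One way to picture that context-management point: rather than handing the agent raw telemetry, you aggregate first — in practice via a ClickHouse query — and only a compact summary reaches the model's context. A toy sketch, with hypothetical event shapes and function names:

```python
from collections import Counter

def summarize_for_context(events, top_n=5):
    """Compress raw telemetry into a small dict the agent can reason
    over, instead of streaming the raw firehose into its context."""
    dests = Counter(ev["dest"] for ev in events)
    return {
        "total_events": len(events),
        "unique_destinations": len(dests),
        "top_destinations": dests.most_common(top_n),
    }

events = [{"dest": "api.example"}] * 4 + [{"dest": "db.example"}] * 2
print(summarize_for_context(events))
# → {'total_events': 6, 'unique_destinations': 2,
#    'top_destinations': [('api.example', 4), ('db.example', 2)]}
```

The agent can then drill into any destination that looks interesting with a follow-up query, so its context only ever holds summaries plus the specific slices it asked for.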
You can also do windowing functions to detect anomalies and detect changes from time frame to time frame. And in our particular use case, because we run a multi-tenant cloud, ClickHouse has really good row-based access control policies. I gave a talk a few months ago at one of these meetups specifically about that — it's a really strong policy-based access control where you can do hierarchical roles to control exactly what a particular connection is able to do. We expose those agents to ClickHouse through our own MCP server, which enforces all of those policies. It's ClickHouse all the way down.
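The windowing idea can be illustrated with a toy version: bucket events into fixed time windows, then flag windows whose count deviates sharply from the baseline. In production this would be a ClickHouse window/aggregate query over the telemetry tables; the threshold and names here are illustrative assumptions.

```python
from statistics import mean, pstdev

def flag_anomalous_windows(counts, threshold=2.0):
    """Flag windows whose event count deviates from the mean by more
    than `threshold` population standard deviations."""
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts)
            if abs(c - mu) > threshold * sigma]

# Hourly TCP-connection counts for one agent; the spike stands out.
hourly = [10, 12, 11, 9, 10, 250, 11, 10]
print(flag_anomalous_windows(hourly))  # → [5]
```

A flagged window is then a starting point for an investigation — you pivot from the anomalous hour into the exact processes and destinations recorded during it.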
What can we do with this? We can catalog behavior, detect anomalies, and enforce policies. We're not quite to the policy enforcement step yet, but it's a big payoff. It allows us to run these agents in a more trusted way. Granted, they're still non-deterministic and can still do all sorts of things when you give them shell access and internet access, which can be pretty scary. But this can really help calm the nerves. At least you can start to see what's going on and not worry as much about what the agent's doing when you have the ability to go back and look at it.
The key insight is: you can't secure what you can't see.
