Managed ClickStack: Live Demo
In this webinar, we walk through Managed ClickStack end to end: what it is, why we built it, and how it changes the economics of observability.
You'll see a live demo of instrumenting an application, ingesting OpenTelemetry data, and exploring the ClickStack UI in minutes.
Presenters:
- Alex Fedotyev — Product Manager, ClickStack team
- Dale McDiarmid — PMM Lead, ClickStack
- Michael Shi — Founder of HyperDX, now leads the ClickStack observability team at ClickHouse
Introduction
Alex: Welcome everybody. My name is Alex Fedotyev. I'm the product manager on the ClickStack team. Just a reminder that this session will be recorded — we'll share it with everybody afterwards.
With me I have Dale, who's leading the PMM team for ClickStack, and Michael Shi, who's the original founder of HyperDX and now runs the ClickStack observability team here at ClickHouse.
I'll be covering the details about ClickStack and Managed ClickStack — what we announced just a couple of weeks ago — and then Dale will run the demo, and we'll do Q&A at the end.
The problem with current observability
Alex: Let me start with current observability solutions. Whether you're using open source tools or buying from a vendor, there are common problems across most of them — and they all come down to problems of scale. That could be the volume of data, the type of signals (logs, traces, metrics), or the cardinality of the data.
That leads to familiar workarounds: sampling, aggressive retention limits, or roll-ups to reduce cardinality so you can afford to keep the data at all. Skip those workarounds and you pay in cost and query performance; apply them and you introduce gaps in your data. Either way, the result is high MTTR and blind spots when teams are troubleshooting and doing root cause analysis.
This is the core problem with modern architectures — all your microservices produce so much data that you simply can't analyse it all, so you don't have the visibility you need. We hear this from all our customers.
What is ClickStack?
Alex: ClickHouse, if you don't know it, is a columnar database — open source, high performance, designed for real-time online analytical processing (OLAP). It's battle-tested for observability at scale. For example:
- Tesla ingests up to 1 billion rows per second at peak for their metrics
- Netflix sends petabytes of logs daily while serving sub-second analytical queries every minute
- OpenAI also uses ClickHouse for observability as one of their core use cases.
On top of ClickHouse as a database, we introduced ClickStack last year. ClickStack is an observability platform — extremely high performance, designed to give you end-to-end analysis from frontend (end user sessions) through to tracing, logs, and infrastructure metrics. Full visibility across the board, powered by ClickHouse. It's native to OpenTelemetry, and supports trace analysis, log search, alerts, dashboards, and more.
ClickStack architecture
Alex: ClickStack is named after the stack itself — three core components:
- OpenTelemetry Collector — we support native OTel ingestion. The collector is pre-configured to export data to ClickHouse with the right schemas for logs, traces, and metrics.
- ClickHouse — provides fast aggregations over any cardinality of data, and up to 15x compression. Logs, traces, spans, and metrics all have repetitive structure that columnar storage compresses very effectively.
- ClickStack UI — built on top of HyperDX, the company we acquired last year. From the user's perspective this sits on top, and it supports end-to-end trace views, log search, natural language queries, SQL, and Lucene.
We also offer a distribution of the OpenTelemetry Collector and some SDKs that are fine-tuned for ingesting data at scale and storing it effectively in ClickHouse. The browser SDK is a personal favourite — it goes beyond standard end-user monitoring and captures full client sessions, which is very useful for understanding the real impact of problems on your users.
Benefits of ClickStack
Alex: The key benefits:
- No sampling required — because of effective columnar compression and resource efficiency, you can store everything as-is. No blind spots.
- Fast queries — ClickHouse's SQL engine and aggregation capabilities let you run analytical queries across all that data effectively.
- Faster MTTR — whether you're troubleshooting the last hour, the last day, or the last month, query performance holds up.
A good example is Character AI, who compared ClickStack against a popular logging solution. They stored 7x more data, got significantly faster query response times for root cause analysis, and cut their costs in half, all while storing more and querying faster.
Getting started (open source)
Alex: If you're not using ClickStack yet, the easiest way to start is Docker. We have an all-in-one image that bundles the collector, ClickHouse, and the HyperDX UI together. You can be running locally and ingesting via OpenTelemetry (or Vector, or other methods) very quickly.
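At the time of writing, the quickstart in the ClickStack docs is a single command along these lines (check the docs for the current image name and tag):

```bash
# All-in-one image: OTel collector + ClickHouse + HyperDX UI in one container.
# Ports: 8080 = UI, 4317 = OTLP/gRPC, 4318 = OTLP/HTTP.
docker run -p 8080:8080 -p 4317:4317 -p 4318:4318 \
  docker.io/hyperdx/hyperdx-all-in-one
```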
When you move to production you'll want to harden the config, close ports, change defaults, etc. — but for a local laptop setup, the all-in-one container is the easiest starting point.
Managed ClickStack
Alex: Now let me cover what we announced about two weeks ago: Managed ClickStack. This is ClickStack offered directly within ClickHouse Cloud, with no additional SKU or purchase required. It's available across all tiers and in beta now.
Key things you get out of the box:
- Seamless authentication — SSO/SAML inherited from ClickHouse Cloud. RBAC support is actively in development.
- Zero-config database connections — the UI is automatically connected to the right database with the right schemas. You can start ingesting the moment you provision.
- Flexible ingestion — use OpenTelemetry, Vector, Kafka, ClickPipes, or any other method.
- Enterprise features — compliance, security, backups, cost controls, upgrades, and rollbacks all come from the Cloud platform without you having to manage them.
- SharedMergeTree — the cloud-native table engine behind ClickHouse Cloud, with dynamic compute scaling.
- AI notebooks — in development; will allow you to interact with your data via prompts, seeing charts and results inline.
Storage and compute separation
Alex: Storage-compute separation is a key architectural benefit in ClickHouse Cloud.
Instead of expensive attached disks, data is stored in object storage (like S3 on AWS) — which is cheaper and effectively infinite. Combined with columnar compression (up to 15x), you can store observability data for less than $0.01 per gigabyte with no enforced retention limit. Store it as long as you want.
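To make that arithmetic concrete (illustrative numbers, not a pricing quote): at S3's list price of roughly $0.023 per GB per month, 15x compression turns 1 TB of raw telemetry into about 67 GB at rest. That's around $1.50 per month in raw storage, or roughly $0.0015 per ingested gigabyte, comfortably under the $0.01 figure above.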
Because storage is decoupled, compute (the query and ingest engines) becomes an elastic unit you can scale independently. Need to ingest a peak load? Scale up compute. Need to analyse a month of data? Scale up compute for that, without touching storage.
Compute-compute separation
Alex: Compute-compute separation takes this a step further. You can define separate compute pools for different workloads:
- Ingest pool — handles writing logs, metrics, and traces to storage. Steady baseline, scales up during peaks.
- Hot read pool — for active troubleshooting of recent data (last 1–3 days).
- Cold read pool — for historical analysis across longer time windows.
Each pool scales independently, so you get high performance without over-provisioning.
As one customer put it: "observability and analytics are really just one data problem." ClickHouse is designed exactly for that kind of workload.
Unifying observability and business data
Alex: One of the most powerful things about storing observability data in ClickHouse is that you can put your business data there too — and then correlate them.
Some customers start with ClickHouse for analytics (e.g. e-commerce data, sales pipelines) and add observability. Others start with observability and bring business data in later. Either way, once both datasets are in the same store, you can correlate:
- Errors with user sessions
- Latency spikes with specific business events
- Outages with revenue impact
Compute-compute separation makes it easy to provision separate services for each dataset — one compute pool for business data ingest, another for observability — while still being able to query across both using SQL.
AI-powered observability
Alex: This is impossible to ignore. ClickHouse speaks full SQL — and SQL happens to be what LLMs are best at reasoning about. A few things we're working on or have available now:
- AI notebooks — interact with your data via natural language prompts, see charts and query results inline. In development.
- MCP server — connect ClickHouse to any MCP-compatible AI client. Configure it for specific databases or tables, and you can chat with your observability data alongside your business data.
- Bring your own LLM — connect to Claude, OpenAI, or any other provider. Use ClickHouse as the data backend for your own AI agents.
Availability and pricing
Alex: A few quick points on availability:
- ClickStack requires no additional purchases or SKUs — it's included across all ClickHouse Cloud tiers.
- It's currently in beta. When you log into ClickHouse Cloud, you'll see "ClickStack beta" in the left-hand navigation.
- Pricing is based on storage + compute. Thanks to columnar compression and object storage, it's straightforward to reach sub-$0.02 per GB per month for observability data. The exact breakdown depends on your ingest volume and query patterns, but the savings come from both storage efficiency and elastic compute.
Demo
Dale: I'm going to show you what Managed ClickStack looks like by instrumenting a simple application. Just to reinforce the architecture before we dive in: ClickHouse provides the compute and the UI (fully integrated, passwordless, with RBAC coming soon). You run the OpenTelemetry Collector yourself, in your own infrastructure. That's the current setup.
The demo application
The demo app is a real estate property listing site built with Next.js. It has:
- A Postgres backend for transactional workloads (property listings, filters, booking visits)
- A ClickHouse backend for analytics (property price comparisons by area)
These two are synced using ClickPipes — any row inserted into Postgres is automatically replicated into ClickHouse. The speed difference is stark: the same analytics queries that are fast in ClickHouse are very slow in Postgres (15–30 seconds vs. sub-second). No criticism of Postgres — it's just not designed for that workload.
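For flavour, the analytics query behind the price comparison is roughly this shape (the table and column names below are illustrative, not the demo's actual schema):

```sql
-- Average asking price and listing count per area.
-- Schema is illustrative: a full-scan aggregation like this is the
-- access pattern columnar storage is built for.
SELECT
    area,
    round(avg(price)) AS avg_price,
    count() AS listings
FROM properties
GROUP BY area
ORDER BY avg_price DESC
LIMIT 20;
```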
Instrumenting the application
Installing the packages is fairly low-lift. The main ones are:
- HyperDX browser SDK (`@hyperdx/browser`) — a fork of the OpenTelemetry JS SDK with session capture built in. This is the significant add-on over standard OTel.
- Vercel OTel SDK — used here because the demo runs on Vercel, and it adds some useful Vercel-specific instrumentation. It demonstrates that any OpenTelemetry-compatible SDK works.
On the server side (Node.js): import the SDK and register your service name — here it's called house-click.
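In a Next.js app that typically lives in instrumentation.ts; here's a minimal sketch using the public @vercel/otel API:

```ts
// instrumentation.ts, picked up automatically by Next.js.
import { registerOTel } from '@vercel/otel';

export function register() {
  // Register the OTel SDK under the demo's service name.
  registerOTel({ serviceName: 'house-click' });
}
```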
On the browser side: initialise the SDK in a useEffect, pass options for console capture and advanced network capture, and specify the OTLP endpoint and API key. Then wrap the app body with the TelemetryProvider in layout.tsx.
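A minimal sketch of that browser-side setup, using option names from the public @hyperdx/browser README. The component shape and environment variable names here are assumptions, not the demo's exact code:

```tsx
'use client';

import { useEffect, type ReactNode } from 'react';
import HyperDX from '@hyperdx/browser';

// Hypothetical provider component, wrapped around the body in layout.tsx.
export function TelemetryProvider({ children }: { children: ReactNode }) {
  useEffect(() => {
    HyperDX.init({
      url: process.env.NEXT_PUBLIC_OTLP_ENDPOINT ?? 'http://localhost:4318', // collector OTLP/HTTP endpoint
      apiKey: process.env.NEXT_PUBLIC_OTLP_API_KEY ?? '', // OTLP auth token
      service: 'house-click-browser', // assumed service name
      consoleCapture: true, // capture console logs alongside traces
      advancedNetworkCapture: true, // capture request/response details
    });
  }, []);
  return <>{children}</>;
}
```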
Running the collector
In the ClickStack UI, the onboarding flow gives you the exact Docker command to run the collector. Key parameters:
- Port `4317` for OTLP/gRPC
- Port `4318` for OTLP/HTTP
- Endpoint, username, and password (provided when you create the service)
- An OTLP auth token (optional but recommended to secure the endpoint)
When you run the command, the collector creates the necessary ClickHouse tables and then it's ready to receive data. You can also run this in Kubernetes using the provided Helm chart.
You'll also need a few environment variables in the app to point the browser SDK and server SDK at the collector (localhost:4318), and to set the auth token.
Exploring the data
After generating some traffic (browsing properties, booking visits, loading analytics), the ClickStack UI auto-detects the OTel tables and drops you straight into the trace view.
Search options:
- Lucene search — e.g. search `postgres` to filter spans, and see full traces, query text, duration, and result rows. Very fast to navigate.
- SQL search with schema-on-read — toggle to SQL mode and use any ClickHouse SQL expression. For example, extract the database name from the span name as a derived column, then filter on it (see the sketch below). Useful for users coming from Splunk.
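The sketch below assumes the stock OTel trace table the collector creates (`otel_traces`, with `Timestamp`, `SpanName`, `Duration` columns); the extraction pattern itself is illustrative:

```sql
-- Derive a "db" column from the span name on the fly, then filter on it.
SELECT
    Timestamp,
    SpanName,
    lower(extract(SpanName, '^(\\w+)')) AS db,  -- e.g. 'postgres'
    Duration
FROM otel_traces
WHERE db = 'postgres'
ORDER BY Timestamp DESC
LIMIT 100;
```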
Dashboards:
Build charts directly from traces. Example: plot average Postgres query duration vs. average ClickHouse query duration side by side. (Duration is stored in microseconds, so divide by 1,000,000 for seconds.) The contrast is immediate — Postgres queries at 15–30 seconds, ClickHouse at effectively zero on the same chart scale.
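The underlying chart query looks roughly like this, again assuming the stock trace schema and the demo's convention that the backend name appears in the span name:

```sql
-- Average query duration per backend, bucketed by minute.
-- Duration is stored in microseconds here, so divide by 1e6 for seconds.
SELECT
    toStartOfMinute(Timestamp) AS minute,
    avgIf(Duration, SpanName ILIKE '%postgres%')   / 1e6 AS postgres_avg_s,
    avgIf(Duration, SpanName ILIKE '%clickhouse%') / 1e6 AS clickhouse_avg_s
FROM otel_traces
GROUP BY minute
ORDER BY minute;
```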
Session replay
ClickStack includes session replay from the browser SDK. With a production dataset loaded:
- Filter traces by errors
- Find alarming console errors — in this case, an email validation failure
- Pull up the session replay for an affected user
- Watch the replay and jump to the exact moment the error occurred
- See that the email validation code only accepts `.com` addresses — anything else fails
Quantifying business impact
Here's where ClickHouse as the unified data store pays off. Rather than just knowing "there's an error," you can quantify what it's costing.
Drop into the SQL console and:
- Query the failed bookings from the observability data to get the affected property IDs
- Join against the property listings table (business data, in the same ClickHouse service)
- Apply a naive conversion model — 5% of bookings lead to a sale, 5% commission on the property price
The result: that single email validation bug has cost £6,000 in lost revenue today. Estimated with a SQL query, in seconds, from the same data store.
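A sketch of that query, where every table name, attribute key, and status value below is hypothetical but the shape matches what was described, with the 5% conversion and 5% commission model from the talk:

```sql
-- Estimate revenue lost to the validation bug today.
-- Schema details (attribute keys, table/column names, status values)
-- are assumptions for illustration.
WITH failed_bookings AS (
    SELECT DISTINCT SpanAttributes['property.id'] AS property_id
    FROM otel_traces
    WHERE StatusCode = 'Error'
      AND SpanName LIKE '%book%'
      AND Timestamp >= today()
)
SELECT
    -- 5% of bookings convert to a sale, at 5% commission on the price.
    round(sum(p.price) * 0.05 * 0.05, 2) AS estimated_lost_revenue
FROM failed_bookings AS fb
INNER JOIN properties AS p
    ON p.id = toUInt64OrZero(fb.property_id);
```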