Intro to Full-Text Search in ClickHouse
Mark Needham
ClickHouse 26.2 brings the text index to production status, and in this video we take a proper look at how it works. It's implemented as an inverted index - similar to what you'd find in Lucene - so it stores a mapping from tokens to row numbers, letting the query engine skip most of the data when searching text columns.
We run through the whole thing using a 20 million row subset of the GitHub events dataset:
- How the text index works under the hood (inverted index, tokenizers, preprocessors)
- Creating tables with and without the index and comparing insert performance and disk usage
- Writing queries using
hasToken,hasAllTokens, andhasAnyTokens - Using EXPLAIN to see granule skipping in action — from 2504 granules down to 28
- A real-world example: GitTrends, built on the full 10 billion row dataset

Open House
Scaling ClickHouse to petabytes of logs at OpenAI

Open House
How ClickHouse helps Anthropic scale observability

Open House, User stories
How Capital One cut infrastructure costs by 50%
Engineering leaders at Capital One share how they cut infrastructure costs by 50% and reduced average dashboard load time from 5+ to under 500ms with ClickHouse Cloud.