Approx vector search in ClickHouse
Mark Needham
Looking to speed up your vector similarity searches in ClickHouse? This video walks through approximate vector search using HNSW indexes, which can dramatically reduce query times compared to linear scans. We'll work with a dataset of nearly a million Wikipedia entries, showing you how to set up vector similarity indexes and when to use different distance functions.
We start with a basic linear scan approach using L2 distance, then introduce vector similarity indexes that bring query times down from seconds to milliseconds. You'll see practical examples using both L2 and cosine distance functions, and learn when each one makes sense for your data. We also cover filtering strategies—pre-filter versus post-filter—and how ClickHouse's query engine decides which approach to use based on heuristics.
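To make the difference between the two distance functions concrete, here is a minimal sketch in plain Python. It is not code from the video: the vectors are made up, and the helper names `l2_distance` and `cosine_distance` are illustrative. Cosine distance is taken here as 1 minus cosine similarity. L2 distance is sensitive to vector magnitude, while cosine distance only looks at direction, so the two can rank the same candidates differently.

```python
import math

def l2_distance(a, b):
    # Euclidean distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: depends only on the angle between the vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Made-up 2-dimensional "embeddings" for illustration only.
query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, larger magnitude
nearby_point = [1.0, 1.0]    # close in space, but at a 45 degree angle

l2_same = l2_distance(query, same_direction)
l2_near = l2_distance(query, nearby_point)
cos_same = cosine_distance(query, same_direction)
cos_near = cosine_distance(query, nearby_point)

# L2 prefers nearby_point; cosine prefers same_direction.
print(l2_same, l2_near, cos_same, cos_near)
```

If the embeddings are normalized to unit length, the two functions produce the same ranking, so the choice matters most when magnitudes vary.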
- Setting up vector similarity indexes with HNSW on embedding columns
- Comparing L2 distance (Euclidean) vs cosine distance for similarity calculations
- Performance differences between linear scan and approximate search methods
- Adding WHERE conditions to vector searches and understanding filter strategies
- Controlling pre-filter vs post-filter behavior with the vector_search_filter_strategy setting
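
The pre-filter versus post-filter distinction in the topics above can be sketched with a toy model. This is only an illustration under stated assumptions, not ClickHouse's implementation: an exact brute-force search stands in for the HNSW index, and the rows, the `lang` column, and the function names are all hypothetical. Pre-filtering applies the WHERE condition first and then searches the remaining rows. Post-filtering searches first and then drops rows that fail the condition, which can leave fewer results than requested.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical rows: an id, a filterable column, and a tiny embedding.
rows = [
    {"id": 1, "lang": "en", "vec": [0.0, 0.1]},
    {"id": 2, "lang": "de", "vec": [0.0, 0.2]},
    {"id": 3, "lang": "de", "vec": [0.0, 0.3]},
    {"id": 4, "lang": "en", "vec": [0.0, 0.9]},
]

def nearest(candidates, query, k):
    # Exact top-k search; a stand-in for the approximate index lookup.
    return sorted(candidates, key=lambda r: l2_distance(r["vec"], query))[:k]

def pre_filter(rows, query, k, predicate):
    # Apply the condition first, then search only the matching rows.
    return nearest([r for r in rows if predicate(r)], query, k)

def post_filter(rows, query, k, predicate):
    # Search all rows first, then discard neighbours that fail the condition.
    return [r for r in nearest(rows, query, k) if predicate(r)]

def is_english(row):
    return row["lang"] == "en"

query = [0.0, 0.0]
pre_ids = [r["id"] for r in pre_filter(rows, query, 2, is_english)]
post_ids = [r["id"] for r in post_filter(rows, query, 2, is_english)]

print(pre_ids)   # both matching rows are found
print(post_ids)  # only one of the two nearest rows passes the filter
```

In this toy data, post-filtering returns a single row for a top-2 query because one of the two nearest neighbours is filtered out afterwards, while pre-filtering returns two rows at the cost of evaluating the condition before the search.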