Approx vector search in ClickHouse

Mark Needham

Looking to speed up your vector similarity searches in ClickHouse? This video walks through approximate vector search using HNSW indexes, which can dramatically reduce query times compared to linear scans. We'll work with a dataset of nearly a million Wikipedia entries, showing you how to set up vector similarity indexes and when to use different distance functions.
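The linear-scan baseline is easiest to picture as a small Python sketch of exact (non-indexed) nearest-neighbour search. It computes the L2 distance from the query to every stored embedding and keeps the k smallest. The function names and the toy in-memory rows are illustrative assumptions, not ClickHouse code. In ClickHouse, the equivalent brute-force query orders by a distance function with a LIMIT, which evaluates every row.

```python
import heapq
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_scan_top_k(
    query: Sequence[float],
    rows: Sequence[tuple[int, Sequence[float]]],
    k: int,
) -> list[tuple[float, int]]:
    """Exact nearest neighbours: compare the query against every row.

    The cost grows linearly with the number of rows. This full pass is the
    work an approximate (HNSW) index is designed to avoid.
    """
    # heapq.nsmallest keeps only the k best (distance, id) pairs.
    return heapq.nsmallest(
        k, ((l2_distance(query, vec), row_id) for row_id, vec in rows)
    )


if __name__ == "__main__":
    # Hypothetical toy rows: (id, embedding)
    rows = [
        (1, [0.0, 0.0]),
        (2, [1.0, 1.0]),
        (3, [5.0, 5.0]),
        (4, [0.5, 0.0]),
    ]
    print(linear_scan_top_k([0.0, 0.1], rows, k=2))
```

An HNSW index avoids this full pass by navigating a graph of neighbouring vectors. It gives up a small amount of recall in exchange for far fewer distance computations.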

We start with a basic linear scan using L2 distance, then introduce vector similarity indexes that bring query times down from seconds to milliseconds. You'll see practical examples with both the L2 and cosine distance functions, and learn when each one makes sense for your data. We also cover the two filtering strategies, pre-filter and post-filter, and how ClickHouse's query engine uses heuristics to choose between them.
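The practical difference between the two distance functions shows up with vectors that point in the same direction but have different lengths. The sketch below is plain Python and assumes that cosine distance means 1 minus cosine similarity, which is my understanding of how ClickHouse's cosineDistance is defined. The helper names are illustrative.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity. A value of 0 means the same direction, and magnitude is ignored."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


if __name__ == "__main__":
    a = [1.0, 2.0]
    b = [2.0, 4.0]  # same direction as a, twice the length
    print(l2_distance(a, b))      # non-zero: L2 sees the magnitude difference
    print(cosine_distance(a, b))  # ~0: cosine only compares direction

    # For unit-length vectors, squared L2 distance equals 2 * cosine distance,
    # so both functions rank neighbours identically.
    c = [3.0, -1.0]
    ua, uc = normalize(a), normalize(c)
    print(l2_distance(ua, uc) ** 2, 2 * cosine_distance(ua, uc))
```

A common rule of thumb, not a hard rule, is to use L2 when vector magnitude carries meaning and cosine when only direction matters. If your embedding model already outputs unit-length vectors, the two functions produce the same ranking.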

  • Setting up vector similarity indexes with HNSW on embedding columns
  • Comparing L2 distance (Euclidean) vs cosine distance for similarity calculations
  • Performance differences between linear scan and approximate search methods
  • Adding WHERE conditions to vector searches and understanding filter strategies
  • Controlling pre-filter vs post-filter behavior with the vector_search_filter_strategy setting
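The pre-filter versus post-filter trade-off can be modelled in a few lines of Python. This is a toy model, not ClickHouse's implementation. An exact top-k step stands in for the HNSW lookup, and the row layout (id, embedding, category) and the function names are hypothetical.

```python
import heapq
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def post_filter(query, rows, k, predicate):
    """Vector search first, then apply the WHERE predicate.

    The nearest-k step stands in for an index lookup. Rows that fail the
    predicate are dropped afterwards, so fewer than k ids may come back.
    """
    nearest = heapq.nsmallest(k, rows, key=lambda r: l2_distance(query, r[1]))
    return [r[0] for r in nearest if predicate(r)]


def pre_filter(query, rows, k, predicate):
    """Apply the WHERE predicate first, then search only the matching rows."""
    matching = [r for r in rows if predicate(r)]
    nearest = heapq.nsmallest(k, matching, key=lambda r: l2_distance(query, r[1]))
    return [r[0] for r in nearest]


if __name__ == "__main__":
    # Hypothetical rows: (id, embedding, category)
    rows = [
        (1, [0.0, 0.0], "science"),
        (2, [0.1, 0.0], "science"),
        (3, [0.2, 0.0], "science"),
        (4, [5.0, 5.0], "history"),
    ]
    is_history = lambda r: r[2] == "history"
    print(post_filter([0.0, 0.0], rows, 2, is_history))  # prints []
    print(pre_filter([0.0, 0.0], rows, 2, is_history))   # prints [4]
```

In this sketch, post-filtering keeps the fast nearest-k step but returns fewer than k rows (here none) when the WHERE condition is selective. Pre-filtering always finds the matching rows but computes distances over every one of them. The heuristics ClickHouse uses to pick a strategy automatically are not modelled here.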
