Date: 2023-10-26

Learn how to use ClickHouse for vector search, including storing embeddings and searching with distance functions such as cosineDistance.

Yes, ClickHouse can perform vector search.

The main advantages of using ClickHouse for vector search compared to using more specialized vector databases include:

Using ClickHouse's filtering and full-text search capabilities to refine your dataset before performing a search.

Performing analytics on your datasets.

Running a JOIN against your existing data.

Not having to manage yet another database and complicate your infrastructure.

Here is a quick tutorial on how to use ClickHouse for vector search.

Your data (documents, images, or structured data) must be converted to embeddings. We recommend creating embeddings using the OpenAI Embeddings API or using the open-source Python library SentenceTransformers.

You can think of an embedding as a large array of floating-point numbers that represent your data. Check out this guide from OpenAI to learn more about embeddings.
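As a rough sketch of this step, the snippet below generates embeddings with SentenceTransformers and checks that they all have the same length. The model name all-MiniLM-L6-v2 and the helper names embed_captions and check_embeddings are illustrative assumptions, not something this guide prescribes. The library is imported lazily so the validation helper works even where it is not installed.

```python
from typing import List, Sequence


def embed_captions(
    captions: Sequence[str], model_name: str = "all-MiniLM-L6-v2"
) -> List[List[float]]:
    """Encode captions into embeddings.

    Requires `pip install sentence-transformers`; the default model name
    is only an example choice.
    """
    # Imported here so the rest of the module works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() returns a NumPy array with one row per input string.
    return model.encode(list(captions)).tolist()


def check_embeddings(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the shared dimension, or raise if vectors are empty or ragged."""
    if not embeddings:
        raise ValueError("no embeddings provided")
    dim = len(embeddings[0])
    if dim == 0 or any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same non-zero length")
    return dim
```

Checking the dimension up front is worthwhile because distance functions compare vectors element by element, so every stored embedding should come from the same model and have the same length.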

Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
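A minimal sketch of such a table, held as a Python constant so it can be sent through a client. The column names _file and caption follow the text; the table name images, the column name image_embedding, the ORDER BY key, and the helpers to_rows and load_images are assumptions made for illustration. The client argument is expected to expose command() and insert() the way the clickhouse-connect driver does.

```python
from typing import Any, List, Sequence, Tuple

# One row per image: file name, caption, and the embedding as Array(Float32).
# `images` and `image_embedding` are assumed names, not fixed by ClickHouse.
CREATE_IMAGES_TABLE = """
CREATE TABLE IF NOT EXISTS images
(
    `_file` LowCardinality(String),
    `caption` String,
    `image_embedding` Array(Float32)
)
ENGINE = MergeTree
ORDER BY `_file`
"""

COLUMNS = ["_file", "caption", "image_embedding"]

Row = Tuple[str, str, List[float]]


def to_rows(
    files: Sequence[str],
    captions: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> List[Row]:
    """Zip metadata and embeddings into one row per image."""
    if not (len(files) == len(captions) == len(embeddings)):
        raise ValueError("files, captions and embeddings must have equal length")
    return [
        (f, c, [float(x) for x in e])
        for f, c, e in zip(files, captions, embeddings)
    ]


def load_images(client: Any, rows: Sequence[Row]) -> None:
    """Create the table if needed and insert the rows.

    `client` is assumed to behave like the object returned by
    clickhouse_connect.get_client(...).
    """
    client.command(CREATE_IMAGES_TABLE)
    client.insert("images", list(rows), column_names=COLUMNS)
```

Keeping the caption and file name next to the embedding is what lets later queries filter or aggregate on metadata in the same statement that computes distances.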

Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like cosineDistance to take an embedding of a dog image and search for related images:
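A sketch of what such a query could look like, built as a string in Python, together with a small in-memory equivalent that is handy for sanity checks. The table name images, the column name image_embedding, and the helper names are assumptions for illustration; cosineDistance is the ClickHouse function mentioned above, and it returns 1 minus the cosine similarity, so smaller scores mean closer matches.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def build_search_query(query_embedding: Sequence[float], limit: int = 10) -> str:
    """Render a nearest-neighbour query against the assumed `images` table."""
    values = [float(x) for x in query_embedding]
    if not values or not all(math.isfinite(v) for v in values):
        raise ValueError("query embedding must contain only finite numbers")
    # Only floats are interpolated, so the literal cannot carry SQL text.
    literal = "[" + ", ".join(repr(v) for v in values) + "]"
    return (
        "SELECT `_file`, caption, "
        f"cosineDistance({literal}, image_embedding) AS score "
        f"FROM images ORDER BY score ASC LIMIT {int(limit)}"
    )


def top_k_local(
    query_embedding: Sequence[float],
    rows: Sequence[Tuple[str, str, Sequence[float]]],
    k: int = 10,
) -> List[Tuple[str, str, float]]:
    """Brute-force version of the same ranking for small in-memory data."""
    scored = [
        (name, caption, cosine_distance(query_embedding, emb))
        for name, caption, emb in rows
    ]
    return sorted(scored, key=lambda row: row[2])[:k]
```

The string from build_search_query can be passed to whichever client you use to run queries. Because the distance is computed in ordinary SQL, a WHERE clause on caption or other metadata can be added to narrow the candidates before ranking.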

This query returns the file name (_file) and caption of the 10 images whose embeddings are closest to that of the dog image you provided.

