"We aggregate the user's history in ClickHouse and use it as a data store for training and inference. Even when reading 10s of millions of rows, the performance was very nice and not the bottleneck when training new models."
Machine learning and GenAI
The ultimate real-time database to power machine learning workloads. With ClickHouse, it's easier than ever to unleash GenAI on your analytics data.
- Simplify your data stack by eliminating the need for ML-specific data stores
- Use lightning-fast aggregations for data preparation, powering model training at petabyte scale
- Execute fast and efficient vector search with linear and approximate techniques
- Plug in pre-built models from any provider directly
- Develop with the ML tools you already love through our extensive suite of integrations
Agentic Data Stack
Unlock agent-facing analytics within the ClickHouse Cloud console or via the native remote MCP server, and observe your agents with Langfuse.
- AI Assistant
- AI Agent
- Remote MCP Server
- Docs AI
Find out why companies are using ClickHouse to power their AI workloads.
Best-in-class ingestion rates built to handle continuous streams of data so you can rely on the most up-to-date information to fuel accurate predictions and results.
Unparalleled query performance at scale. Query billions of rows in milliseconds. Reduce iteration times and maximize efficiency with your data.
Powerful automatic scaling that's designed to handle unpredictable workloads. Focus on machine learning without worrying about your infrastructure.
ClickHouse is also available as an in-process OLAP SQL engine for Python. With chDB, you can leverage the full power of ClickHouse directly in your Python code.
Trusted by developers that work with data at scale
ClickHouse for ML & AI
ClickHouse is purpose-built to make deriving insights from complex data effortless, no matter how much data you're working with. Whether you're extracting valuable information for model training and evaluation through aggregations, running inference through User Defined Functions, or performing vector search, ClickHouse helps you get the most out of your data and unlock the power of AI for any application.
ClickHouse is trusted at scale to ingest and process billions of new events per day from a wide range of sources and formats. For continuous streams of data, ClickPipes seamlessly manages your ingestion pipelines so that you don't have to.
Features like User Defined Functions, described in more depth below, can be used to invoke models at insert time. This lets you pass incoming data to a model, receive the output, and store the results alongside your ingested data, all without having to spin up separate processes or jobs.
Native table functions make it easy to query data wherever it lives, whether on local disk or in object stores such as GCS and S3, and to apply transformations via services like Hugging Face.
ClickHouse User Defined Functions (UDFs) give you the flexibility to run Python scripts, or executables written in whichever language you prefer, directly from ClickHouse. These scripts can be triggered at insert or query time, making it easy to invoke pre-built models from providers like OpenAI and Hugging Face, or your own models.
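To make the UDF flow more concrete, here is a minimal sketch of an executable UDF script written in Python. It assumes the function is declared with the `TabSeparated` format and a single `String` argument, so ClickHouse streams one value per line on stdin and reads exactly one result per line from stdout, in the same order. The `score_sentiment` function is a hypothetical, offline stand-in for a real model call; in practice it might send the text to a provider such as OpenAI or Hugging Face.

```python
#!/usr/bin/env python3
"""Sketch of an executable UDF script for ClickHouse.

Assumption: the UDF is declared with format TabSeparated and one String
argument, so each stdin line is one input row and each stdout line must be
the matching result row.
"""
import sys
from typing import TextIO


def score_sentiment(text: str) -> float:
    # Hypothetical stand-in for a real model call (e.g. a request to a hosted
    # model provider). Here: a toy keyword score so the sketch runs offline.
    positive = {"good", "great", "fast"}
    negative = {"bad", "slow", "broken"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return float(score)


def process(stream_in: TextIO, stream_out: TextIO) -> None:
    # One input row per line; emit exactly one output row per input row.
    for line in stream_in:
        text = line.rstrip("\n")
        stream_out.write(f"{score_sentiment(text)}\n")
        # Flush after each row so ClickHouse is not left waiting on a buffer.
        stream_out.flush()


if __name__ == "__main__":
    process(sys.stdin, sys.stdout)
```

To register a script like this, ClickHouse expects an XML function definition with `<type>executable</type>`, a `<format>` such as `TabSeparated`, and a `<command>` naming the script in the configured user scripts directory. It can then be called like any SQL function, for example `SELECT score_sentiment(review) FROM reviews` (the function, table, and column names here are hypothetical). Note that the sketch does not unescape TabSeparated special characters such as `\t` or `\n` inside values.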
Our extensive suite of statistical and aggregation functions scales seamlessly over petabytes of data, providing powerful resources for model training and evaluation. With support for high-precision data types and specialized compression codecs, you don't need to sacrifice granularity.
ClickHouse supports vector search out of the box, using either exact linear scans or approximate techniques, with blazing speed.
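As a rough illustration of the "linear" technique, the sketch below performs an exact nearest-neighbour search in plain Python: it computes the cosine distance (1 minus cosine similarity, the same definition ClickHouse's `cosineDistance` function uses) between a query vector and every stored vector, then keeps the `k` closest. In ClickHouse itself this corresponds roughly to ordering by `cosineDistance(...)` with a `LIMIT`; approximate techniques such as a vector similarity index avoid scoring every row in exchange for a small loss in recall. The `linear_search` name and the `(id, vector)` row layout are assumptions made for illustration only.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # assumed layout: (row id, embedding)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def linear_search(query: Sequence[float], rows: List[Row], k: int) -> List[Row]:
    # Exact (brute-force) search: score every row and keep the k smallest
    # distances. Cost grows linearly with the number of rows.
    return heapq.nsmallest(k, rows, key=lambda row: cosine_distance(query, row[1]))
```

Because every row is scored, results are exact but the cost grows with table size, which is the usual motivation for switching to an approximate index on large datasets.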
ClickHouse is trusted all over the world to power customer-facing applications, where real-time responsiveness is critical.
With ClickHouse, you have everything you need to enrich your customer experiences through machine learning workloads run on your data, all in one place.
Our vibrant and growing ecosystem of integrations makes it easy to leverage your notebooks, visualization tools, and more, directly with ClickHouse.
Create valuable experiences and insights
Whether you're building engaging personalization features, incorporating semantic search into your product, automatically generating summarized insights from raw content, or something else entirely, ClickHouse provides the features you need to build AI-powered functionality on your data.
Unify your data stack
Eliminate the need for specialized data stores dedicated to specific machine learning tasks, such as vector search. With ClickHouse, you can rely on a single, unified data store to power your analytics, run your machine learning workloads, and handle your ad-hoc queries, all in one place.
Manage data efficiently
ClickHouse's efficient management of resources helps maximize cost-effectiveness. Our column-oriented design delivers best-in-class compression ratios, reducing storage burden and ensuring blazing speed for even the most intensive ML workloads.
Use the tools you love
Leverage ClickHouse directly with your favorite ML tools. Our growing ecosystem of integrations includes popular machine learning frameworks, visualization tools, notebooks, and more.
Supporting references
For detailed guides about how to get started with ClickHouse for ML, follow along in our blog:
- Vector Search with ClickHouse - Part 1
- Vector Search with ClickHouse - Part 2
- Video: ClickHouse for AI - Vectors, Embedding, Semantic Search, and more - Alexey Milovidov, ClickHouse
- Video: Vector Search In ClickHouse - Dale McDiarmid
- Using Langchain with ClickHouse
- Using Deepnote with ClickHouse
- Analyzing Hugging Face datasets with ClickHouse
- Using ClickHouse UDFs to integrate with OpenAI models
- Forecasting Using ClickHouse Machine Learning Functions
- Helicone's Migration from Postgres to ClickHouse for Advanced LLM Monitoring
- ClickHouse and the Machine Learning Data Layer
- Powering Feature Stores with ClickHouse
Get started with ClickHouse Cloud for free
We'll get you started with a 30-day trial and $300 in credits to spend at your own pace.