Skip to main content
Skip to main content
Edit this page

AI-powered SQL generation

Starting from ClickHouse 25.7, ClickHouse Client and clickhouse-local include AI-powered functionality that converts natural language descriptions into SQL queries. This feature allows users to describe their data requirements in plain text, which the system then translates into corresponding SQL statements.

This capability is particularly useful for users who may not be familiar with complex SQL syntax or need to quickly generate queries for exploratory data analysis. The feature works with standard ClickHouse tables and supports common query patterns including filtering, aggregation, and joins.

It does this with help from the following in-built tools/functions:

  • list_databases - List all available databases in the ClickHouse instance
  • list_tables_in_database - List all tables in a specific database
  • get_schema_for_table - Get the CREATE TABLE statement (schema) for a specific table

Prerequisites

We'll need to add an Anthropic or OpenAI key as an environment variable:

Alternatively, you can provide a configuration file.

Connecting to the ClickHouse SQL playground

We're going to explore this feature using the ClickHouse SQL playground.

We can connect to the ClickHouse SQL playground using the following command:

Note

We'll assume you have ClickHouse installed, but if not, refer to the installation guide

Asking ClickHouse questions in natural language

Now it's time to start asking some questions!

The text to SQL feature is effectively a one-shot query generation tool. Since it doesn't maintain conversation history, include as much useful context as possible in your question. Be specific about:

Time periods or date ranges The type of analysis you want (averages, totals, rankings, etc.) Any filtering criteria

Finding expensive housing markets

Let's start by asking a question about house prices. The SQL playground contains a UK house prices dataset, which the AI will automatically discover:

Once we press enter, we'll see the thought process of the AI as it tries to answer our question.

The AI follows these steps:

  1. Schema discovery - Explores available databases and tables
  2. Table analysis - Examines the structure of relevant tables
  3. Query generation - Creates SQL based on your question and the discovered schema

We can see that it did find the uk_price_paid table and generated a query for us to run. If we run that query, we'll see the following output:

If we want to ask follow up questions, we need to ask our question from scratch.

Finding expensive properties in Greater London

Since the feature doesn't maintain conversation history, each query must be self-contained. When asking follow-up questions, you need to provide the full context rather than referring to previous queries. For example, after seeing the previous results, we might want to focus specifically on Greater London properties. Rather than asking "What about Greater London?", we need to include the complete context:

Notice that the AI goes through the same discovery process, even though it just examined this data:

This generates a more targeted query that filters specifically for Greater London and breaks down results by year. The output of the query is shown below:

The City of London consistently appears as the most expensive district! You'll notice the AI created a reasonable query, though the results are ordered by average price rather than chronologically. For a year-over-year analysis, we might refine your question to ask specifically for "the most expensive district each year" to get results grouped differently.