ClickHouse's New JSON Type @ FOSDEM 2025
TL;DR
- Traditional JSON handling in analytical databases forces slow query-time deserialization or creates massive column bloat, type conflicts, and sparse data waste.
- ClickHouse, the fastest open-source real-time analytics database, solves these bottlenecks with a new native JSON data type powered by underlying Variant and Dynamic capabilities.
- The solution combines schema-on-read flexibility with schema-on-write performance, preserving strict columnar speed via highly compressed sub-columns and discriminator mapping.
- Users can precisely tune ingestion and performance using parameters like max_dynamic_paths, max_dynamic_types, static type hints, and SKIP conditions.
- In a 1-billion-document JSONBench test, ClickHouse was 10x faster than Elasticsearch (using half the storage space), 2,500x faster than MongoDB, and 9,000x faster than PostgreSQL.
JSON serves as the lingua franca for semi-structured data, from application logs to event streams. As data volumes grow, however, it strains analytical databases: querying it often forces engineers to trade query speed for schema flexibility.
Storing entire documents as Strings forces databases to parse data slowly and repeatedly, undermining columnar engine speed. This approach treats the database as a blind storage layer, shifting the heavy lifting to query time.
ClickHouse, a relational, columnar, shared-nothing, and real-time database, introduces a native JSON data type engineered to deliver high-performance analytics without sacrificing schema flexibility. By bypassing traditional parsing bottlenecks, engineering teams can execute real-time queries directly on semi-structured data without building complex ETL pipelines or inflating storage costs.
Why does JSON slow down traditional columnar databases?
JSON's unpredictable structure fundamentally breaks the highly compressed, vectorized model of traditional columnar databases.
| Analytical challenge | Traditional database limitation | ClickHouse solution |
|---|---|---|
| Query bottlenecks | Entire JSON objects are stored as Strings, forcing slow deserialization at query time. | Native JSON type pushes heavy lifting to ingestion, ensuring rapid query speeds. |
| Exploding columns | Treating every path as a column exhausts file descriptors and degrades background processes. | The Dynamic type groups overflow paths into a shared fallback sub-column. |
| Conflicting types | Strict formats reject or coerce data when a path changes type (e.g., Integer to String). | The Variant type natively stores and tracks mixed data types via a compressed discriminator column. |
| Sparse data waste | Uncommon JSON paths create sparse columns filled mostly with wasted NULL values. | Native Variant type stores mixed data types densely without null bloat. |
How does the native ClickHouse JSON type work?
The new JSON data type ties together the Variant and Dynamic building blocks to create a highly optimized solution. Under the hood, the JSON type treats each unique JSON key path as a potential sub-column. If you do not explicitly define a path's data type, ClickHouse automatically assigns it the Dynamic type.
To prevent a proliferation of column files on disk, the JSON type allows precise control over storage and query speeds through four parameters:
| Parameter / feature | Default value | Functionality | Performance benefit |
|---|---|---|---|
| max_dynamic_paths | 1024 | Limits the number of unique JSON key paths stored as individual sub-columns. | Prevents metadata bloat and column proliferation on disk. |
| max_dynamic_types | 32 | Limits how many distinct data types are stored in separate files for a specific path. | Protects system performance while handling type diversity. |
| Static type hints | N/A | Assigns static types (e.g., Int64) to known, frequently queried paths. | Bypasses dynamic overhead to guarantee maximum query speed. |
| SKIP & SKIP REGEXP | N/A | Instructs the database to ignore specific paths or regex matches during parsing. | Reduces parsing noise and lowers storage costs. |
Example
```sql
-- Enable the experimental JSON type
SET allow_experimental_json_type = 1;

CREATE TABLE events
(
    -- Store JSON data using the new JSON data type
    event_payload JSON(
        -- 1. Limit total unique paths and types per path
        max_dynamic_paths = 1024,
        max_dynamic_types = 20,
        -- 2. Define static types for high performance on known paths
        -- Note: Paths containing dots must be enclosed in backticks
        `user.id` String,
        `event.timestamp` DateTime,
        -- 3. Ignore unnecessary data to save space
        SKIP `user.geo_data`,
        SKIP REGEXP 'tmp_.*'
    )
)
ENGINE = MergeTree
ORDER BY (event_payload.event.timestamp, event_payload.user.id);
```
Querying with ease: standard dot notation
Query semi-structured data using standard SQL dot notation (e.g., SELECT event_payload.event.timestamp). This replaces verbose, performance-degrading functions like JSONExtractString() or JSONExtractInt().
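For instance, a top-users query over the events table sketched above can read typed sub-columns directly with dot notation (a sketch; the table and column names assume the earlier example):

```sql
-- Read typed JSON paths directly as sub-columns: no extraction functions needed
SELECT
    event_payload.user.id AS user_id,
    count() AS event_count
FROM events
WHERE event_payload.event.timestamp >= now() - INTERVAL 1 DAY
GROUP BY user_id
ORDER BY event_count DESC
LIMIT 10;
```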
Seamless ingestion ecosystem
You can pipe data directly from streaming sources like Kafka or object storage like S3 into this native JSON type. This eliminates the need to build and maintain fragile upstream ETL pipelines.
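As one possible sketch of S3 ingestion, the s3 table function paired with the JSONAsObject input format (which reads each document into a single JSON column named json) can load documents straight into the table above; the bucket URL is a placeholder:

```sql
-- Load newline-delimited JSON documents from S3 directly into the JSON column
INSERT INTO events
SELECT json AS event_payload
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.json.gz', 'JSONAsObject');
```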
What is under the hood? decoding the Variant and Dynamic types
ClickHouse's native JSON support relies on two general-purpose data types that solve the challenges of storing unpredictable data in a columnar format: Variant and Dynamic.
How the Variant type natively handles conflicting data
Databases typically solve the challenge of storing mixed data types by coercing values into a flexible type like String. That approach sacrifices query performance. ClickHouse's Variant type solves this performance loss by allowing a single column to store values with different, non-compatible types.
Internally, ClickHouse creates a separate, highly compressed sub-column for each concrete data type (e.g., Int64, String, DateTime). A UInt8 discriminator column tracks which data type belongs to which row. When queried, an in-memory offsets column maps each value to its position in the correct dense sub-column.
This architecture natively handles mixed data types while preserving data integrity. Because the subtype columns are stored densely, without placeholders for NULL values, it minimizes the massive storage waste associated with sparse JSON paths.
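Variant is also usable as a standalone column type. A minimal sketch (the table name is illustrative; the setting is required while the type is experimental):

```sql
SET allow_experimental_variant_type = 1;

-- One column, three non-compatible types, stored in dense per-type sub-columns
CREATE TABLE mixed (v Variant(Int64, String, Array(Int64))) ENGINE = Memory;
INSERT INTO mixed VALUES (42), ('hello'), ([1, 2, 3]);

-- variantType() exposes the discriminator; v.Int64 reads one dense sub-column
SELECT v, variantType(v), v.Int64 FROM mixed;
```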
Explore the Variant type in our official documentation.
How the Dynamic type achieves schema-on-read
The Dynamic data type provides schema-on-read flexibility for semi-structured data. It functions as an enhanced Variant type, but you do not need to declare subtypes in advance. This allows you to ingest data where types for a given path may vary or are unknown upfront, with a schema that adapts at runtime.
Internally, Dynamic columns store data similarly to Variant, with the addition of a dynamic_structure.bin metadata file that tracks the subtypes it has ingested. This allows the schema to evolve automatically as new data loads.
To prevent the "exploding paths" problem, the Dynamic type relies on the max_dynamic_types parameter (detailed in the table above) to ensure the most common types deliver optimal query performance.
Once you exceed this limit, ClickHouse stores any additional types in a single fallback column using a simple type-value representation. This mechanism stores heterogeneous data while protecting system performance.
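Dynamic can likewise be used as a standalone column type; when declared directly, the per-column type limit is set via the max_types argument (a sketch; the table name is illustrative):

```sql
SET allow_experimental_dynamic_type = 1;

-- No subtypes declared upfront; ClickHouse infers them at insert time
CREATE TABLE anything (d Dynamic(max_types = 4)) ENGINE = Memory;
INSERT INTO anything VALUES (1), ('two'), (now()), ([3, 4]);

-- dynamicType() reveals the inferred concrete type per row
SELECT d, dynamicType(d) FROM anything;
```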
Learn more about the Dynamic type in our official documentation.
How fast is it? the 1-billion document JSON benchmark test
JSONBench, a fully reproducible, open-source benchmark, validates the performance of the native JSON data type.
Evaluated against a real-world dataset of 1 billion BlueSky events (125 GB compressed), the benchmark compares ClickHouse against other popular databases with strong JSON support: Elasticsearch, DuckDB, MongoDB, and PostgreSQL. All databases were evaluated within the same JSONBench framework.
| Database | Analytical query speed | Storage efficiency | Primary architecture focus |
|---|---|---|---|
| ClickHouse | Baseline (very fast) | Baseline (highly efficient) | Columnar / real-time analytics |
| Elasticsearch | 8-16x slower than ClickHouse | 2x the storage of ClickHouse | Search / inverted index |
| DuckDB | 640-9,000x slower than ClickHouse | 5x the storage of ClickHouse | Columnar / in-process OLAP |
| MongoDB | 480-7,000x slower than ClickHouse | 1.5x the storage of ClickHouse | Document-oriented |
| PostgreSQL | 1,400-10,000x slower than ClickHouse | 6x the storage of ClickHouse | Relational / row-based |
This speed stems from ClickHouse's columnar storage foundation, enhanced by the Variant and Dynamic types. This design is inherently better suited for semi-structured analytics than the document-oriented or row-based architectures of its competitors.
The entire benchmark, including code, methodology, and the 1-billion document dataset, is open-sourced and hosted in a public S3 bucket for independent verification.
- Read the detailed analysis: Read our JSON benchmark comparison.
- Reproduce the results: Visit the GitHub repository.
Spin up a free trial of ClickHouse Cloud to test high-performance JSON analytics with your own data.
FAQs
What is ClickHouse’s new JSON type?
It is a native JSON data type built for fast analytics on semi-structured data. It gives JSON flexibility without forcing slow query-time parsing.
Why is JSON slow in traditional analytical databases?
Most databases store JSON as strings or turn every path into a column. That causes slow parsing, sparse data, type conflicts, and storage waste.
How does ClickHouse make JSON queries faster?
ClickHouse shifts the heavy work to ingestion. JSON paths are stored as efficient sub-columns, so queries run with columnar speed.
How does the ClickHouse JSON type work?
Each JSON path can become a sub-column. Known paths can use static types, while unknown or changing paths use Dynamic and Variant under the hood.
What are the Variant and Dynamic types in ClickHouse?
Variant stores multiple concrete data types in one column. Dynamic handles unknown or changing types and evolves as new data arrives.
How do you query JSON in ClickHouse?
Use standard SQL dot notation, such as event_payload.user.id. You do not need verbose JSON extraction functions for common queries.
What do max_dynamic_paths and max_dynamic_types do?
They limit how many JSON paths and types ClickHouse stores separately. This controls file growth, metadata overhead, and performance.
How much faster is ClickHouse than Elasticsearch, MongoDB, and PostgreSQL for JSON analytics?
In the 1 billion document JSONBench test, ClickHouse was 10x faster than Elasticsearch, 2,500x faster than MongoDB, and 9,000x faster than PostgreSQL.

