ClickHouse's New JSON Type @ FOSDEM 2025

TL;DR

  • Traditional JSON handling in analytical databases forces slow query-time deserialization or creates massive column bloat, type conflicts, and sparse data waste.
  • ClickHouse, the fastest open-source real-time analytics database, solves these bottlenecks with a new native JSON data type powered by underlying Variant and Dynamic capabilities.
  • The solution offers schema-on-read flexibility with schema-on-write performance, preserving strict columnar performance via highly compressed sub-columns and discriminator mapping.
  • Users can precisely tune ingestion and performance using parameters like max_dynamic_paths, max_dynamic_types, static type hints, and SKIP conditions.
  • In a 1-billion document JSONBench test, ClickHouse was 10x faster than Elasticsearch (using half the storage space), 2,500x faster than MongoDB, and 9,000x faster than PostgreSQL.

JSON serves as the lingua franca for semi-structured data, from application logs to event streams. But as data volumes grow, JSON strains analytical databases, forcing engineers to trade query speed for schema flexibility.

Storing entire documents as Strings forces databases to parse data slowly and repeatedly, undermining columnar engine speed. This approach treats the database as a blind storage layer, shifting the heavy lifting to query time.

ClickHouse, a relational, columnar, shared-nothing, and real-time database, introduces a native JSON data type engineered to deliver high-performance analytics without sacrificing schema flexibility. By bypassing traditional parsing bottlenecks, engineering teams can execute real-time queries directly on semi-structured data without building complex ETL pipelines or inflating storage costs.

Why does JSON slow down traditional columnar databases?

JSON's unpredictable structure fundamentally breaks the highly compressed, vectorized model of traditional columnar databases.

| Analytical challenge | Traditional database limitation | ClickHouse solution |
| --- | --- | --- |
| Query bottlenecks | Entire JSON objects are stored as Strings, forcing slow deserialization at query time. | The native JSON type pushes the heavy lifting to ingestion, ensuring rapid query speeds. |
| Exploding columns | Treating every path as a column exhausts file descriptors and degrades background processes. | The Dynamic type groups overflow paths into a shared fallback sub-column. |
| Conflicting types | Strict formats reject or coerce data when a path changes type (e.g., Integer to String). | The Variant type natively stores and tracks mixed data types via a compressed discriminator column. |
| Sparse data waste | Uncommon JSON paths create sparse columns filled mostly with wasted NULL values. | The Variant type stores mixed data types densely, without NULL bloat. |

How does the native ClickHouse JSON type work?

The new JSON data type ties together the Variant and Dynamic building blocks to create a highly optimized solution. Under the hood, the JSON type treats each unique JSON key path as a potential sub-column. If you do not explicitly define a path's data type, ClickHouse automatically assigns it the Dynamic type.
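You can observe this path-to-sub-column mapping directly with ClickHouse's JSON introspection functions. A minimal sketch (table and column names are illustrative; the setting applies to versions where the JSON type is still experimental):

```sql
SET allow_experimental_json_type = 1;

CREATE TABLE json_demo (data JSON)
ENGINE = MergeTree ORDER BY tuple();

INSERT INTO json_demo VALUES
    ('{"user": {"id": 42}, "tag": "a"}'),
    ('{"user": {"id": 7}, "score": 1.5}');

-- Each unique key path becomes a potential sub-column;
-- paths without an explicit type hint are stored as Dynamic.
SELECT JSONAllPathsWithTypes(data) FROM json_demo;
```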

To prevent a proliferation of column files on disk, the JSON type allows precise control over storage and query speeds through the following parameters and features:

| Parameter / feature | Default value | Functionality | Performance benefit |
| --- | --- | --- | --- |
| max_dynamic_paths | 1024 | Limits the number of unique JSON key paths stored as individual sub-columns. | Prevents metadata bloat and column proliferation on disk. |
| max_dynamic_types | 32 | Limits how many distinct data types are stored in separate files for a specific path. | Protects system performance while handling type diversity. |
| Static type hints | N/A | Assigns static types (e.g., Int64) to known, frequently queried paths. | Bypasses dynamic overhead to guarantee maximum query speed. |
| SKIP & SKIP REGEXP | N/A | Instructs the database to ignore specific paths or regex matches during parsing. | Reduces parsing noise and lowers storage costs. |

Example

-- Enable the experimental JSON type
SET allow_experimental_json_type = 1;

CREATE TABLE events
(
    -- Store JSON data using the new JSON data type
    event_payload JSON(
        -- 1. Limit total unique paths and types per path
        max_dynamic_paths=1024,
        max_dynamic_types=20,

        -- 2. Define static types for high-performance on known paths
        -- Note: Paths containing dots must be enclosed in backticks
        `user.id` String,
        `event.timestamp` DateTime,

        -- 3. Ignore unnecessary data to save space
        SKIP `user.geo_data`,
        SKIP REGEXP 'tmp_.*'
    )
)
ENGINE = MergeTree
ORDER BY (event_payload.event.timestamp, event_payload.user.id);

Querying with ease: standard dot notation

Query semi-structured data using standard SQL dot notation (e.g., SELECT event_payload.event.timestamp). This replaces verbose, performance-degrading functions like JSONExtractString() or JSONExtractInt().
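Assuming the events table from the example above, a few representative queries (the session.duration path is illustrative; the .: typed accessor and the ^ sub-object prefix are part of the JSON type's documented syntax):

```sql
-- Read JSON paths like ordinary columns
SELECT
    event_payload.user.id,
    event_payload.event.timestamp
FROM events
WHERE event_payload.user.id = '42';

-- Request a dynamic path as a concrete type with the .: accessor
SELECT event_payload.session.duration.:Int64 FROM events;

-- Extract an entire sub-object with the ^ prefix
SELECT event_payload.^user FROM events;
```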

Seamless ingestion ecosystem

You can pipe data directly from streaming sources like Kafka or object storage like S3 into this native JSON type. This eliminates the need to build and maintain fragile upstream ETL pipelines.
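As a sketch of S3 ingestion (the bucket URL is a placeholder; JSONAsObject is ClickHouse's input format that reads each whole JSON document into a single JSON-typed column):

```sql
-- Load JSON documents straight from object storage into the JSON column
INSERT INTO events
SELECT * FROM s3(
    'https://example-bucket.s3.amazonaws.com/events/*.json.gz',
    'JSONAsObject'
);
```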

What is under the hood? Decoding the Variant and Dynamic types

ClickHouse's native JSON support relies on two general-purpose data types that solve the challenges of storing unpredictable data in a columnar format: Variant and Dynamic.

How the Variant type natively handles conflicting data

Databases typically solve the challenge of storing mixed data types by coercing values into a flexible type like String. That approach sacrifices query performance. ClickHouse's Variant type solves this performance loss by allowing a single column to store values with different, non-compatible types.

Internally, ClickHouse creates a separate, highly compressed sub-column for each concrete data type (e.g., Int64, String, DateTime). A UInt8 discriminator column tracks which data type belongs to which row. When queried, an in-memory offsets column maps each value to its position in the correct dense sub-column.

This architecture natively handles mixed data types while preserving data integrity. Because the subtype columns are stored densely, without placeholders for NULL values, it minimizes the massive storage waste associated with sparse JSON paths.
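A minimal sketch of the Variant type in isolation (table name is illustrative; the setting applies to versions where Variant is still experimental):

```sql
SET allow_experimental_variant_type = 1;

CREATE TABLE variant_demo (v Variant(UInt64, String, Array(UInt64)))
ENGINE = MergeTree ORDER BY tuple();

-- Three rows, three different concrete types in one column
INSERT INTO variant_demo VALUES (1), ('hello'), ([1, 2, 3]);

-- variantType() exposes which typed sub-column (tracked by the
-- discriminator) holds each row's value
SELECT v, variantType(v) FROM variant_demo;
```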

Explore the Variant type in our official documentation.

How the Dynamic type achieves schema-on-read

The Dynamic data type provides schema-on-read flexibility for semi-structured data. It functions as an enhanced Variant type, but you do not need to declare subtypes in advance. This allows you to ingest data where types for a given path may vary or are unknown upfront, enabling a schema that adapts at runtime.

Internally, Dynamic columns store data similarly to Variant, with the addition of a dynamic_structure.bin metadata file that tracks the subtypes it has ingested. This allows the schema to evolve automatically as new data loads.

To prevent the "exploding paths" problem, the Dynamic type relies on the max_dynamic_types parameter (detailed in the table above) to ensure the most common types deliver optimal query performance.

Once you exceed this limit, ClickHouse stores any additional types in a single fallback column using a simple type-value representation. This mechanism stores heterogeneous data while protecting system performance.
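A minimal sketch of the Dynamic type in isolation (table name is illustrative; max_types here is the Dynamic type's own limit, analogous to the JSON type's max_dynamic_types, and the setting applies to versions where Dynamic is still experimental):

```sql
SET allow_experimental_dynamic_type = 1;

-- max_types caps how many concrete types get dedicated sub-columns;
-- types beyond the limit go to the shared fallback representation
CREATE TABLE dynamic_demo (d Dynamic(max_types=4))
ENGINE = MergeTree ORDER BY tuple();

INSERT INTO dynamic_demo VALUES (42), ('hello'), ([1, 2]), (now());

-- dynamicType() reveals the concrete type stored for each row
SELECT d, dynamicType(d) FROM dynamic_demo;
```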

Learn more about the Dynamic type in our official documentation.

How fast is it? The 1-billion document JSON benchmark test

JSONBench, a fully reproducible, open-source benchmark, validates the performance of the native JSON data type.

Evaluated against a real-world dataset of 1 billion BlueSky events (125 GB compressed), the benchmark compares ClickHouse against other popular databases with strong JSON support: Elasticsearch, DuckDB, MongoDB, and PostgreSQL. All databases were evaluated within the same JSONBench framework.

| Database | Analytical query speed | Storage efficiency | Primary architecture focus |
| --- | --- | --- | --- |
| ClickHouse | Baseline (very fast) | Baseline (highly efficient) | Columnar / real-time analytics |
| Elasticsearch | 8-16x slower than ClickHouse | 2x more storage than ClickHouse | Search / inverted index |
| DuckDB | 640-9,000x slower than ClickHouse | 5x more storage than ClickHouse | Columnar / in-process OLAP |
| MongoDB | 480-7,000x slower than ClickHouse | 1.5x more storage than ClickHouse | Document-oriented |
| PostgreSQL | 1,400-10,000x slower than ClickHouse | 6x more storage than ClickHouse | Relational / row-based |

This speed stems from ClickHouse's columnar storage foundation, enhanced by the Variant and Dynamic types. That design is inherently better suited to semi-structured analytics than the document-oriented or row-based architectures of its competitors.

The entire benchmark, including code, methodology, and the 1-billion document dataset, is open-sourced and hosted in a public S3 bucket for independent verification.

Spin up a free trial of ClickHouse Cloud to test high-performance JSON analytics with your own data.

FAQs

What is ClickHouse’s new JSON type?

It is a native JSON data type built for fast analytics on semi-structured data. It gives JSON flexibility without forcing slow query-time parsing.

Why is JSON slow in traditional analytical databases?

Most databases store JSON as strings or turn every path into a column. That causes slow parsing, sparse data, type conflicts, and storage waste.

How does ClickHouse make JSON queries faster?

ClickHouse shifts the heavy work to ingestion. JSON paths are stored as efficient sub-columns, so queries run with columnar speed.

How does the ClickHouse JSON type work?

Each JSON path can become a sub-column. Known paths can use static types, while unknown or changing paths use Dynamic and Variant under the hood.

What are the Variant and Dynamic types in ClickHouse?

Variant stores multiple concrete data types in one column. Dynamic handles unknown or changing types and evolves as new data arrives.

How do you query JSON in ClickHouse?

Use standard SQL dot notation, such as event_payload.user.id. You do not need verbose JSON extraction functions for common queries.

What do max_dynamic_paths and max_dynamic_types do?

They limit how many JSON paths and types ClickHouse stores separately. This controls file growth, metadata overhead, and performance.

How much faster is ClickHouse than Elasticsearch, MongoDB, and PostgreSQL for JSON analytics?

In the 1-billion document JSONBench test, ClickHouse was 10x faster than Elasticsearch, 2,500x faster than MongoDB, and 9,000x faster than PostgreSQL.