Skip to main content

Handling JSON

JSON has established itself as one of the most popular language-independent data interchange formats. As a “semi-structured” data format, it balances user readability with greater space efficiency than alternatives such as XML. Although typically used as the data format for requests and responses in web APIs, it is increasingly used for logging and general-purpose dataset distribution.

ClickHouse provides several approaches for handling JSON, each with its respective pros and cons and usage. More recent versions of ClickHouse have introduced new types which allow even greater flexibility and performance for JSON storage and querying. While these developments make older techniques less applicable, they can still be useful and are documented here for comprehension and those users on older versions.

For example purposes, we utilize two datasets: a 1m row subset of the Github dataset and an example NGINX log in JSON format. The former includes nested columns, useful for example purposes. It is also deliberately sparse, which helps illustrate some challenges of JSON. The latter allows us to discuss standard techniques for JSON logs.