ClickHouse is a column-oriented database that lets users generate powerful analytics in real time using SQL queries.
Outperforms comparable column-oriented database management systems
Incredible scaling both horizontally and vertically
Supports async replication and can be deployed across multiple datacenters
Processes analytical queries faster than traditional row-oriented systems
Purely distributed system, including enterprise-grade security
User-friendly SQL query dialect, built-in analytics capabilities, and more
Why choose ClickHouse?
ClickHouse uses all available hardware to its full potential to process each query as fast as possible. Peak processing performance for a single query stands at more than 2 terabytes per second (after decompression, counting only the columns used). In a distributed setup, reads are automatically balanced among healthy replicas to avoid increasing latency.
ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. All nodes are equal, which avoids single points of failure. Downtime of a single node or of a whole datacenter won't affect the system's availability for either reads or writes.
Easy to use
ClickHouse is simple and works out of the box. It streamlines all your data processing: ingest all your structured data into the system and it becomes instantly available for building reports. The SQL dialect lets you express the desired result without resorting to the custom non-standard APIs found in some DBMSs.
ClickHouse can be configured as a purely distributed system located on independent nodes, without any single points of failure. It also includes a lot of enterprise-grade security features and fail-safe mechanisms against human errors.
ClickHouse processes typical analytical queries two to three orders of magnitude faster than traditional row-oriented systems with the same available I/O throughput and CPU capacity. The columnar storage format allows fitting more hot data in RAM, which leads to shorter typical response times.
Total cost of ownership can be lowered further by using commodity hardware with rotating disk drives instead of enterprise-grade NVMe or SSD storage, without significant sacrifices in latency for most kinds of queries.
Strives for CPU efficiency
Vectorized query execution involves relevant SIMD processor instructions and runtime code generation. Processing data in columns increases CPU cache line hit rate.
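As a rough illustration of why processing data in columns helps, the Python sketch below stores the same records row-wise and column-wise and then sums a single field. It is not ClickHouse code: the sample events, the field names and both helper functions are made up for this example, and Python itself does no SIMD work here. The point is only that the columnar scan touches one contiguous array instead of every field of every row.

```python
from array import array

# Hypothetical sample data: (user_id, country, duration_ms) events.
rows = [(1, "DE", 120), (2, "US", 340), (3, "DE", 95), (4, "FR", 410)]


def sum_duration_rows(rows):
    """Row-oriented scan: every field of every record is visited."""
    total = 0
    for _user_id, _country, duration_ms in rows:
        total += duration_ms
    return total


# Column-oriented layout: each field lives in its own contiguous container.
columns = {
    "user_id": array("q", (r[0] for r in rows)),
    "country": [r[1] for r in rows],
    "duration_ms": array("q", (r[2] for r in rows)),
}


def sum_duration_columns(columns):
    """Columnar scan: only the single column the query needs is read."""
    return sum(columns["duration_ms"])
```

Both functions return the same total; the difference is how much unrelated data each one has to walk past to get there.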
Optimizes disk drive access
ClickHouse minimizes the number of seeks for range queries, which increases the efficiency of using rotating disk drives, as it maintains locality of reference for contiguously stored data.
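The effect of keeping data sorted can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not ClickHouse's storage engine: the `days` and `hits` lists and both function names are hypothetical. Because the key column is sorted, a range query resolves to one contiguous slice, which on disk would correspond to a single seek followed by a sequential read.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table kept sorted by its key (here, a day number).
days = [1, 1, 2, 3, 3, 3, 5, 8, 8, 9]
hits = [4, 7, 1, 9, 2, 6, 3, 5, 8, 0]


def range_slice(sorted_keys, low, high):
    """Return (start, stop) so sorted_keys[start:stop] holds keys in [low, high]."""
    return bisect_left(sorted_keys, low), bisect_right(sorted_keys, high)


def sum_hits(low, high):
    """Sum the hits column over the contiguous run of matching rows."""
    start, stop = range_slice(days, low, high)
    return sum(hits[start:stop])
```

For example, `range_slice(days, 3, 8)` yields the bounds `(3, 9)`, so the query reads rows 3 through 8 in one pass rather than probing scattered positions.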
Minimizes data transfers
ClickHouse enables companies to manage their data and create reports without using specialized networks that are aimed at high-performance computing.
Dramatically reduce preparation time by letting ClickHouse automatically understand your data. The database schema is generated by analyzing the source files, and the database is set up for efficient querying automatically.
Efficient management of denormalized data
The column-oriented nature of ClickHouse allows having hundreds or thousands of columns per table without slowing down SELECT queries. It's possible to pack in even more data by using a wide range of data organization options, such as arrays, tuples and nested data structures.
Join distributed or co-located data
ClickHouse provides various options for joining tables. Joins can be either local or distributed, and they can also access data stored in external systems. There is also support for external dictionaries, which provide an alternative, simpler syntax for accessing data from an outside source.
Approximate query processing
Users can control the trade-off between result accuracy and query execution time, which is handy when dealing with multiple terabytes or petabytes of data. ClickHouse also provides probabilistic data structures for fast and memory-efficient calculation of cardinalities and quantiles.
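The accuracy-versus-cost trade-off can be illustrated with a small Python sketch that estimates a distinct count from a hash-based sample. This shows the general idea only and makes no claim about how ClickHouse implements sampling or its probabilistic structures; `stable_hash`, `approx_distinct`, the `sample_one_in` parameter and the synthetic event stream are all assumptions made for this example.

```python
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash, so the same key is always sampled or skipped."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def exact_distinct(values):
    """Exact answer: remembers every distinct key."""
    return len(set(values))


def approx_distinct(values, sample_one_in=10):
    """Estimate the distinct count from roughly 1/sample_one_in of the key space."""
    sampled = {v for v in values if stable_hash(v) % sample_one_in == 0}
    return len(sampled) * sample_one_in


# Hypothetical event stream: 20,000 users, each seen three times.
events = [f"user-{i}" for i in range(20_000)] * 3
```

With `sample_one_in=10` the estimator keeps only about a tenth of the keys in memory, at the price of a small error in the result; a larger value saves more memory and widens the error.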
ClickHouse scales well both vertically and horizontally. It adapts easily to run on a cluster with hundreds or thousands of nodes, on a single server, or even on a tiny virtual machine. Currently, there are installations with multiple trillions of rows or hundreds of terabytes of data per single node.
There are many ClickHouse clusters consisting of several hundred nodes, while the largest known ClickHouse cluster is well over a thousand nodes.
ClickHouse at scale
ClickHouse is designed for analytics over a stream of clean, well-structured and immutable events or logs. It is recommended to put each such stream into a single wide fact table with pre-joined dimensions.
Web and App analytics
E-commerce and finance
Advertising networks and RTB
Monitoring and telemetry
Internet of Things
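The recommendation to keep each stream in a single wide fact table with pre-joined dimensions can be sketched as follows. This is a minimal Python model with made-up data, not a ClickHouse schema: `products`, `raw_events`, `widen` and `revenue_by_category` are hypothetical names. Dimension attributes are copied into each event once at ingest time, so later queries aggregate over the wide rows without performing a join.

```python
# Hypothetical dimension table: product_id -> attributes.
products = {
    10: {"product_name": "kettle", "category": "kitchen"},
    11: {"product_name": "lamp", "category": "lighting"},
}

# Hypothetical raw events that reference the dimension by key.
raw_events = [
    {"ts": 1, "product_id": 10, "amount": 25},
    {"ts": 2, "product_id": 11, "amount": 40},
    {"ts": 3, "product_id": 10, "amount": 30},
]


def widen(event, dimension):
    """Copy the dimension attributes into the event, producing one wide row."""
    return {**event, **dimension[event["product_id"]]}


# Pre-join once, at ingest time.
fact_table = [widen(e, products) for e in raw_events]


def revenue_by_category(rows):
    """Aggregate directly over the wide rows; no join is needed at query time."""
    totals = {}
    for row in rows:
        totals[row["category"]] = totals.get(row["category"], 0) + row["amount"]
    return totals
```

The trade-off is some repeated data in each row, which a columnar layout with compression tends to absorb well.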
Get started for free
However you choose to use ClickHouse, it's easy to get started.