The 9 best columnar databases in 2026 fall into four categories: open-source real-time OLAP (ClickHouse, Druid, Pinot), managed cloud warehouses and lakehouses (Snowflake, BigQuery, Redshift, Databricks), embedded analytics (DuckDB), and time-series specialists (QuestDB). Every claim about latency and compression below is grounded in ClickBench, first on identical c6a.4xlarge hardware for the five open-source engines, then on cloud-native compute (warehouse sizes, slots, RPUs) for ClickHouse Cloud against Snowflake, BigQuery, Redshift, and Databricks, since those four cannot be pinned to a fixed instance.
How we ranked these databases #
We require a ClickBench result for inclusion. ClickBench runs 43 analytical queries on the same 100 GB Hits clickstream dataset across 60+ engines. For the five open-source engines (ClickHouse, DuckDB, Druid, Pinot, QuestDB) we pin to c6a.4xlarge so all five are directly comparable on identical hardware. The four managed cloud services (Snowflake, BigQuery, Redshift, Databricks) expose compute only as warehouse sizes, slots, or RPUs, so we compare them against ClickHouse Cloud on the cloud-native ClickBench view, with each warehouse at its largest available configuration.
The databases at a glance #
A quick overview of the nine engines covered below: deployment shape, license, latency tier, and primary use case. The numbered sections that follow expand each one.
| # | Database | Deployment | License | Latency tier | Primary use case |
|---|---|---|---|---|---|
| 1 | ClickHouse | Self-hosted + managed | Apache 2.0 | Sub-second | Real-time OLAP, customer-facing analytics, data warehousing |
| 2 | DuckDB | Embedded library | MIT | Sub-second | Single-node local file exploration |
| 3 | Snowflake | Managed only | Commercial | Seconds to minutes | Data warehousing |
| 4 | BigQuery | Managed only | Commercial | Seconds to minutes | Data warehousing |
| 5 | Amazon Redshift | Managed only | Commercial | Seconds to minutes | Data warehousing |
| 6 | Databricks SQL | Managed only | Commercial | Seconds | Lakehouse, data warehousing |
| 7 | Apache Druid | Self-hosted + managed | Apache 2.0 | Seconds | Customer-facing analytics |
| 8 | Apache Pinot | Self-hosted + managed | Apache 2.0 | Sub-second | Customer-facing analytics |
| 9 | QuestDB | Self-hosted + managed | Apache 2.0 | Sub-second time-series | Time-series |
Performance and compression #
Open-source OLAP engines #
ClickBench runs the five open-source engines below on identical c6a.4xlarge AWS hardware against the same 100 GB Hits clickstream dataset (~99.9 million rows). Because the hardware is fixed, load times, query latencies, and on-disk compression are directly comparable. The cloud warehouses are covered in the next section because they cannot be pinned to the same instance.
| Database | Hardware | Load time | Loaded size on disk | Size vs. ClickHouse |
|---|---|---|---|---|
| ClickHouse | c6a.4xlarge | 269s | 14.26 GiB | 1.00× |
| DuckDB | c6a.4xlarge | 119s | 19.14 GiB | 1.34× |
| Apache Druid | c6a.4xlarge | 19,620s | 42.09 GiB | 2.95× |
| Apache Pinot | c6a.4xlarge | 2,032s | Unable to reliably load data | — |
| QuestDB | c6a.4xlarge | 5,417s | 67.84 GiB | 4.76× |
ClickHouse posts the smallest loaded size by a wide margin. At 14.26 GiB on disk it is roughly 5× smaller than QuestDB's 67.84 GiB and 3× smaller than Druid's 42.09 GiB on the same dataset. DuckDB lands within 35% of ClickHouse's footprint without any tuning, which is impressive for an embedded engine. Compression matters because it sets the floor on storage cost and scan time.
On load time, ClickHouse takes 269 seconds for full MergeTree construction. QuestDB and Pinot run into the thousands of seconds (Pinot was unable to reliably load the full 100 GB, throwing errors and dropping data; its reported time represents what it was able to load). Druid's 19,620-second load (5.5 hours) is the heaviest in the group, reflecting its segment-building and indexing-heavy ingest model. DuckDB, an in-process engine writing straight to a single database file, manages a speedy 119 seconds.
On query latency, ClickHouse is fastest on the majority of the 43 queries, with sub-millisecond times on point-style filters (Q0, Q6, Q7, Q29, Q36, Q37, Q40–Q42) and single-digit-second times on the heaviest aggregations and joins (Q20, Q21, Q22, Q33, Q34). QuestDB wins on time-bucketed queries (Q11, Q12, Q13, Q21, Q24, Q25, Q26, Q27, Q31, Q34) where its time-partitioned column files give it an edge. DuckDB is consistently competitive on bulk aggregations and is the fastest engine in the group on a handful of queries (Q5, Q23, Q28). Pinot is fastest on a small number of point lookups (Q3, Q17, Q19, Q33, Q38) where its inverted indexes pay off, but cannot complete several queries on this dataset (Q6, Q18, Q21, Q22, Q26, Q28, Q42). Druid is the slowest of the group on most queries and also fails to run several (Q6, Q18, Q21, Q22, Q26, Q28, Q32, Q34, Q42).
The full per-query breakdown lives at benchmark.clickhouse.com, and submissions are open via pull request to github.com/ClickHouse/ClickBench.
Cloud data warehouses #
Snowflake, BigQuery, Redshift, and Databricks cannot be pinned to c6a.4xlarge. Their compute is exposed only as warehouse sizes, slot reservations, or RPUs. ClickBench therefore runs a second comparison on cloud-native compute, where each service is tested at the largest configuration the benchmark accepts. BigQuery and Redshift use serverless, which is their best-performing tier.
| Service | Configuration | Load time | Loaded size | Total query time across 43 queries |
|---|---|---|---|---|
| ClickHouse Cloud (GCP) | 2× ClickHouse, 356 GiB | 21s | 9.54 GiB | ~13.5s |
| Snowflake | 4XL warehouse | 2,524s | 11.46 GiB | ~19.8s |
| Databricks | 4X-Large warehouse | 30s | 9.52 GiB | ~36.6s |
| BigQuery | Serverless | 777s | 8.16 GiB | ~28.1s |
| Amazon Redshift | Serverless | 1,889s | 28.22 GiB | ~68.1s |
Three things stand out. First, ClickHouse Cloud is the fastest service across the 43 queries despite running on a noticeably smaller compute footprint than the 4XL/4X-Large warehouses it is compared against: Snowflake is roughly 1.5× slower, BigQuery 2×, Databricks 2.7×, Redshift 5×. Second, ClickHouse Cloud loads the dataset in 21 seconds versus 2,524 seconds for Snowflake and 1,889 seconds for Redshift Serverless. The warehouses pay heavily for getting data into a queryable state. Third, on-disk size is in the same ballpark for ClickHouse, Snowflake, Databricks, and BigQuery (8–12 GiB), with Redshift the outlier at 28.22 GiB.
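The slowdown factors quoted above follow directly from the total-query-time column. A quick sanity check, with the numbers copied straight from the table:

```python
# Total time across the 43 ClickBench queries, from the table above (seconds).
totals = {
    "ClickHouse Cloud": 13.5,
    "Snowflake": 19.8,
    "BigQuery": 28.1,
    "Databricks": 36.6,
    "Redshift": 68.1,
}

baseline = totals["ClickHouse Cloud"]

# Slowdown relative to ClickHouse Cloud, rounded to one decimal place.
slowdown = {svc: round(t / baseline, 1) for svc, t in totals.items()}
# Snowflake ≈ 1.5×, BigQuery ≈ 2.1×, Databricks ≈ 2.7×, Redshift ≈ 5.0×
```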
The full per-query table, hardware notes, and submission process are at benchmark.clickhouse.com.
1. ClickHouse #
ClickHouse is an open-source columnar database under Apache 2.0, built for sub-second analytical queries on multi-petabyte data. It powers customer-facing analytics at Uber, Cloudflare, eBay, and ByteDance. Choose it for real-time OLAP at scale; avoid it for single-row OLTP or low-latency point lookups.
ClickHouse uses columnar storage with MergeTree-family table engines and sparse primary indexes, which let it skip blocks before reading them. Ingest is optimised for high-throughput writes: a single node sustains millions of rows per second, with new data queryable within seconds of arrival. On ClickBench (c6a.4xlarge, 100 GB Hits) ClickHouse posts the smallest loaded size of any engine in the run at 14.26 GiB, and is the fastest engine on the majority of the 43 queries: sub-millisecond on point-style filters and single-digit-second on the heaviest aggregations.

ClickHouse has the most flexible deployment story of any engine in this list. It's a single open-source binary you can run on a laptop, scale to a fleet of servers, or, uniquely among engines on this list, use without a server at all: clickhouse-local is a CLI for querying files directly, and chDB embeds the engine in-process with a pandas-compatible API for notebooks and Python pipelines. For production deployments, ClickHouse Cloud runs managed on AWS, GCP, and Azure with a BYOC option, and the official Kubernetes operator handles self-managed orchestration.
ClickHouse also has the strongest Postgres-integration story in this list, which matters if you already run Postgres for OLTP. The pg_clickhouse Postgres extension lets a Postgres client query ClickHouse tables directly, with predicates and projections pushed down so the heavy work happens on the columnar side. ClickHouse Cloud goes further with PeerDB and ClickPipes for managed Postgres CDC: a few clicks set up a logical-replication stream that lands Postgres change events into ClickHouse with sub-second lag. The combined story (best-of-breed OLTP on Postgres, best-of-breed OLAP on ClickHouse, with managed CDC between them) is one no other engine on this list offers, and it is why the choice is less about a query engine in isolation and more about a platform to standardise on.
2. DuckDB #
DuckDB is an MIT-licensed embedded OLAP engine that runs in-process, with no server, daemon, or cluster to deploy. It excels at ad-hoc analysis on a single machine: querying Parquet, CSV, and Iceberg files directly from a laptop, exploring datasets inside a Python notebook, or running quick analytical work from a CLI. It isn't designed to be a shared data warehouse, host multi-user concurrent workloads, or scale beyond a single node. Those use cases sit with the engines elsewhere on this list.
DuckDB reads Parquet, CSV, and Iceberg directly without an import step, pushing predicates and projections into the file. The query engine is vectorised on 1024-row batches. On ClickBench DuckDB ingests the 100 GB dataset in 119s (the fastest load in the comparison because there is no server, no segments, and no replication) and stores the dataset in 19.14 GiB on disk, around 34% larger than ClickHouse's 14.26 GiB. Compression is solid for an embedded engine, but not best-in-class. It is consistently competitive on bulk aggregations and the fastest engine in the group on a handful of queries (Q5, Q23, Q28). There is no networked deployment story by design. The scale ceiling is a single machine's RAM and disk.
3. Snowflake #
Snowflake is a managed columnar data warehouse using its proprietary FDN format, stored as immutable micro-partitions of 50–500 MB uncompressed on cloud object storage. It separates compute from storage so multiple virtual warehouses read the same data concurrently. Choose it for traditional BI and infrequent reporting where query flexibility matters more than performance: drag-and-drop chart builders generating unoptimised queries, analysts running large ad-hoc scans and joins across many tables, and reports where waiting minutes or hours is acceptable as long as the query completes. Avoid it for cost- or latency-sensitive workloads: customer-facing dashboards, predictable repeated queries, high-concurrency apps, automated processes, and agentic analytics where AI agents emit queries at high frequency.
Snowflake supports ANSI SQL, the VARIANT type for semi-structured data, and Iceberg external tables. Billing is in credits scaled by warehouse size, with per-second billing and a one-minute minimum each time a warehouse resumes. On AWS US East, the on-demand price is $2 per credit on Standard, $3 on Enterprise, and $4 on Business Critical. Storage is billed separately. A few quirks shape the real bill: Gen 2 warehouses consume 25–35% more credits than Gen 1 at the same size, multi-cluster autoscale multiplies the hourly burn by the number of active clusters, and Query Acceleration Service adds serverless compute on top of warehouse credits. See ClickHouse's engineer-friendly guide to how the major cloud data warehouses bill you for the full model. On the cloud-native ClickBench view, Snowflake at its 4XL warehouse runs the 43-query suite in ~19.8s against ClickHouse Cloud's ~13.5s on a 2× ClickHouse / 356 GiB config, and takes 2,524 seconds to load the dataset versus 21 seconds for ClickHouse. Cost-performance follows the same trajectory: at 100 billion rows, Snowflake 4X-Large lands at roughly 32× worse cost-per-query than ClickHouse Cloud, per the cost-performance comparison across the five major cloud data warehouses. The shape is consistent with a design target of governed concurrency rather than single-query latency or fast time-to-query.
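The credit model above can be made concrete with a short sketch. This is a simplified illustration, not Snowflake's exact meter: it assumes the documented Gen 1 ladder where credits per hour double from X-Small = 1, and the $2/credit Standard on-demand rate quoted above; Gen 2 surcharges, multi-cluster autoscale, and Query Acceleration Service are left out.

```python
# Simplified sketch of Snowflake warehouse billing (Gen 1, Standard, AWS
# US East). Credit rates double with warehouse size from X-Small = 1/hour.
CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8,
    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
}

def snowflake_cost(size: str, active_seconds: float,
                   price_per_credit: float = 2.0) -> float:
    """Cost of one warehouse resume: per-second billing, 60s minimum."""
    billed = max(active_seconds, 60)  # one-minute minimum per resume
    credits = CREDITS_PER_HOUR[size] * billed / 3600
    return credits * price_per_credit

# A 4XL warehouse answering a 5-second query still bills a full minute:
# 128 credits/hr × 60s ≈ 2.13 credits ≈ $4.27 on Standard.
```

The 60-second minimum is why a warehouse that resumes often for short queries costs far more than its wall-clock usage suggests.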
4. Google BigQuery #
BigQuery is a serverless columnar data warehouse using the Capacitor format, Google's successor to ColumnIO. There are no clusters or instances to manage. Queries run on shared compute and bill by bytes scanned (on-demand) or reserved slots. Choose BigQuery if you're a GCP-native organisation and the surrounding Google Cloud surface area is itself part of the value: Vertex AI for in-warehouse ML through BigQuery ML and the feature store, Looker and Looker Studio for BI on top of native connectors, and Dataform for SQL transformations managed inside BigQuery itself. The fit is traditional BI, ad-hoc analyst SQL, and reports where seconds-to-minutes latency is acceptable. Avoid it for cost- or latency-sensitive workloads: customer-facing dashboards, predictable repeated queries, high-concurrency apps, automated processes, and agentic analytics where AI agents emit queries at high frequency.
Capacitor applies dictionary, RLE, and DELTA encodings before LZ4 or ZSTD, and the Storage Write API supports streaming ingest at up to 10 GB/s per project. BigQuery has two billing models: on-demand at $6.25 per TiB scanned in US regions (first 1 TiB/month free), and capacity-based reserved slots at $0.04 per slot-hour on Standard, $0.06 on Enterprise, and $0.10 on Enterprise Plus. The mental model is different from a warehouse-credit engine. Capacity mode bills active slot-time across query stages rather than wall-clock runtime, so a 20-second query that fans out to thousands of parallel slots can rack up tens of thousands of slot-seconds. On-demand has no time meter at all and bills purely by bytes scanned. ClickHouse's engineer-friendly guide to how the major cloud data warehouses bill you walks through the full model. On the cloud-native ClickBench view, BigQuery Serverless posts the smallest on-disk footprint of any service in the comparison (8.16 GiB) thanks to Capacitor's encodings, but runs the 43-query suite in ~28.1s (roughly 2× ClickHouse Cloud) because per-query latency is dominated by slot scheduling rather than scan time. Cost-performance follows the same pattern: on the cost-performance comparison across the five major cloud data warehouses, BigQuery Enterprise lands ~14× worse cost-per-query than ClickHouse Cloud at 10 billion rows, and BigQuery On-Demand falls off the chart at scale because per-byte pricing scales linearly with dataset size.
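The difference between the two meters is easiest to see in code. A hedged sketch, using only the list prices quoted above (on-demand $6.25/TiB with the first 1 TiB/month free, capacity at $0.04 per slot-hour on Standard); real bills also depend on region, editions, and commitments:

```python
# Simplified sketch of BigQuery's two billing models (US list prices).
TIB = 2**40

def on_demand_cost(bytes_scanned: int, free_tib_remaining: float = 1.0) -> float:
    """On-demand: $6.25 per TiB scanned; no time meter at all."""
    billable_tib = max(bytes_scanned / TIB - free_tib_remaining, 0.0)
    return billable_tib * 6.25

def capacity_cost(slot_seconds: float,
                  price_per_slot_hour: float = 0.04) -> float:
    """Capacity: bills active slot-time across query stages, not wall clock."""
    return slot_seconds / 3600 * price_per_slot_hour

# A 20-second query fanning out to 2,000 parallel slots burns
# 20 × 2,000 = 40,000 slot-seconds, despite 20s of wall-clock time.
```

Note how a fast query can still be expensive in capacity mode: the meter counts slot-seconds summed across the fan-out, so parallelism trades latency for billed slot-time rather than reducing cost.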
5. Amazon Redshift #
Amazon Redshift is AWS's managed columnar data warehouse. RA3 nodes decouple compute from S3-backed managed storage (RMS). Each block carries Zone Maps holding min/max metadata so the planner skips unnecessary blocks. Choose Redshift if you're an AWS-native organisation and the surrounding AWS surface area is itself part of the value: Redshift Spectrum for federated queries against S3 through the Glue Data Catalog, Redshift ML for in-warehouse model training and inference via SageMaker, and QuickSight for BI through a native connector. The fit is traditional BI, ad-hoc analyst SQL, and reports where seconds-to-minutes latency is acceptable. Avoid it for cost- or latency-sensitive workloads: customer-facing dashboards, predictable repeated queries, high-concurrency apps, automated processes, and agentic analytics where AI agents emit queries at high frequency.
Redshift Serverless bills in RPUs (Redshift Processing Units), where each RPU is 16 GB of memory and 2 vCPU, priced at $0.36 per RPU-hour in US East with per-second metering and a 60-second minimum. You configure a base capacity (default 128 RPUs, configurable 4–512) and a maximum ceiling, and ML-driven predictive scaling (RAIS) pre-allocates compute before heavy queries rather than fanning out per query stage like BigQuery. The provisioned RA3 cluster product is still available for steady workloads, starting at $1.086/hr (ra3.xlplus, US East) on demand. ClickHouse's engineer-friendly guide to how the major cloud data warehouses bill you walks through the model in detail. On the cloud-native ClickBench view, Redshift Serverless is the slowest of the five services tested at ~68.1s for the 43-query suite (5× ClickHouse Cloud), the largest loaded size at 28.22 GiB (3× ClickHouse Cloud), and a 1,889-second load time. Cost-performance follows the same shape: at 100 billion rows, Redshift Serverless lands ~3.1× worse cost-per-query than ClickHouse Cloud on the cost-performance comparison across the five major cloud data warehouses. The profile is typical of warehouse engines tuned for steady-state throughput rather than per-query latency.
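The RPU meter described above can be sketched in a few lines. This is an illustration under the figures quoted in the text ($0.36 per RPU-hour in US East, per-second metering, 60-second minimum); it ignores predictive scaling and the max-capacity ceiling:

```python
# Simplified sketch of Redshift Serverless billing (US East list price).
def redshift_serverless_cost(rpus: int, active_seconds: float,
                             price_per_rpu_hour: float = 0.36) -> float:
    """Per-second metering with a 60-second minimum per billing period."""
    billed = max(active_seconds, 60)  # 60-second minimum
    return rpus * billed / 3600 * price_per_rpu_hour

# At the default base capacity of 128 RPUs, even a 5-second query bills
# the full minimum minute: 128 × 60s ≈ $0.77.
```

Because the base capacity (128 RPUs by default) applies whenever the endpoint is active, right-sizing the base is the main cost lever, much more so than per-query tuning.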
6. Databricks SQL #
Databricks SQL is the analytical query layer of the broader Databricks Lakehouse Platform, with the Photon vectorised execution engine running on top of Delta Lake (Parquet plus a transactional log). The column store is therefore Delta, not a proprietary on-disk format, so the same data is reachable from Spark, MLflow, and the rest of the Databricks surface area. Choose Databricks if you want a single platform that spans data engineering, warehousing, ML, and BI under one umbrella. The functional coverage is the broadest of any vendor on this list, and Databricks SQL is the analytical entry point into it. The trade-off is cost and operational complexity: you're adopting a multi-service platform rather than a focused query engine, and Databricks SQL has no good answer for sub-second latency. Avoid it for cost- or latency-sensitive workloads: customer-facing dashboards, predictable repeated queries, high-concurrency apps, automated processes, and agentic analytics where AI agents emit queries at high frequency.
Databricks SQL warehouses come in T-shirt sizes from 2X-Small up to 4X-Large and bill in DBUs (Databricks Units) at $0.70 per DBU-hour on both Premium and Enterprise SQL tiers, with per-second metering while the warehouse is active. Each size has a fixed DBU burn rate that roughly doubles with each step up: 2X-Small at 4 DBUs/hour, Small at 12, Medium at 24, and so on up the ladder. Photon is the default execution engine for SQL warehouses and gives Databricks its closest profile to a real-time OLAP engine. The quirk to watch is multi-cluster autoscaling: each additional cluster spun up to absorb concurrency bills its own DBUs/hour for the time it is active, so the hourly burn multiplies linearly with active cluster count. ClickHouse's engineer-friendly guide to how the major cloud data warehouses bill you walks through the model in detail. On the cloud-native ClickBench view, Databricks at a 4X-Large warehouse runs the 43-query suite in ~36.6s (about 2.7× slower than ClickHouse Cloud on a 2× ClickHouse / 356 GiB config), with a 30-second load time and 9.52 GiB on disk, the smallest of the cloud services after BigQuery. Cost-performance follows the same shape: at 100 billion rows, Databricks 4X-Large lands at roughly 23× worse cost-per-query than ClickHouse Cloud on the cost-performance comparison across the five major cloud data warehouses.
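The multi-cluster multiplier is the part of this model that surprises people, so it is worth sketching. Only the DBU rates stated above are encoded (2X-Small at 4 DBUs/hour, Small at 12, Medium at 24); the $0.70 per DBU-hour rate and per-second metering are as quoted, and everything else is simplified:

```python
# Simplified sketch of Databricks SQL warehouse billing. Only the DBU
# burn rates stated in the text are encoded; larger sizes roughly double
# per step up the T-shirt ladder.
DBU_PER_HOUR = {"2X-Small": 4, "Small": 12, "Medium": 24}

def dbsql_cost(size: str, active_seconds: float, clusters: int = 1,
               price_per_dbu_hour: float = 0.70) -> float:
    """Per-second metering; each autoscaled cluster bills its own DBUs."""
    dbus = DBU_PER_HOUR[size] * active_seconds / 3600
    return dbus * clusters * price_per_dbu_hour

# A Medium warehouse autoscaled to 3 clusters for an hour burns
# 3 × 24 = 72 DBUs ≈ $50.40, triple the single-cluster bill.
```

The hourly burn scales linearly with active cluster count, so a concurrency spike that spins up extra clusters multiplies the bill for exactly as long as they stay active.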
7. Apache Druid #
Apache Druid is an Apache 2.0 real-time analytical database optimised for high-cardinality time-series and event data. It ingests streaming data into bitmap-indexed segments and serves sub-second queries against them. Druid is one of the oldest engines in this category: among the very earliest open-source real-time OLAP databases, it was the original choice for many of the companies first moving into fast analytical workloads. The project has since lost momentum and is increasingly being replaced in production by the other engines on this list. Choose it for high-cardinality streaming dashboards (ad-tech, network telemetry); avoid it for ad-hoc SQL on raw rows or batch warehouse queries.
Druid was created at Metamarkets, and a number of teams that originally adopted it have since migrated off. eBay rebuilt its OLAP platform on ClickHouse after benchmarking against Druid:
> We explored ClickHouse late last year and, based on documentation and extensive benchmarking tests, it seemed to fit our events use-case well and yielded impressive numbers. We found ClickHouse capable of handling high-ingestion volume without issues. We also did a cost comparison of infrastructure footprint and storage, which showed that we could cut back on our existing infrastructure used for Druid by over 90 percent.
Lyft made the same move, publicly deprecating Druid in favour of ClickHouse. The architecture explains why these migrations keep happening: Druid has six service types (broker, coordinator, historical, middle-manager, overlord, router), making it operationally heavier than ClickHouse or DuckDB. On ClickBench, Druid's load time is the heaviest in the comparison at 19,620s (5.5 hours), and its loaded size is 42.09 GiB (about 3× ClickHouse), reflecting the segment-building and indexing-heavy ingest model. Druid is also unable to complete several queries in the suite (Q6, Q18, Q21, Q22, Q26, Q28, Q32, Q34, Q42).
8. Apache Pinot #
Apache Pinot is an Apache 2.0 columnar database built for one extreme workload: very small point lookups at very high concurrency on fully denormalised tables at very large scale. LinkedIn open-sourced it after building it to power features like "who viewed your profile": billions of profile-view events, millions of concurrent active users, and each query touching a tiny slice of data keyed by user. That shape (lots of activity in the table, very little data returned per query, lots of concurrent users) is what Pinot is designed for. The architecture is built around it: pre-computed star-tree indexes, inverted indexes, and a multi-component cluster, all trading away common OLAP capabilities to hit single-digit-millisecond p99 latency on those query shapes. Choose Pinot when your workload genuinely matches that pattern. Avoid it for general-purpose analytics, BI, ad-hoc analyst SQL, generic customer-facing dashboards, or any case where the scale doesn't justify the operational complexity.
ClickBench bears this out directly. Pinot is the fastest engine in the comparison on a small number of point-style queries (Q3, Q17, Q19, Q33, Q38) where its inverted indexes pay off, which is exactly the workload shape it was designed for. Outside that pattern, it cannot complete several queries on the dataset at all (Q6, Q18, Q21, Q22, Q26, Q28, Q42). The other indexing features (star-tree indexes for pre-aggregation, geospatial indexes, JSON indexes, and Lucene-based text search) round out the same specialisation toward indexed point-lookup workloads rather than general-purpose OLAP.
9. QuestDB #
QuestDB is an Apache 2.0 columnar database built for one thing: pure time-series workloads where time is the dominant access pattern. It writes append-only column files partitioned by timestamp and applies SIMD operators to aggregations. Choose QuestDB when your problem is fundamentally time-series (market data, IoT telemetry, monitoring) and you want a tool specialised for that shape. Avoid it as soon as time-series stops being the whole problem: workloads that mix time-series with general analytics, joins across non-temporal dimensions, or any case where time is one access pattern among several are better served by a general-purpose OLAP engine like ClickHouse.
QuestDB supports ANSI SQL plus its own SAMPLE BY and LATEST ON time-series extensions. Project-published benchmarks show ingest of around 1.4 million rows per second per CPU core. On ClickBench, QuestDB's loaded size is the largest of the OSS engines at 67.84 GiB (4.76× ClickHouse) because the time-partitioned column files are not optimised for the Hits dataset shape, but the engine still wins on every time-bucketed query (Q11, Q12, Q13, Q21, Q24, Q25, Q26, Q27, Q31, Q34), exactly the workload class it is designed for. The Apache 2.0 core is free. QuestDB Enterprise (managed and on-prem) bills per node-hour or per-core respectively.
Choosing between them #
Start with the shape of the problem: is it low-latency user-facing, agentic analytics, BI, data warehousing, time-series, or local exploration? Then ask whether the engine satisfies the query shape and latency needs of that problem. Only then filter on operational complexity and deployment shape.
For real-time OLAP, customer-facing analytics, agentic analytics, BI, and data warehousing on multi-TB data, ClickHouse covers the broadest envelope at sub-second latency across the widest range of deployment shapes (single binary, server, cluster, embedded in-process via chDB, CLI via clickhouse-local, managed cloud, BYOC, and a first-party Kubernetes operator). For pure time-series problems where time is the dominant access pattern, QuestDB is purpose-built and a reasonable choice when the workload is only time-series. For point lookups at extreme concurrency on fully denormalised tables, Pinot's design fits the shape directly. For local or embedded analysis on a single machine, it is a toss-up between DuckDB and ClickHouse, where both work and the pick comes down to familiarity. For an all-in-one platform spanning data engineering, ML, and BI where low latency is not required, Databricks SQL fits that envelope. BigQuery and Redshift are fine choices if you prefer a tightly CSP-native stack on GCP or AWS respectively and the trade-offs in load time, latency, and cost-per-query suit your workload. For an all-in-one OLTP + OLAP stack, ClickHouse is the only entry on this list that ships it out of the box, with managed Postgres alongside the columnar engine, the pg_clickhouse extension for federated querying, and PeerDB and ClickPipes for managed CDC between them, whether you're already running Postgres or starting from scratch.
| Problem shape | Pick | Why |
|---|---|---|
| Real-time OLAP, customer-facing analytics, agentic analytics | ClickHouse | Sub-second on streaming ingest, widest deployment options |
| BI and data warehousing on multi-TB | ClickHouse | Sub-second latency, widest deployment shapes |
| Pure time-series, time-dominant access | QuestDB | Purpose-built for time-series-first workloads |
| Point lookups at extreme concurrency on denormalised tables | Pinot | Designed for that exact pattern |
| Local or embedded on a single machine | DuckDB or ClickHouse | Toss-up; pick on familiarity |
| All-in-one platform (engineering + ML + BI), latency tolerant | Databricks SQL | Single platform across the lakehouse stack |
| GCP-native preference | BigQuery | Native to the Google Cloud surface |
| AWS-native preference | Redshift | Native to the AWS surface |
| All-in-one OLTP + OLAP stack | ClickHouse | Managed Postgres alongside the columnar engine, plus first-party CDC via PeerDB and ClickPipes |
For the threshold question of when a columnar engine is the right choice at all (versus a row store like Postgres), see when to use a columnar database. For measured numbers behind the "sub-second on multi-TB" claims, see the row-vs-column comparison.
Wide-column stores are not columnar OLAP #
Cassandra, HBase, and Bigtable are wide-column databases, not columnar OLAP. They are sparse maps of maps keyed by row key, optimised for OLTP-style key-value access at high write throughput, and the layout has nothing in common with columnar storage. DB-Engines groups them under its "Wide column stores" ranking, which readers searching for analytical engines often conflate with "columnar databases." The labels look similar, but the workloads do not overlap.
| System | Category | Workload |
|---|---|---|
| Apache Cassandra | Wide-column NoSQL | OLTP key-value with secondary indexes |
| Apache HBase | Wide-column NoSQL on HDFS | OLTP key-value on Hadoop |
| Google Bigtable | Wide-column NoSQL | OLTP key-value at extreme scale |