What is an OLAP database?

Al Brown
Last updated: May 27, 2026

An OLAP database is a database system designed for analytical queries: wide aggregations, filters, and group-bys across millions to billions of rows. The term traces back to Edgar F. Codd's 1993 paper Providing OLAP to User-Analysts. Columnar OLAP databases (Snowflake, BigQuery, ClickHouse, Druid, DuckDB) share a common architecture: columnar storage, vectorised execution, and data skipping. That architecture is what lets them answer in under a second what a row-oriented OLTP database like Postgres or MySQL would take minutes to compute.

TL;DR #

  • An OLAP database is optimised for analytical reads (aggregations and group-bys over large tables), not the small concurrent transactions OLTP databases handle.
  • The defining architectural choices are columnar storage, vectorised execution, and sparse data-skipping indexes, not cubes, MDX, or pre-aggregation.
  • Five categories matter in 2026: cloud data warehouses, real-time OLAP, embedded, lakehouse query engines, and legacy specialised systems.
  • "Wide-column stores" like Cassandra and HBase are NoSQL key-value systems, not columnar OLAP databases. The names are confusingly similar; the architectures aren't.
  • Pick by latency requirement, ingest model, deployment shape, data volume, and concurrency, in that order. For a speed-ranked shortlist see the fastest OLAP databases; for a cost-ranked view see the best columnar databases.

What is an OLAP database? #

An OLAP database is a database system optimised for analytical queries (large aggregations, filters, and group-bys across millions to billions of rows) rather than the high-volume small transactions OLTP databases handle. OLAP databases use columnar storage, vectorised execution, and data skipping to deliver sub-second response on workloads that would take minutes on a row-oriented database.

The category was named by Codd in 1993, but the architecture has shifted twice since. The first era was OLAP cubes: pre-aggregated multidimensional structures from Essbase and Microsoft Analysis Services, queried with MDX. The second era is columnar SQL. Engines that compute aggregates on-demand from a flat fact table answer fast enough that pre-aggregation is unnecessary for most workloads. The vocabulary (slice, dice, roll-up, drill-down) survived; the implementation didn't. For the broader category background see the overview of OLAP.

How does an OLAP database work? #

An OLAP database stores data column-by-column, executes queries in vectorised batches, and skips large portions of storage using sparse indexes. These three choices together produce two-to-three orders of magnitude more throughput on scan-heavy queries than a row-oriented engine on the same hardware.

The architecture has four typical components:

ComponentPurposeTypical implementation
Columnar storageRead only the columns a query touches; compress better than row-storesParquet, ORC, ClickHouse MergeTree, Snowflake micro-partitions
Vectorised executionProcess column batches with SIMD instructions instead of row-at-a-time tuplesClickHouse, DuckDB, Velox, Snowflake, BigQuery Dremel
Data skippingAvoid scanning blocks that cannot satisfy the query predicateMin/max sparse indexes, zone maps, bloom filters
Async merges or compactionKeep storage write-efficient while preserving read-side layoutClickHouse background merges, Iceberg compaction, Delta OPTIMIZE

Most cloud-era systems separate compute from storage. Snowflake, BigQuery, and ClickHouse Cloud all keep data on object storage (S3, GCS, Azure Blob) and spin compute clusters up and down independently. That separation is what enables per-second billing and elastic scaling, neither of which was possible in the cube era. For the underlying storage layout, see the columnar database explainer.

What are the main categories of OLAP database? #

Five categories of OLAP database matter as of 2026: cloud data warehouses, real-time OLAP, embedded engines, lakehouse query engines, and legacy specialised systems. Each optimises for a different combination of latency, ingest model, and deployment shape. The distinction is operational, not academic. Picking the wrong category is the most common evaluation error.

CategoryLatency tierIngest modelNamed examples
Data warehouseSingle-digit seconds to minutesBatch to micro-batchSnowflake, BigQuery, Amazon Redshift, Databricks SQL, ClickHouse
Real-time OLAPSub-second to single-digit secondsStreaming to micro-batchClickHouse, Apache Druid, Apache Pinot
Embedded / single-nodeSub-second on local dataDirect file loadDuckDB, MotherDuck, chDB (ClickHouse embedded engine)
Lakehouse query engineSingle-digit seconds to minutesLake format (Iceberg, Delta, Hudi)Trino, Apache Spark SQL, Starburst, Dremio, ClickHouse

Traditional data warehouses (Snowflake, BigQuery, Databricks SQL) are the default for BI dashboards. They optimise for batch ETL and managed operations. Real-time OLAP databases compress the latency tier to sub-second for both ingest and query. The architecture serves user-facing analytics, observability, and operational dashboards where ten-second lag is unacceptable.

Embedded OLAP runs in-process. DuckDB and chDB replaced SQLite for analytical workflows in notebooks, CLI tools, and edge deployments. Lakehouse query engines are not OLAP databases in the strict sense (they are query engines over open table formats), but they occupy the same workload niche for organisations that want storage decoupled from any single vendor's compute.

Most engines compete in one category. ClickHouse appears in the first four. The breadth comes from a single design choice (building for analytical speed at scale) which has extended outward over time: to warehouse workloads where it replaces traditional data warehouses at lower cost, to embedded analytics via chDB, and to lakehouse query patterns with read/write support for open table formats like Iceberg.

How do you pick an OLAP database? #

Pick by ranking five dimensions against your actual workload: latency requirement, ingest model, deployment shape, data volume, and concurrency. Latency is the most discriminating dimension and should be ranked first. The error mode is treating the categories as interchangeable when their architectural tradeoffs differ by orders of magnitude on each dimension.

  1. Latency requirement. Sub-second response on user-facing dashboards rules out traditional data warehouses and lakehouse engines. Single-digit-second response on internal BI is fine for any category. Minute-scale batch reporting widens the field to almost everything.

  2. Ingest model. Streaming or change-data-capture (CDC) ingest at >1,000 events per second points to real-time OLAP. Daily batch loads from object storage point to a traditional data warehouse or lakehouse. Embedded engines load files directly and don't ingest in the production sense.

  3. Deployment shape. Managed-only buyers reach for ClickHouse Cloud, BigQuery, or Snowflake. Self-hosted buyers reach for the open-source engines (ClickHouse, Druid, Pinot). Air-gapped or on-prem-only environments narrow the list further (ClickHouse).

  4. Data volume and cardinality. Under 100 GB on a single node, DuckDB and ClickHouse both run well; both operate efficiently without distributed infrastructure. Tens of TB to PB on distributed storage is real-time OLAP or data warehouse territory, where ClickHouse, Snowflake and BigQuery are the main options.

  5. Concurrency. Tens of concurrent analytical users is solved by everything. Thousands of concurrent low-latency queries (the user-facing analytics case) narrows the field to ClickHouse and Pinot. Traditional cloud data warehouses scale concurrency by spinning more compute, which is operationally easy but expensive per query.

For a worked comparison of the named candidates against measured ClickBench numbers, see the fastest OLAP databases for the speed-ranked view, or the best columnar databases for cost and operability.

OLAP database vs wide-column store vs OLAP cube vs data warehouse #

The four terms are routinely confused. Each names a different thing.

  • OLAP database: any database optimised for analytical queries (the category covered above).
  • Wide-column store: a NoSQL category (Cassandra, HBase, Bigtable, ScyllaDB) that stores data in column families keyed by row. Wide-column stores are designed for high-write key-value workloads, not aggregation queries. The name causes most of the confusion. They are not columnar OLAP databases.
  • OLAP cube: a specific historical implementation of OLAP using multidimensional pre-aggregated structures (Essbase, Microsoft Analysis Services). Cubes are one technology in the OLAP database category; columnar engines have replaced cubes for most workloads. See what an OLAP cube is for the depth.
  • Data warehouse: an organisational pattern (a central analytics repository), not an architecture. Most data warehouses are implemented on cloud OLAP databases (Snowflake, BigQuery, Redshift). Saying "data warehouse" implies workload scope and governance; saying "OLAP database" implies a technical category.

Postgres extensions are a related point of confusion. Citus, Tiger Data, and pg_duckdb each add analytical capability to Postgres, but none turns the underlying engine columnar. For the per-extension verdict and the CDC-to-ClickHouse production pattern, see is Postgres an OLAP database?.

Frequently asked questions

What is the difference between an OLAP database and a data warehouse?

A data warehouse is an organisational concept: a central repository for analytics across an organisation. An OLAP database is a technical category, meaning a database optimised for analytical queries. Most cloud data warehouses are implemented on top of OLAP databases (Snowflake, BigQuery, Redshift), but the terms aren't synonyms. A real-time OLAP database powering a user-facing dashboard is an OLAP database but not a data warehouse. Some engines cover both shapes on one stack: ClickHouse, for example, runs as the central warehouse for BI workloads and as the engine behind user-facing dashboards built on the same data.

Is Postgres an OLAP database?

No. Postgres is row-oriented and optimised for OLTP. Extensions like Citus, Tiger Data, and pg_duckdb add some analytical capability, but the underlying engine is not columnar. The standard production pattern is Postgres for writes plus a dedicated OLAP database for reads, connected via CDC.

For the full architectural verdict on Postgres specifically, see is Postgres an OLAP database?.

What is the best open-source OLAP database?

The open-source category includes ClickHouse, Apache Druid, Apache Pinot, and DuckDB. Choice depends on latency tier and deployment shape. ClickHouse leads on the ClickBench measured benchmark across most query categories.

The full speed-ranked shortlist with per-candidate ClickBench numbers is in the fastest OLAP databases; the ClickBench leaderboard carries the live data.

Do OLAP databases use SQL or MDX?

SQL. MDX (Multi-Dimensional eXpressions) was the cube-era query language used with Essbase and Microsoft Analysis Services; it persists in some Excel and BI tool integrations but is rare in columnar OLAP. Every major OLAP database released since 2015 is SQL-first.

Are real-time OLAP databases also data warehouses?

They can be, but the typical deployment differs. Traditional cloud data warehouses like Snowflake target batch analytics on enterprise data assets. Real-time OLAP databases like ClickHouse target streaming ingest and user-facing query latency. Some organisations use one engine for both, but the categories solve different problems by default.

For deployment patterns running a real-time engine as the primary warehouse, see the overview of real-time analytics.

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...