Skip to content

What is OLAP?

Al Brown
Last updated: May 27, 2026

OLAP (online analytical processing) is a category of database systems and workloads designed to answer multidimensional analytical questions across large volumes of data. OLAP systems are optimised for aggregation, filtering, and grouping queries over millions to billions of rows — the kind of work that powers business intelligence, dashboards, and ad-hoc data exploration in engines like Snowflake, BigQuery, and ClickHouse.

The term was coined by E.F. Codd in his 1993 white paper Providing OLAP to User-Analysts: An IT Mandate, which defined twelve rules for analytical systems and contrasted them with transactional (OLTP) processing. Three decades later the workload definition still holds, but the architecture has changed completely. The pre-aggregated cubes that defined the first two decades have largely given way to columnar engines that compute aggregations on demand.

TL;DR

  • OLAP (online analytical processing) workloads scan and aggregate millions to billions of rows; OLTP workloads read or write small numbers of rows at millisecond latency.
  • The historical taxonomy (MOLAP, ROLAP, HOLAP) describes how aggregations are stored: pre-built in cubes, computed on-demand from relational tables, or both.
  • Columnar OLAP engines run on column-oriented storage and vectorised execution rather than pre-aggregated cubes; query-time aggregation on raw fact tables is fast enough that cubes are largely a legacy pattern.
  • Examples in 2026 split across real-time engines (ClickHouse, Druid, Pinot) for sub-second analytics, traditional cloud warehouses (Snowflake, BigQuery, Redshift) for batch BI reporting, embedded analytics (DuckDB), and specialised systems (kdb+, QuestDB).
  • "OLAP database" and "wide-column store" are different categories. Cassandra, HBase, and Bigtable are wide-column row-stores, not columnar OLAP engines.

What is OLAP?

OLAP (online analytical processing) is a class of database workload defined by analytical queries that scan and aggregate large data volumes to answer business questions. It contrasts with OLTP (online transactional processing), which handles small, high-frequency reads and writes against current operational state.

The category covers both the workload pattern (wide aggregations, filtering, grouping, and joining of fact and dimension tables) and the systems built to serve it. A query like "monthly revenue by product and region for the last three years" is an OLAP query whether it runs on Excel against a 1990s cube or on ClickHouse against a billion-row fact table in 2026.

The term originated with E.F. Codd's 1993 paper for Arbor Software (the makers of Essbase), which laid out twelve rules covering multidimensional views, transparency, accessibility, consistent reporting performance, generic dimensionality, dynamic sparse matrix handling, multi-user support, unrestricted cross-dimensional operations, and unlimited dimensions and aggregation levels. The rules were aimed at the relational vendors of the era, and the contrast they drew (analytical workloads need different storage and execution than transactional ones) still defines the field.

The related vocabulary (drill-down, roll-up, slice, dice, pivot) predates the columnar shift and still defines how analysts talk about OLAP queries in 2026. The OLAP operations breakdown covers each one with SQL examples.

What does OLAP stand for, and what about MOLAP, ROLAP, HOLAP?

OLAP stands for online analytical processing. The three implementation models are MOLAP (multidimensional OLAP, pre-aggregated cubes), ROLAP (relational OLAP, on-demand SQL against relational tables), and HOLAP (hybrid, combining both). MOLAP was the default through the 1990s and 2000s; most OLAP databases shipping in 2026 are ROLAP-shaped with materialised views playing the role cubes used to.

ModelStorageQuery pathStrengthsTrade-offs
MOLAPPre-aggregated cubes (Essbase, Microsoft Analysis Services)MDX → cube lookupSub-second answers on pre-defined dimensionsHours-long builds; storage explodes with dimensions; rigid schema
ROLAPRelational tables (Snowflake, BigQuery, ClickHouse)SQL → on-demand aggregationAd-hoc queries; arbitrary dimensions; flexible schemaHigher per-query latency unless the engine is fast enough
HOLAPMix of bothCube for hot summaries, SQL for detailLegacy compromise inside vendor stacksOperational cost of running two systems

The "online" in OLAP is historical and means interactive rather than batch; the term was coined in 1993 to contrast with overnight reporting jobs, not internet connectivity. The taxonomy still appears in textbooks and certification exams (AWS SAA-C03, for one), which keeps terms like MOLAP, ROLAP, and HOLAP alive even though it has largely become obsolete in modern systems.

How does OLAP differ from OLTP?

OLTP is row-oriented, transactional, latency-sensitive at the millisecond level, and tuned for thousands of concurrent small writes. OLAP is columnar, analytical, latency-sensitive at the sub-second to second level, and tuned for wide aggregations over much larger volumes. The two workloads have opposite shapes, and the database designs that win one lose the other.

DimensionOLTPOLAP
StorageRowColumnar
Read patternFew columns, few rowsMany columns, many rows
Write patternSingle-row mutations, high QPSBulk inserts, append-mostly
LatencySub-50msSub-second to seconds
ConcurrencyThousands of users, simple queriesTens to hundreds, complex queries
SchemaNormalised 3NFStar / wide denormalised
ExamplesPostgres, MySQL, OracleClickHouse, Snowflake, BigQuery

Read OLTP vs OLAP for the full comparison of the two. OLTP and OLAP are complimentary technologies, many are unifying OLTP and OLAP under one architecture.

How OLAP shifted from cubes to columnar

Columnar OLAP engines run on column-oriented storage and vectorised query execution rather than pre-aggregated cubes. A columnar engine reads only the columns referenced by a query, processes data in SIMD-friendly batches, and skips ranges of rows using sparse indexes, which makes ad-hoc aggregation over a raw fact table competitive with cube lookups, without the cube build step.

The shift was incremental. Sybase IQ shipped a columnar engine in 1994. Vertica, built by C-Store's authors, launched commercially in 2007. Google released the Dremel paper underpinning BigQuery in 2010. ClickHouse was open-sourced in 2016. By the late 2010s the latencies that previously required cube pre-aggregation were achievable on the raw data, and the cube layer became friction more often than benefit.

What columnar OLAP engines added beyond the storage layout:

  • Vectorised execution processes column batches in CPU-cache-friendly chunks instead of row-by-row.
  • Data skipping uses sparse primary indexes and min/max statistics to skip entire granules without reading them.
  • Async background merges roll append-mostly writes into larger sorted parts, keeping inserts cheap.
  • Compute and storage are separated in Snowflake, BigQuery, and ClickHouse Cloud, so cluster size scales with workload rather than data volume.
  • Materialised views handle cube-like pre-aggregation when it's wanted, computed asynchronously and queried like any other table.

OLAP cubes are not entirely gone. Essbase, Microsoft Analysis Services, and Mondrian still run inside large enterprises, and MDX persists in Excel PowerPivot. But for greenfield analytical workloads in 2026 the answer is almost always a columnar database. The OLAP cube explainer covers the history, the mechanics, and the question of how dead the cube really is.

What are some examples of OLAP databases?

OLAP databases in 2026 fall into two broad groups. Real-time OLAP engines (ClickHouse, Apache Druid, Apache Pinot) deliver sub-second analytical latency on streaming and batch data — the workload behind user-facing dashboards, observability, and operational analytics. Traditional cloud data warehouses (Snowflake, Google BigQuery, Amazon Redshift, Databricks SQL) target batch BI reporting where minutes-to-seconds latency is acceptable. Embedded engines (DuckDB), specialised systems (kdb+, QuestDB, Vertica), and lakehouse query engines (Trino on Apache Iceberg) cover the remainder. For the full category breakdown including selection criteria, see what is an OLAP database?.

The latency gap between the two groups is large enough to define which workloads each can serve. ClickHouse completes the ClickBench 42-query analytical workload in single-digit seconds on a single node, where traditional cloud DWHs take tens of seconds to minutes on equivalent hardware. The fastest OLAP databases ranks every major candidate against the public benchmark with per-engine p50, p99, and cold-start numbers.

Wide-column stores (Apache Cassandra, Apache HBase, Google Bigtable) are not columnar OLAP databases. They are row-oriented at the storage level, with rows grouped by partition key and columns stored as key-value pairs within a row. They serve OLTP-shaped workloads with flexible schemas, not analytical aggregation.

When should you use OLAP?

Use an OLAP database when read queries are largely aggregates and GROUP BYs over large ranges of rows. Use OLTP when reads are point lookups and writes are high-QPS single-row mutations against current state.

Concrete signals that an OLAP system fits:

  • Dashboards or BI reports scan millions of rows and group by dimensions.
  • Ad-hoc analytics needs to answer arbitrary questions across the warehouse, not a fixed set defined in advance.
  • Event or time-series data arrives in bulk (Kafka, CDC, batch loads) and is queried by aggregation, not by primary-key lookup.
  • Per-query data volume exceeds what an OLTP database's row storage and B-tree indexes can scan in time.

The standard production pattern in 2026 is a split architecture. Use an OLTP database (typically Postgres) to handles writes, change data capture replicates to an OLAP database for reads, and BI tools query the OLAP side. For real-time variants of the same pattern, see real-time analytics.

Further reading

If you want to understandSee
Drill-down, roll-up, slice, dice, pivot, with SQL examplesOLAP operations
The history and mechanics of OLAP cubes, and whether they're deadWhat is an OLAP cube?
How OLTP and OLAP differ in detailOLTP vs OLAP
Architecture of a columnar OLAP database and how to pick oneWhat is an OLAP database?
Specific OLAP databases ranked by query speed on a public benchmarkThe fastest OLAP databases (ranked by ClickBench)
Whether Postgres counts as an OLAP database, and the Postgres → ClickHouse patternIs Postgres an OLAP database?
Running OLTP and OLAP in a single system (HTAP)Unifying OLTP and OLAP
The role of columnar storage in OLAP performanceWhat is a columnar database?
Real-time variants of OLAPWhat is real-time analytics?

Frequently asked questions

01

OLAP stands for online analytical processing. The term was coined by E.F. Codd in 1993 to distinguish analytical database workloads (multidimensional aggregations, drill-down reporting, ad-hoc exploration) from the transactional workloads (OLTP) the era's databases were designed for. The acronym names a workload class, not a specific product.

02

An OLAP cube is a multidimensional structure that pre-aggregates measures (revenue, units, count) across hierarchical dimensions (time, geography, product) so analytical queries can be answered by lookup rather than computation. Cubes were the standard OLAP implementation from the 1990s through the early 2010s; columnar engines now achieve similar latency by computing aggregations on demand without the build step.

03

No. OLAP is a category of analytical processing. A data warehouse is an infrastructure pattern. Data warehouses are built to serve OLAP workloads, but OLAP also runs on real-time databases, embedded engines, and semantic layers. The terms are not interchangeable.

04

Postgres is an OLTP database. Its row-oriented storage and B-tree indexes are optimised for point lookups and small writes, not the wide aggregations OLAP workloads run. Extensions like Citus, Timescale and pg_duckdb add limited analytical capability, but at production scale the standard pattern is Postgres for writes plus a dedicated OLAP database for reads, connected via change data capture.

05

OLAP databases in production today include ClickHouse, Apache Druid, and Apache Pinot for real-time OLAP; traditional cloud warehouses (Snowflake, Google BigQuery, Amazon Redshift, Databricks SQL) for batch BI reporting; DuckDB for embedded analytics; and Vertica and kdb+ in legacy and specialised deployments. A speed-ranked shortlist with per-candidate ClickBench numbers is in the fastest OLAP databases.

06

Yes. The term "OLAP" is less fashionable than it was twenty years ago, but the underlying workload pattern — complex analytical queries on large datasets — is more prevalent than ever. Modern columnar databases, cloud data warehouses, and real-time analytics platforms all serve OLAP workloads under different names.

07

Modern ones can. Traditional OLAP systems required batch loads every few hours or days. Real-time OLAP databases like ClickHouse ingest data continuously and make it queryable within seconds. This shift in architecture is why real-time analytics is now viable for applications that previously relied on batch reports.

08

A star schema organises data into a central fact table surrounded by dimension tables. It is a common logical model for OLAP workloads because it maps naturally to the dimensions-and-measures structure of analytical queries. Modern columnar databases don't strictly require star schemas but benefit from the clarity they provide.

09

Most modern OLAP databases support standard SQL. ClickHouse, Snowflake, BigQuery, Redshift, and DuckDB all use SQL as their primary query interface. This is a significant shift from older OLAP systems that used proprietary MDX (Multidimensional Expressions) or custom query languages.

10

A data lake stores raw, unstructured data (files, logs, images) cheaply at scale. An OLAP database stores structured data optimised for fast analytical queries. They often work together: raw data lands in a data lake, gets transformed, and is loaded into an OLAP system for querying. Open table formats like Apache Iceberg are blurring this boundary by enabling analytical query engines to read data lake files directly.

11

It depends on the deployment model. Open-source self-hosted options like ClickHouse have zero licence cost but require operational investment. Cloud managed services (ClickHouse Cloud, Snowflake, BigQuery) charge based on compute and storage usage. Total cost of ownership varies enormously depending on data volume, query patterns, and operational capabilities. For a detailed breakdown, see how the 5 major cloud data warehouses really bill you and how they compare on cost-performance.

12

Yes. ClickHouse runs as a single binary and is straightforward to install on any machine. For lighter usage, clickhouse-local runs ClickHouse queries without starting a server process — just point it at a file and query. chDB embeds the ClickHouse engine inside Python for Pandas-like analytical workflows. clickhousectl is a new CLI that makes local development even simpler. DuckDB also targets the embedded single-machine use case. OLAP no longer requires enterprise infrastructure to get started.

13

OLAP is the processing layer. BI is the presentation layer. BI tools (Tableau, Looker, Power BI, Grafana) visualise data and build dashboards. They connect to OLAP databases to run the underlying queries. OLAP provides the speed and structure. BI provides the interface.


Share this resource

  • Y Combinator icon
  • X icon
  • Bluesky icon
  • Facebook icon
  • LinkedIn icon

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!

More like this

ClickHouse concurrency: how to size for user-facing analytics

Manveer Chawla • Last updated: Jul 2, 2026

How to size ClickHouse for high-concurrency, user-facing analytics: turn active users into query load, benchmark under production-like conditions, and configure per-query limits, admission controls, workload scheduling, and replicas.

Continue reading ->

When to denormalize, when to join: A ClickHouse guide

Manveer Chawla • Last updated: Jun 26, 2026

Denormalization and normalization are both valid analytical data-modeling strategies. A decision framework for choosing where to denormalize, where to join, and which ClickHouse primitives bridge the gap.

Continue reading ->

Scale vector search in Postgres with pgvector: avoid HNSW RAM limits, fix filtering recall, and know when to go hybrid. Read now.

Continue reading ->