Postgres is an OLTP database, not an OLAP database. Its row-oriented storage and B-tree indexes are optimised for point lookups and small writes, not the wide aggregations OLAP workloads run. Postgres can handle modest analytics with certain extensions, but at production scale the standard pattern is Postgres for transactional writes plus a dedicated OLAP database for analytical reads, connected by change data capture. The 2024 Stack Overflow Developer Survey placed Postgres as the most-used database for the second year running, which is part of why the question keeps surfacing: teams reach for Postgres first, then ask whether they can stretch it.
TL;DR #
- Postgres is OLTP. Row-oriented
heapstorage, B-tree indexes, and MVCC are tuned for high-concurrency single-row reads and writes, not analytical scans. - The Postgres-for-analytics extensions (e.g., Citus, Timescale, pg_duckdb) widen the workload range without changing the underlying row-store engine. Each helps in a narrow envelope and breaks outside it.
- At terabyte scale, the production pattern is Postgres for writes plus a columnar OLAP engine for reads, replicated by change data capture.
- ClickHouse's managed Postgres CDC (ClickPipes for Postgres) keeps a near-real-time replica of a Postgres source inside ClickHouse Cloud so analytical queries don't run against (or block) the source database.
- Signs it's time to migrate analytical reads off Postgres include dashboard queries crossing 5 seconds, aggregations affecting OLTP tail latency, autovacuum pressure from large append-only tables, and analysts blocking app writes.
What Postgres is built for #
Postgres is built for transactional workloads: point reads, single-row writes, ACID transactions at high concurrency. Its heap storage format keeps rows together on disk, B-tree indexes target selective lookups, and MVCC isolates concurrent transactions without blocking readers. The combination delivers millisecond tail latency on OLTP queries.
The same design choices that make Postgres strong at OLTP make it weak at OLAP. An analytical query that aggregates one column across a billion rows on Postgres reads the full row payload for every row, because columns are interleaved on disk. A columnar engine reads only the column file the query touches and skips entire ranges via sparse indexes. On the public ClickBench benchmark, columnar engines complete the 42-query analytical workload in single-digit seconds on a single node where Postgres takes minutes to hours on the same hardware.
The broader OLTP-vs-OLAP comparison is in the dedicated OLTP vs OLAP write-up, which covers the storage layout, indexing, and concurrency tradeoffs in depth.
The Postgres-for-analytics extensions, honestly assessed #
The Postgres-for-analytics extensions genuinely widen the analytical envelope but don't change the underlying row-store engine. Each extension helps in a narrow workload niche and breaks outside it. Teams eventually hit the architectural ceiling in production.
The honest assessment matters because the marketing material rarely names the limits, and the limits are the part that decides whether an extension fits.
| Extension | What it does | Where it helps | Where it breaks |
|---|---|---|---|
| Citus (Microsoft) | Distributes Postgres across nodes by shard key; offers columnar table option | Multi-node OLTP scale-out; modest sharded analytics | Cross-shard joins on row-store columns; not a general-purpose OLAP engine |
| Timescale | Hypertables and continuous aggregates for time-series | Time-bucketed analytics over append-only sensor or telemetry data | General OLAP queries that aren't time-series-shaped; high-cardinality grouping |
| pg_duckdb | Embeds DuckDB inside Postgres for in-process columnar queries | Ad-hoc analytical SQL against Postgres tables on a single node | Multi-node analytics; concurrency at production scale; the analytical workload still runs inside the Postgres process competing with OLTP |
The architectural ceiling is the same in every case: the extension runs in the same Postgres process as the OLTP workload, on the same node hardware, with the same row-oriented storage as the foundation. That works at small to moderate scale. It doesn't scale to a workload where analytical reads need to run against multi-terabyte data without affecting OLTP tail latency.
When Postgres is enough, and when you've outgrown it #
Postgres handles simple analytical workloads up to roughly 100 GB of data, with latency tolerance in seconds, and infrequent query rates without trouble. Beyond that envelope, the row-store ceiling produces observable signals (dashboard latency, OLTP contention, autovacuum pressure) that mean it's time to move analytical reads off Postgres.
Stay on Postgres when:
- Total analytical data volume is under ~100 GB on a single node.
- Analytical queries run infrequently (handful per day, not per minute) and run for internal-only reporting.
- Latency tolerance is in seconds and there is no real-time dashboard requirement.
- Organisational simplicity (one database, one team, one operational surface) outweighs the performance ceiling.
Migrate analytical reads off Postgres when:
- Dashboard queries cross 5 seconds and the workload is sensitive to it.
- Analytical aggregations measurably affect OLTP tail latency or transaction throughput.
- Autovacuum pressure from large append-mostly tables (events, logs, telemetry) is degrading writes.
- Analyst queries block application writes through lock contention or shared-buffer eviction.
- Multiple BI tools or downstream consumers query the same Postgres instance and the contention is operational pain.
Each signal corresponds to an architectural limit, not a tuning problem. Adding indexes, vacuum tuning, or read replicas defers but doesn't resolve the underlying mismatch between row-oriented storage and analytical scan patterns.
How a Postgres → ClickHouse setup actually works #
A Postgres → ClickHouse setup keeps Postgres for transactional writes and replicates its data into ClickHouse for analytical reads, with change data capture closing the gap in seconds. Dashboard queries hit ClickHouse, not Postgres, which removes the contention that pushed the workload off Postgres in the first place.
A typical Postgres → ClickHouse setup looks like this:
- Source. A Postgres database serving the application's transactional writes. Logical replication is enabled on the relevant tables.
- CDC pipeline. A change data capture connector reads Postgres's write-ahead log and streams row-level changes to the destination. ClickPipes for Postgres is ClickHouse Cloud's managed connector; it handles initial snapshot, schema evolution, and incremental replication with sub-10-second latency end-to-end. Self-managed alternatives include Debezium and ClickHouse's open-source PeerDB integration.
- Destination. A ClickHouse cluster stores the replicated data in columnar
MergeTreetables. Analytical queries from BI tools, dashboards, and ad-hoc SQL hit ClickHouse, not Postgres.
The analytical workload runs at columnar latency without touching the Postgres source. Dashboard queries that took 30 seconds on Postgres return in under a second on ClickHouse on the same data, because the engine reads only the columns the query touches and aggregates them in vectorised batches.