SaaS applications rely on Postgres as the source of truth for user, product, and transactional data. But exposing that data back to users through real-time, in-product dashboards creates a problem: analytical queries frequently grind your transactional database to a halt.
Customer-facing dashboards aren't like internal BI reports. They live inside your product, where every active user means another concurrent query. Your latency budget here is tight. Web performance research shows that users perceive responses under 100 milliseconds as instantaneous. Anything slower starts to erode the experience; responses measured in seconds feel broken.
This guide walks through a two-stage approach for scaling Postgres-backed analytics. We'll cover how to push native Postgres to its limits using materialized views and advanced indexing, then define the exact triggers that tell you when to move on. From there, we'll detail how to implement a real-time OLAP architecture that eliminates the operational headache of DIY data pipelines. The goal: sub-100 ms dashboards without compromising your primary database.
TL;DR #
- Customer-facing analytics is a different workload from BI: it needs sub-100 ms p95 on analytical queries, high concurrency, freshness within the SLA tier you commit to customers, strict tenant isolation, and predictable cost.
- Stage 1 (Postgres-only): use pre-aggregated tables/materialized views, composite indexes starting with `tenant_id`, careful RLS, and BRIN for time-series.
- Graduate when 2+ breaking points hit: buffer cache churn, million-row scans per request, high-cardinality index bloat, tighter freshness SLAs, or ad-hoc filtering needs.
- Stage 2 (Postgres + CDC + real-time OLAP): stream changes via CDC into a columnar OLAP database (e.g., ClickHouse) and serve dashboards from OLAP, not replicas or warehouses.
- Avoid DIY CDC outages: replication slots/WAL retention can fill disks; prefer managed CDC connectors.
What customer-facing analytics requires (eight non-negotiables) #
Most database performance escalations happen because teams treat all data retrieval as one big "analytics" workload. When engineers complain that Postgres can't handle their analytics anymore, they're usually lumping together four completely different workloads.
OLTP handles high-throughput, row-level reads and writes powering your core application. Internal BI runs heavy aggregations for executives and analysts, where queries can safely take minutes. Ad-hoc data science is unpredictable, exploratory work running in notebooks. Customer-facing analytics is the in-product dashboards and reporting exposed directly to your end users.
Customer-facing analytics is by far the strictest of these four. It demands sub-100 ms latency like a hot OLTP query path combined with complex aggregations like BI. It requires strict tenant isolation to prevent data leaks. It must support concurrency that scales linearly with your active user base.
Because of this profile, any solution needs to meet eight non-negotiable requirements. Use these as your benchmark for deciding when to optimize Postgres and when to move to a dedicated OLAP database.
1. Sub-100 ms p95 query latency #
For an API response that feels instant, your data store needs to return results in under 100 milliseconds at p95 on analytical queries. If your database takes 500 milliseconds to aggregate data, network overhead and front-end rendering will push total latency well into sluggish territory.
2. Linear concurrency scaling #
In an internal BI tool, fifty concurrent queries is heavy load. In a customer-facing dashboard, fifty concurrent queries is a quiet Tuesday morning. Your data layer needs to handle hundreds or thousands of simultaneous requests without exhausting connection pools or queueing queries behind each other.
3. Tunable freshness per dashboard #
Different features need different freshness guarantees. A monthly billing rollup tolerates a 30-minute delay. A live marketplace inventory dashboard or security audit log needs seconds-level freshness. Your architecture needs to support streaming ingestion for those strict SLAs.
Pick the freshness tier intentionally per dashboard, not as a single global setting. Each tier maps to a different architecture:
| Freshness tier | Recommended architecture |
|---|---|
| 5 to 30 minutes | Scheduled rollup jobs against Postgres summary tables; standard cron or workers |
| 1 to 5 minutes | Incremental rollup workers; hot summary tables refreshed per write batch |
| 1 to 10 seconds | CDC stream from Postgres into a real-time OLAP database like ClickHouse |
Most B2B SaaS customer-facing dashboards live in the 1 to 10 second tier. Tighter than that is rare and usually only matters for fraud signals or trading-style tooling.
The looser tiers are not a free pass. As data volume grows, the rollup jobs themselves become expensive: minute-to-hour-long aggregations that consume primary database resources and contend with OLTP. The granularity of rollups you can afford shrinks as the underlying data grows. Polymarket hit this even though their customer-facing dashboards did not need second-level freshness.
4. Strict tenant isolation #
B2B SaaS applications are inherently multi-tenant. Every analytical query must be strictly bounded by a tenant identifier. Whether you implement this through API-layer authorization or database-level Row Level Security, cross-tenant data leakage is catastrophic. Your architecture must prevent it by design.
5. Pre-aggregations and materialized views #
You can't compute heavy metrics from raw event data on every page load. Your system needs to pre-aggregate data into summary tables or materialized views, absorbing compute cost at ingestion time rather than query time.
6. High-cardinality column support #
SaaS analytics frequently require filtering and grouping by high-cardinality dimensions: user IDs, API keys, session tokens, unique SKU numbers. Your database needs to index and query these columns efficiently without suffering from massive index bloat or degraded scan performance.
7. Time-series-friendly storage #
Dashboard queries are almost exclusively time-windowed: "Show me API requests over the last 7 days grouped by hour." Your storage engine needs to be optimized for time-series access patterns, physically ordering data by time so the query engine can skip irrelevant date ranges.
8. Predictable cost scaling #
As your user base grows, the cost of serving analytical queries needs to scale predictably. Architectures that couple storage and compute too tightly, or price based on scanned data volume per query, will produce explosive cloud bills as product adoption increases.
A common failure mode is running raw analytical queries directly against your OLTP tables. Complex aggregations over the live transactional schema scan large volumes of data and evict hot operational data from Postgres shared buffers. As cache hit ratio drops, your transactional application slows down, lock contention spikes, and dashboards become unusable.
The pattern that does work for early-stage Postgres analytics is pre-aggregation: absorb the compute cost at ingestion time into summary tables, and serve dashboards from those rather than the OLTP schema. That is exactly what Stage 1 is.
Stage 1: run customer-facing analytics on Postgres only #
Most B2B SaaS startups should start their analytics journey entirely within Postgres. The operational benefits of a small surface area (a single database, backup strategy, and security boundary) are worth more than the performance ceiling at this stage.
The sweet spot for Stage 1: an early-stage product, moderate data volumes, and dashboards that can tolerate one to five minutes of data staleness (the incremental-rollup tier in the freshness matrix above).
How Stage 1 works #
Your application writes directly to normalized OLTP tables. Rather than querying these tables for dashboards, you implement scheduled or incremental background jobs to build pre-aggregated analytics tables. A cron job might aggregate raw user events into an org_daily_metrics table. Your customer-facing API then reads strictly from these pre-aggregated tables, applying tenant filters for data isolation.
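As a concrete sketch, assume a raw `events` table with `org_id`, `user_id`, and `created_at` columns and a unique constraint on `org_daily_metrics (org_id, day)` (both names are illustrative). The rollup job then reduces to one idempotent upsert:

```sql
-- Recompute today's rollup each run; the upsert makes the job idempotent.
INSERT INTO org_daily_metrics (org_id, day, active_users, event_count)
SELECT org_id,
       created_at::date        AS day,
       count(DISTINCT user_id) AS active_users,
       count(*)                AS event_count
FROM events
WHERE created_at >= date_trunc('day', now())
GROUP BY org_id, day
ON CONFLICT (org_id, day) DO UPDATE
SET active_users = EXCLUDED.active_users,
    event_count  = EXCLUDED.event_count;
```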
Postgres tactics that extend Stage 1 #
To maximize Postgres for this workload, you need several specific database tactics.
Composite indexes starting with the tenant identifier, like (org_id, timestamp), are mandatory. They ensure queries only scan data belonging to the requesting customer.
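On the hypothetical `events` table from the sketch above, that looks like:

```sql
-- Lead with the tenant column: the equality filter on org_id pins the scan
-- to one contiguous slice of the index, and the timestamp range walks it.
CREATE INDEX idx_events_org_time ON events (org_id, created_at);

-- Served almost entirely from the index above:
SELECT count(*)
FROM events
WHERE org_id = 42
  AND created_at >= now() - interval '7 days';
```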
If you implement Row Level Security (RLS) for tenant isolation, design your policies carefully. Naive RLS policies that join to permissions tables add tens to hundreds of milliseconds per query. Optimized RLS strategies use pre-calculated session variables to evaluate permissions, reducing overhead to under a millisecond.
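A minimal sketch of the session-variable pattern, with illustrative table and variable names; the API resolves the tenant from the auth token, then sets it once per transaction:

```sql
ALTER TABLE org_daily_metrics ENABLE ROW LEVEL SECURITY;

-- Compare against a pre-resolved session variable instead of joining to a
-- permissions table on every row. The two-argument current_setting returns
-- NULL when the variable is unset, so unauthenticated sessions see no rows.
CREATE POLICY tenant_read ON org_daily_metrics
    FOR SELECT
    USING (org_id = current_setting('app.current_org_id', true)::bigint);

-- Per request, in the API layer:
-- BEGIN;
-- SET LOCAL app.current_org_id = '42';
-- SELECT ... FROM org_daily_metrics ...;
-- COMMIT;
```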
Materialized views offer a native way to pre-compute complex aggregations. But refreshing a standard materialized view acquires an exclusive lock, blocking all reads. To avoid this, use REFRESH MATERIALIZED VIEW CONCURRENTLY. This allows concurrent reads, but requires the view to have at least one unique index.
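A sketch with hypothetical names:

```sql
CREATE MATERIALIZED VIEW org_weekly_metrics AS
SELECT org_id,
       date_trunc('week', created_at)::date AS week,
       count(*) AS event_count
FROM events
GROUP BY 1, 2;

-- CONCURRENTLY refuses to run without at least one unique index on the view.
CREATE UNIQUE INDEX ON org_weekly_metrics (org_id, week);

-- Rebuilds in the background; readers see the old contents until the swap.
REFRESH MATERIALIZED VIEW CONCURRENTLY org_weekly_metrics;
```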
For large time-series event tables, traditional B-tree indexes become massive and inefficient. Block Range Indexes (BRIN) on timestamp columns can reduce index sizes by orders of magnitude. A BRIN index stores minimum and maximum values for physical block ranges, letting the query planner skip irrelevant blocks entirely during time-windowed queries.
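Creating one is a single statement:

```sql
-- BRIN keeps only min/max per block range, so the index stays kilobytes to
-- megabytes where a B-tree would be gigabytes. It works because append-only
-- event tables are already physically ordered by created_at.
CREATE INDEX idx_events_created_brin ON events USING brin (created_at);
```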
When Stage 1 breaks #
Even with these optimizations, Stage 1 has a ceiling. Postgres is row-oriented, built for transactional integrity rather than analytical scans.
Define explicit triggers that tell you when to graduate from Stage 1 before your application falls over.
Technical breaking points:
- Buffer cache churn: Dashboard queries scan so much data they flush hot OLTP data from memory, causing production writes to slow down.
- Row scan volume: Your aggregations require scanning millions of rows per request, making it impossible to stay within 100 ms latency even with heavy indexing.
- High cardinality failure: Analytics require filtering on highly cardinal dimensions like API keys or individual SKUs, leading to massive index bloat and degraded B-tree performance.
Business breaking points:
- Freshness demands: Customers start demanding sub-30-second data freshness at high volumes, making incremental rollups and concurrent materialized view refreshes too slow.
- Ad-hoc filtering: Users need to dynamically filter dashboards by custom attributes, breaking the utility of pre-calculated summary tables.
The rule is simple: when any two of these breaking points become true, you've outgrown native Postgres. At that point, evaluate offloading in Stage 2.
Stage 2: offload analytics with CDC to a real-time OLAP database #
When Postgres can no longer serve both high-throughput transactions and complex analytical reads, you need to physically separate the workloads.
How CDC-based offloading works #
Your application continues writing to Postgres for OLTP operations. Change Data Capture (CDC) streams committed row-level changes into a dedicated, real-time OLAP database. Your customer-facing dashboards then read from the OLAP database, enforcing tenant_id filters at the API layer.
This separation lets Postgres dedicate its memory and CPU entirely to serving the application, while the OLAP database handles analytical aggregations. The choice of OLAP database matters: not every analytical database is built for in-product latency.
Why read replicas and cloud warehouses are the wrong fit for in-product dashboards #
| Requirement | Postgres read replica | Cloud data warehouse (Snowflake/BigQuery) | Real-time OLAP (ClickHouse) |
|---|---|---|---|
| Sub-100 ms p95 latency | Limited (row-store scans cap throughput) | Designed for batch BI (seconds to minutes) | Yes (columnar store, sparse indexes) |
| Horizontal concurrency | Limited (connection pool ceilings) | Costly under high concurrency (queueing, scaling spend) | Yes (scales linearly with nodes) |
| Freshness SLA | Seconds (physical replication) | Minutes to hours (batch-oriented) | Seconds (CDC streaming) |
| High-cardinality support | Limited (index bloat on row-store) | Yes (brute-force scan) | Yes (designed for high-cardinality columns) |
| Predictable cost scaling | Unpredictable for analytics (Postgres is built for OLTP; analytical scans do not scale linearly) | Unpredictable for customer-facing concurrency | Predictable (engine built for analytics; scales linearly with workload) |
A fourth option, headless analytics SaaS, sits on top of one of these stores rather than next to them. It is covered separately below.
Postgres read replicas share the row-oriented limitations of the primary. They don't compress data effectively for analytics, and they still require heavy B-tree indexes that bloat memory. They make sense for offloading read-only traffic from the primary, not for high-concurrency analytical reads.
Cloud data warehouses like Snowflake and BigQuery are traditional, batch-oriented engines built for internal BI and complex long-running queries. They are excellent at that workload.
They were not designed for sub-100 ms in-product latency. Routing user-facing dashboards through a cloud data warehouse routinely produces queries taking several seconds, blowing past the 100 ms budget.
Concurrency is also expensive. Cloud data warehouses are priced for batch jobs rather than customer-facing traffic: consumption-based compute, minimum-billed durations on resume, and horizontal scaling under concurrency combine to make bills hard to predict.
A real-time OLAP database like ClickHouse or Apache Pinot is built for this profile. These systems are columnar, reading only the specific columns requested. They use sparse indexes to skip vast amounts of data without B-tree memory overhead, and they compress data aggressively by grouping similar values together in column-oriented storage, which both reduces storage cost and lets queries scan less data per request.
Where headless analytics SaaS fits #
Headless analytics platforms like Cube, Hex embedded, Preset embedded, and GoodData sit on top of an OLAP database, not in place of one. They provide a semantic layer, an API surface, query caching, and an embed-friendly UI toolkit. That accelerates the front-end build, but it does not change the latency, concurrency, or freshness profile of the underlying store. Pair a headless analytics layer with a real-time OLAP database when you want a faster ship path on the UI side; do not treat it as a substitute for the OLAP database itself.
The operational risks of DIY CDC (replication slots and WAL growth) #
Moving to this architecture traditionally introduces a severe ETL Tax. Operating logical replication pipelines with tools like Debezium and Kafka turns a small engineering group into a distributed systems maintenance team.
The primary risk centers on Postgres replication slots.
By default, Postgres uses max_slot_wal_keep_size = -1, which lets replication slots retain unlimited Write-Ahead Log (WAL) files if a consumer falls behind or goes offline.
If your self-hosted CDC consumer crashes silently on Friday evening, Postgres will dutifully retain every transaction log generated over the weekend. By Monday morning, hundreds of gigabytes of WAL files will have filled your primary disk, triggering emergency read-only mode and taking your entire SaaS application offline.
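If you operate your own slots anyway, two safeguards are worth wiring in from day one: alert on retained WAL per slot, and cap retention so a dead consumer invalidates its slot instead of filling the disk. A sketch:

```sql
-- How much WAL each replication slot is pinning on the primary.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
           AS retained_wal
FROM pg_replication_slots;

-- Postgres 13+: cap retained WAL. A slot that exceeds the cap is
-- invalidated and its consumer must be re-seeded, but the primary survives.
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();
```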
Add schema drift and replication slot lag management, and the operational cost of DIY CDC quickly eclipses the value of the analytics dashboard.
CDC operations are only one line item in the ETL Tax: pipeline engineering hours, warehouse compute, data staleness, schema drift incidents, and vendor sprawl across DMS, Kinesis, or Fivetran plus the warehouse plus monitoring. Hidden costs routinely run 2x to 4x the license fees. The right answer is not a better DIY pipeline; it is removing the pipeline from your ops surface entirely.
How managed CDC connectors reduce risk #
The solution is a managed, integrated CDC connector. ClickHouse Managed Postgres is vanilla Postgres with open-source ClickHouse integration baked in. ClickPipes, powered by PeerDB, is a single managed service that streams committed Postgres changes directly to ClickHouse, replacing what teams would otherwise build with Debezium, Kafka, and Kafka Connect.
This turns a complex, high-risk data engineering problem into a simple configuration step. The platform handles WAL management, replication slot monitoring, and schema synchronization automatically, connecting OLTP source to OLAP destination without intermediate infrastructure.
A unified query layer with pg_clickhouse #
CDC moves the data. The second half of the integration story is how you query it. The pg_clickhouse extension is an Apache 2.0 Postgres foreign data wrapper that exposes ClickHouse tables as foreign tables inside Postgres, with full pushdown of filters, GROUP BY, ORDER BY, HAVING, and SEMI JOINs to ClickHouse's columnar engine.
`pg_clickhouse` benefits every query path, customer-facing dashboards included. The API-layer pattern stays exactly as before: bounded endpoints, tenant_id enforced server-side, queries assembled by your service. What changes is the engine underneath. Queries against ClickHouse foreign tables get pushed down to the columnar engine automatically, so existing API code, ORMs, internal dashboards, and cron jobs all get columnar speedups without any rewrites or new clients.
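Setup follows the standard Postgres foreign data wrapper flow. The sketch below assumes conventional FDW DDL; the wrapper name, server options, and column list are illustrative, so check the `pg_clickhouse` docs for exact spellings:

```sql
CREATE EXTENSION pg_clickhouse;

-- Server and option names are illustrative.
CREATE SERVER clickhouse_svr
    FOREIGN DATA WRAPPER clickhouse_fdw
    OPTIONS (host 'clickhouse.internal', port '8123');

CREATE USER MAPPING FOR app_user
    SERVER clickhouse_svr
    OPTIONS (user 'dashboard', password 'secret');

-- A ClickHouse table exposed inside Postgres. Filters, GROUP BY, ORDER BY,
-- HAVING, and SEMI JOINs against it are pushed down to the columnar engine.
CREATE FOREIGN TABLE events_wide (
    org_id     bigint,
    user_id    bigint,
    event_type text,
    created_at timestamptz
) SERVER clickhouse_svr;
```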
How to model CDC data in ClickHouse #
Once data lands in ClickHouse, model it for the access pattern rather than copying the normalized Postgres schema unchanged.
Denormalization into wide event tables is a strong default for the hottest customer-facing dashboards because it gives you the lowest possible query latency on a known access pattern.
JOINs are also a first-class capability in ClickHouse, with all standard SQL join types (INNER, LEFT, RIGHT, FULL, CROSS, plus SEMI, ANTI, and ASOF), multiple join algorithms (hash, parallel hash, grace hash, full sorting merge, partial merge, direct), automatic optimization with statistics, and continued performance gains in 2025 and 2026. Use JOINs where they keep your schema simpler and your data easier to manage; reach for denormalization where you need the absolute floor on query latency.
For high-volume CDC and upsert flows, ClickHouse provides the ReplacingMergeTree table engine, which identifies rows with the same primary key and keeps the latest version. Background merges deduplicate asynchronously, and the FINAL keyword in SELECT guarantees deduplicated results at query time. FINAL has been significantly optimized for production use, so reach for it whenever a dashboard query needs immediate consistency.
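A sketch of the pattern with hypothetical table and column names:

```sql
-- CDC upserts land as new row versions; background merges keep the row
-- with the highest updated_at per (tenant_id, order_id).
CREATE TABLE orders_cdc
(
    order_id   UInt64,
    tenant_id  UInt64,
    status     LowCardinality(String),
    amount     Decimal(18, 2),
    updated_at DateTime64(3)
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY (tenant_id, order_id);

-- FINAL deduplicates at read time for queries that cannot wait for merges.
SELECT status, count()
FROM orders_cdc FINAL
WHERE tenant_id = 42
GROUP BY status;
```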
Outside CDC, ClickHouse also supports standard SQL UPDATE and DELETE for individual or scattered row changes, and ALTER TABLE operations or partition drops for bulk data management. Pick the mechanism that matches the shape of the change rather than treating any one engine as the only option.
Where the source data is naturally immutable (event logs, request logs, telemetry), favor an append-only model. It keeps the schema simple and lets the merge process do less work.
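An illustrative append-only log table, partitioned by month and ordered by tenant and time to match dashboard access patterns:

```sql
CREATE TABLE request_logs
(
    tenant_id   UInt64,
    api_key_id  UInt64,
    status_code UInt16,
    latency_ms  UInt32,
    ts          DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (tenant_id, ts);
```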
Customer-facing analytics patterns (three real SaaS examples) #
Here's how this maturity journey maps to real-world SaaS environments. Three distinct business models, each handling the transition from Stage 1 Postgres constraints to Stage 2 real-time OLAP.
This path is well-trodden. Polymarket scaled customer-facing prediction-market data by pairing Postgres with ClickHouse.
Pattern 1: product usage dashboards in SaaS #
A project management SaaS provides workspace admins with a dashboard tracking daily active sessions and feature adoption. In Stage 1, the team relies on an org_daily_metrics rollup table in Postgres, refreshed every minute via a background worker.
As the user base crosses millions of daily events, the rollup window starts overrunning and the materialized-view refresh begins locking the primary database during peak write hours. Customers also start asking for sub-30-second freshness on the live-session counter. They graduate to Stage 2, streaming raw application events via CDC directly into a wide event table in ClickHouse.
Dashboards now query ClickHouse directly. Workspace admins can filter adoption metrics dynamically by user role and date range, with queries returning inside the dashboard's latency budget.
Pattern 2: marketplace seller dashboards #
An e-commerce marketplace provides merchants with a live view of gross merchandise value (GMV) and pending orders. In Stage 1, the team rolls up order metrics into per-merchant summary tables in Postgres, refreshed every couple of minutes by a background worker. That serves the long tail of small merchants comfortably, but power sellers with thousands of daily transactions cannot tolerate minute-scale staleness during a launch or a flash sale.
They graduate to Stage 2 by implementing CDC to stream the order event log into ClickHouse. Using AggregatingMergeTree, they maintain real-time, incrementally updated materialized views of merchant GMV.
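A sketch of that setup, assuming a CDC-fed `orders` table with `merchant_id`, `amount`, and `created_at` columns (all names illustrative):

```sql
-- Target table stores aggregate-function states, merged incrementally.
CREATE TABLE merchant_gmv
(
    merchant_id UInt64,
    day         Date,
    gmv         AggregateFunction(sum, Decimal(18, 2)),
    orders      AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
ORDER BY (merchant_id, day);

-- Populated automatically on every insert into the CDC-fed orders table.
CREATE MATERIALIZED VIEW merchant_gmv_mv TO merchant_gmv AS
SELECT merchant_id,
       toDate(created_at) AS day,
       sumState(amount)   AS gmv,
       countState()       AS orders
FROM orders
GROUP BY merchant_id, day;

-- Dashboards finalize the states with -Merge combinators.
SELECT day, sumMerge(gmv) AS gmv, countMerge(orders) AS order_count
FROM merchant_gmv
WHERE merchant_id = 42 AND day >= today() - 30
GROUP BY day
ORDER BY day;
```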
The seller dashboard refreshes on a 30-second interval. By the time it does, the latest orders are already in ClickHouse, and queries return inside the dashboard's latency budget even during peak ordering windows.
Pattern 3: developer-facing usage analytics #
An API-as-a-Service platform needs to expose detailed usage analytics so developers can monitor latency, error rates, and cost forecasts. In Stage 1, the platform runs daily rollups grouped by api_key_id.
As developers demand real-time visibility into active incidents, daily rollups become obsolete. The platform starts streaming raw API request logs directly into ClickHouse, where months of request data sit efficiently behind sparse indexes and columnar compression.
The developer dashboard now lets users drill into specific error codes over custom time windows, with queries returning inside the dashboard's latency budget.
Anti-patterns that break customer-facing analytics #
Scaling customer-facing analytics means avoiding several destructive architectural traps that engineering teams frequently fall into.
Anti-pattern: running OLAP queries on production Postgres #
This is the most common failure mode. Allowing internal BI tools or heavy customer-facing dashboards to execute large table scans on your primary Postgres database flushes your shared_buffers. This evicts hot transactional data from memory, forcing your application to perform slow disk reads for routine operations. Your OLTP path chokes.
Anti-pattern: hand-rolled CDC pipelines without a data platform team #
Treating CDC infrastructure as a side project is a recipe for production outages. Unmonitored Kafka clusters, schema drift breaking downstream consumers, replication slot lag causing WAL disk exhaustion: these will consume your engineering team's cycles. If you can't staff a dedicated data platform team, rely on managed CDC connectors.
Anti-pattern: letting the front end send raw SQL #
Allowing the client application to generate and send raw SQL strings to your API is a security and maintenance hazard. Raw SQL bypasses semantic layers, makes query optimization impossible, and creates severe SQL injection risks.
The right pattern is the opposite. Build stable, bounded endpoints (like /api/analytics/overview) that securely resolve tenant_id from the authentication token, enforce strict cardinality limits and date range boundaries, and accept only semantic parameters from the client. The API assembles optimized database queries server-side so malicious or buggy client code cannot issue unbounded analytical queries against your database.
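The query the API assembles for such an endpoint might look like this (illustrative table and caps; `$1` through `$3` are bind parameters, never string-concatenated):

```sql
-- /api/analytics/overview, assembled server-side.
-- $1 = tenant_id resolved from the auth token, never taken from the client.
-- $2, $3 = date range, clamped by the API to an allowed window (e.g. 90 days).
SELECT day, active_users, event_count
FROM org_daily_metrics
WHERE org_id = $1
  AND day BETWEEN $2 AND $3
ORDER BY day
LIMIT 400; -- hard cardinality cap
```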
Anti-pattern: treating materialized views as the long-term answer at scale #
Postgres materialized views are useful for Stage 1 freshness on the order of minutes down to tens of seconds. They are not a permanent customer-facing analytics architecture. As event volume grows, refresh windows lengthen, REFRESH MATERIALIZED VIEW CONCURRENTLY increasingly competes with primary writes for I/O, and the view's freshness SLA falls behind what customers see. Materialized views buy you headroom inside Stage 1; they do not let you skip the graduation to Stage 2.
Anti-pattern: row-level security as the only tenant isolation mechanism #
RLS is a useful safety net, but treating it as the sole isolation boundary couples tenant-scoped query performance to RLS policy execution. Naive RLS policies join to permissions tables and add tens to hundreds of milliseconds per query. Strong tenant isolation enforces tenant_id at the API layer, partitions or indexes data on tenant_id, and uses RLS as a last-line defense rather than the primary control.
Anti-pattern: measuring success on internal team load times #
Internal engineers running dashboards from a fast laptop on a fast network are a poor proxy for the customer experience. A SaaS customer's dashboard might be opened on a coffee-shop connection, halfway around the world, against a tenant with millions of events. Define your latency SLA on customer-side p95 and p99 over a representative population, not on internal team perception. Architectures that look fine internally regularly fail in production for the customers who matter most.
Build order: how to ship customer-facing analytics in six steps #
The architecture only works if you build it in the right order. Skipping a step routinely costs a quarter's worth of engineering time recovering from a production fire.
- Define dashboard metrics explicitly. Resist "let users query anything" as a v1 goal. Pick the 8 to 15 metrics customers actually need, document the dimensions and date ranges they will filter on, and treat that document as the contract your API and storage will enforce.
- Ship Stage 1 with pre-aggregated summary tables in Postgres and indexed lookups. Use composite indexes that lead with `tenant_id`. Add `REFRESH MATERIALIZED VIEW CONCURRENTLY` only after verifying the unique-index requirement. Use BRIN indexes on time columns for large append-only event tables.
- Add an API-layer cache keyed on `(tenant_id, metric, date_range)` with a 5 to 30 second TTL. This single layer absorbs the bulk of dashboard concurrency before any database tuning becomes necessary, and it is straightforward to add or remove without touching the storage architecture.
- Graduate to Stage 2 when two or more Stage 1 triggers fire. Stream Postgres changes into a real-time OLAP database via CDC.
- Keep the same query surface. With ClickHouse Managed Postgres, you don't migrate from Postgres; you activate ClickHouse alongside it from the same console. Your application keeps querying its Postgres connection string, and `pg_clickhouse` transparently routes analytical queries to ClickHouse foreign tables, so the same API endpoints, ORMs, and dashboards get columnar speedups without rewrites or new clients.
- Expose freshness in the UI and alert on freshness lag p95 the way you alert on query latency. A "Last updated 8 seconds ago" indicator next to each chart is enough to make the SLA visible to customers.
This order keeps the Stage 1 surface area small, defers infrastructure complexity until the workload actually requires it, and turns the Stage 2 graduation into a known, planned move instead of an emergency.
Conclusion: when to stay on Postgres or move to real-time OLAP #
Real-time customer-facing analytics has a strict baseline. An architecture that cannot satisfy all five of these is not yet a customer-facing analytics architecture:
- Analytical queries return in under 100 ms at p95.
- API responses complete in under 300 ms at p95.
- Data is fresh within the SLA tier you committed to your customers.
- Tenant data is securely isolated.
- The workload imposes zero load on your primary OLTP database.
If you're hitting Postgres scaling limits now, moving to real-time OLAP doesn't require a re-platforming project. ClickHouse Managed Postgres is vanilla Postgres with open-source ClickHouse integration: the OLTP database, the columnar OLAP database, managed CDC via ClickPipes, and a unified query layer via pg_clickhouse, behind one console. You don't migrate from Postgres. You activate ClickHouse next to it, and the Stage 1 to Stage 2 graduation becomes a configuration change rather than a quarter of pipeline engineering.