The 8 best Datadog alternatives for observability at scale in 2026

Aditya Somani
Last updated: Dec 8, 2025

TL;DR

  • The problem: Datadog's unpredictable and high costs are not just a pricing issue. They are a symptom of an architecture that scales with high-cardinality data by passing significant costs on to the user. This leads to expensive custom metrics, dual charges for ingestion and querying, and penalties for dynamic infrastructure.
  • The solution is architectural: The most effective alternative isn't just a cheaper tool but one built on a modern columnar data engine designed for analytics at scale. Solutions like ClickStack, built on the ClickHouse columnar database, are designed specifically for the speed and scale that modern observability demands.
  • ClickStack's advantage: By unifying logs, metrics, and traces in a single high-performance datastore, ClickStack delivers sub-second queries on petabytes of data, high data compression (typically 14x or more), and a simple, predictable pricing model. It solves the high-cardinality problem natively, offering massive cost savings and performance gains over traditional platforms.
  • This guide: We compare eight alternatives, focusing on their underlying architecture, pricing models, and ability to handle high-cardinality data, helping you choose a solution that scales without surprise bills.

It’s a familiar story for many DevOps teams: the monthly observability bill is spiraling out of control. When a Hacker News thread about a single company's $65 million Datadog bill went viral, it unleashed a wave of similar complaints about complex, unpredictable billing. But these high costs are symptoms of a deeper architectural problem. Many legacy platforms were not built for the reality of modern systems, which generate huge volumes of high-cardinality data from ephemeral infrastructure like Kubernetes. This mismatch leads to punitive fees for custom metrics and dual charges for ingestion and querying. This guide provides a technically grounded comparison of alternatives, focusing on the core architecture that solves these fundamental problems of performance and cost at scale.

Why Datadog's pricing model struggles at scale

Datadog is a powerful platform, but its pricing model can lead to unpredictable and escalating costs, especially for organizations operating at scale. These financial pressures are not arbitrary. They are symptoms of an architecture that, while capable, becomes prohibitively expensive when applied to the high-volume, high-cardinality data that defines modern observability.

Here’s a breakdown of the key challenges:

| Challenge | Description | Architectural cause |
|---|---|---|
| The high-cardinality tax | Exploding costs for custom metrics when adding tags with many unique values (e.g., container_id, customer_id). | Traditional indexing technology struggles to efficiently manage and query a vast number of unique metric time series. |
| The ingestion vs. query penalty | Paying once to ingest logs and a second, higher fee to index them for querying, forcing teams to limit what they analyze. | Data is stored in a way that is not optimized for fast, ad-hoc analytical queries without expensive, upfront indexing. |
| The per-host problem | Costs for APM and infrastructure are tied to the number of hosts, penalizing modern, ephemeral architectures like Kubernetes. | A billing model tied to a legacy unit (hosts) that fails to represent modern, container-based workloads where host counts are dynamic and often irrelevant. |

The high-cardinality tax

A primary driver of surprise bills is the "custom metric tax." Datadog's billing for custom metrics is based on cardinality: the number of unique combinations of a metric name and its tags (e.g., host, customer_id, container_id). Adding a single tag with many unique values, a common practice when using frameworks like OpenTelemetry, can cause the volume of billable metric time series to explode. At scale, custom metrics can account for over half of the total Datadog bill. This occurs because traditional monitoring systems often rely on indexing technologies that struggle to efficiently manage and query a vast and ever-growing number of unique label combinations.
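
To make the arithmetic concrete, the sketch below counts billable series as the number of distinct metric-name-plus-tag-set combinations, which is the definition given above. It is a minimal Python sketch under stated assumptions: the metric name, the tag names, and the 1,000 customer IDs are hypothetical, and it ignores how a vendor averages series counts over a billing period.

```python
from itertools import product


def count_series(points):
    """Count distinct (metric name, tag set) combinations, i.e. billable series.

    Each point is (metric_name, tags_dict); the tag dict is frozen into a
    sorted tuple so that identical tag sets compare equal.
    """
    return len({(name, tuple(sorted(tags.items()))) for name, tags in points})


# Hypothetical workload: one metric emitted from 3 endpoints x 2 status codes.
base = [
    ("http.requests", {"endpoint": e, "status": s})
    for e, s in product(["/a", "/b", "/c"], ["200", "500"])
]

# The same points, each now also tagged with one of 1,000 hypothetical customer IDs.
with_customer = [
    (name, {**tags, "customer_id": f"c{i}"})
    for name, tags in base
    for i in range(1000)
]

print(count_series(base))           # 6
print(count_series(with_customer))  # 6000
```

Adding one customer_id tag multiplies the six existing series by the number of distinct customers, which is why a single high-cardinality tag can dominate a bill.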

The ingestion vs. query penalty

For log management, Datadog uses a dual-cost model that charges teams twice for the same data. You first pay an ingestion fee simply to collect and process the logs (e.g., $0.10 per GB). To make those logs searchable and useful for real-time debugging, you must pay a second, much higher fee to index them for analysis (e.g., $1.70 per million log events). This creates a difficult choice between budget and visibility, forcing engineering teams to be highly selective about what data they can afford to analyze, undermining the goal of comprehensive observability.
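
A rough estimator shows how the two charges combine. This is an illustrative Python sketch that reuses the example rates quoted above ($0.10 per GB ingested, $1.70 per million indexed events); the 10 TB monthly volume and the assumption of roughly 1 KB per log event (about one million events per GB) are hypothetical, and real prices vary with retention period and contract.

```python
# Example rates quoted in the text; treat them as placeholders, not list prices.
INGEST_USD_PER_GB = 0.10
INDEX_USD_PER_MILLION_EVENTS = 1.70


def monthly_log_cost(ingested_gb, indexed_events, indexed_fraction=1.0):
    """Estimate the two-part log bill: every byte is ingested, a fraction is indexed."""
    ingest = ingested_gb * INGEST_USD_PER_GB
    index = indexed_events * indexed_fraction / 1_000_000 * INDEX_USD_PER_MILLION_EVENTS
    return round(ingest + index, 2)


# Assumption: ~1 KB per event, so 10,000 GB is roughly 10 billion events.
monthly_gb = 10_000
monthly_events = monthly_gb * 1_000_000

print(monthly_log_cost(monthly_gb, monthly_events))       # 18000.0 (index everything)
print(monthly_log_cost(monthly_gb, monthly_events, 0.1))  # 2700.0 (index 10%)
```

Under these assumptions, indexing every event brings the monthly total to $18,000, of which only $1,000 is ingestion, while indexing just 10% of events brings it to $2,700. That gap is the budget-versus-visibility trade-off described above.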

The per-host problem

Finally, tying core service costs for infrastructure monitoring and APM to the number of hosts penalizes modern, dynamic architectures. In environments like Kubernetes, where containers and nodes are ephemeral and scale on demand, the "per-host" metric is a fluctuating, unpredictable multiplier for costs.

This model forces teams into a highly undesirable trade-off: either compromise their architecture to minimize billable hosts, prioritizing cost over resilience, or accept "observability gaps" by leaving services unmonitored to stay within budget.
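
The sketch below illustrates why autoscaling makes per-host costs hard to predict. It assumes a hypothetical "high-water mark" rule in which the bill is based on a high percentile (here the 99th) of hourly host counts; the exact rule, percentile, and sampling interval vary by vendor and contract, so treat them as assumptions rather than Datadog's documented terms.

```python
import math


def billable_hosts(hourly_host_counts, percentile=99):
    """Hypothetical billing rule: charge for the host count at a high percentile
    of the hourly samples (nearest-rank method)."""
    ordered = sorted(hourly_host_counts)
    if not ordered:
        return 0
    rank = math.ceil(percentile / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


HOURS = 720  # a 30-day month of hourly samples

# A fixed fleet of 20 nodes, all month long.
steady = [20] * HOURS

# Mostly 20 nodes, but the autoscaler bursts to 60 for 2 hours every day.
bursty = ([20] * 22 + [60] * 2) * 30

for name, fleet in [("steady", steady), ("bursty", bursty)]:
    average = sum(fleet) / len(fleet)
    print(f"{name}: average {average:.1f} hosts, billed for {billable_hosts(fleet)}")
```

Under this rule the bursty fleet averages about 23 hosts yet is billed for 60, because the daily bursts occupy more than 1% of the month's hours.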

The 8 best Datadog alternatives for scalable observability

This table provides a quick summary of the 8 alternatives discussed below, helping you identify the best potential fits based on your primary needs and technical preferences.

| Alternative | Core architecture | Best for high cardinality? | Primary strength | Best for |
|---|---|---|---|---|
| ClickStack | Unified columnar database | Yes, natively | Excellent cost & performance at scale | Teams needing to solve high-cardinality issues and drastically cut observability costs. |
| New Relic | Proprietary SaaS | No, can be costly | Deep APM insights | Enterprises needing mature, all-in-one APM without strict budget constraints. |
| Dynatrace | Proprietary SaaS / OneAgent | No, can be costly | AI-powered automation | Large enterprises needing powerful, automated root-cause analysis in complex environments. |
| Grafana Stack | Siloed (Loki, Mimir, Tempo) | Yes, but complex | Open source & customizable | Teams invested in the Prometheus ecosystem with resources to manage operational complexity. |
| Splunk | Search index | No, struggles at scale | Log management & SIEM | Enterprises focused on deep log analysis and security, with significant budgets. |
| Chronosphere | Cloud-native SaaS (M3) | Yes, via control plane | Traffic shaping & cost control | Large cloud-native organizations struggling with Prometheus scale and costs. |
| SigNoz | Open-source (ClickHouse) | Yes, natively | Unified OSS experience | Teams wanting an open-source, self-hostable Datadog alternative. |
| Dash0 | Serverless OTel data lake | Yes | OTel-native & serverless | Teams fully committed to OpenTelemetry seeking zero ops. |

1. ClickStack (on ClickHouse)

For teams grappling with the architectural limitations and prohibitive costs of Datadog, ClickStack offers a fundamentally different approach. ClickStack treats observability as a data problem, one best solved by combining a highly efficient data engine with a universal data standard. Built on the open-source, columnar database ClickHouse and the industry-standard OpenTelemetry framework, it is an end-to-end solution composed of high-performance storage and a purpose-built UI from HyperDX. The stack is designed from the ground up for high performance, cost-efficiency, and total data portability.

The architecture centers on a unified datastore for logs, metrics, and traces. By strictly adhering to the OpenTelemetry data model, ClickStack eliminates data silos and stores telemetry as 'wide events' (structured rows containing all context). This OTel-native structure is uniquely suited for ClickHouse’s columnar format, enabling blazing-fast aggregations. Because ClickHouse combines vectorized query execution with multi-core parallel processing, it can process these structured OTel spans with sub-second query times, even on petabytes of data.
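
A toy Python sketch of the column-oriented idea follows. It pivots a few hypothetical wide events into one list per attribute and then computes an average grouped by service while reading only the two columns involved. Real ClickHouse stores each column as compressed files on disk and processes them in vectorized blocks across cores, none of which is modeled here.

```python
from collections import defaultdict

# A few hypothetical "wide events": one flat record per span, carrying all context.
rows = [
    {"service": "checkout", "duration_ms": 120, "customer_id": "c1", "region": "eu"},
    {"service": "checkout", "duration_ms": 80, "customer_id": "c2", "region": "us"},
    {"service": "search", "duration_ms": 15, "customer_id": "c1", "region": "eu"},
    {"service": "search", "duration_ms": 25, "customer_id": "c3", "region": "us"},
]


def to_columns(records):
    """Pivot row-oriented records (all with the same keys) into one list per attribute."""
    columns = defaultdict(list)
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return dict(columns)


def avg_by(columns, key_col, value_col):
    """Average value_col grouped by key_col, touching only those two columns."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in zip(columns[key_col], columns[value_col]):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


cols = to_columns(rows)
print(avg_by(cols, "service", "duration_ms"))  # {'checkout': 100.0, 'search': 20.0}
```

The customer_id and region columns are never read by this query, which is the property that lets a columnar engine keep aggregations fast as events grow wider.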

Crucially, ClickStack represents a shift away from proprietary agents. As a deeply OpenTelemetry-native platform, it decouples your instrumentation from your backend. Unlike proprietary vendors that lock your code into their specific libraries, ClickStack leverages OTel to ensure you own your data generation. By supporting standard query languages like SQL alongside standard OTel semantic conventions, ClickStack empowers teams to practice "Observability Science": performing deep, ad-hoc analysis to correlate system behavior with business outcomes in a way that is impossible with proprietary DSLs.
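
Because OTel SDKs read the export destination from standard environment variables, switching backends is a configuration change rather than a code change. The sketch below resolves the OTLP/HTTP traces URL following our reading of the OpenTelemetry exporter conventions: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as-is, otherwise OTEL_EXPORTER_OTLP_ENDPOINT gets /v1/traces appended, with a default of http://localhost:4318. It is simplified (no gRPC, headers, or TLS), and the collector hostname in the example is hypothetical.

```python
import os

DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


def otlp_http_traces_url(env=None):
    """Resolve where OTLP/HTTP trace data would be sent (simplified sketch)."""
    env = os.environ if env is None else env
    signal_specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if signal_specific:
        return signal_specific  # a per-signal URL is used as-is
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_OTLP_HTTP_ENDPOINT)
    return base.rstrip("/") + "/v1/traces"  # a base URL gets the signal path appended


# Same instrumentation, different destinations (hostname is hypothetical).
print(otlp_http_traces_url({}))
print(otlp_http_traces_url({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector.internal:4318"}))
```

Nothing in the application's tracing calls references the backend, so pointing the same services at a different OTLP-compatible collector only requires changing the environment.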

Key features:

  • Unified columnar datastore: Ingests and analyzes OTel logs, metrics, and traces in a single, high-performance database.
  • OpenTelemetry native: Built entirely on the OTel standard to prevent vendor lock-in, ensuring you never have to re-instrument your code to switch tools.
  • Sub-second query performance: Delivers blazing-fast aggregations on high-cardinality OTel attributes.
  • High data compression: Achieves industry-leading data compression of over 15x, dramatically reducing storage costs.
  • Full SQL support: Enables deep, ad-hoc analysis beyond the limitations of pre-built dashboards.
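
The compression figures above come largely from storing similar values together. The sketch below shows the principle on a hypothetical column of sorted timestamps: delta-encoding the column before a general-purpose compressor shrinks it far more than compressing the raw values. ClickHouse exposes the same idea as column codecs (for example Delta or DoubleDelta chained with ZSTD or LZ4); here Python's zlib stands in for those compressors, so the printed ratios are illustrative only.

```python
import struct
import zlib

# Hypothetical sorted timestamp column (milliseconds): one event every 10 ms.
timestamps = [1_700_000_000_000 + 10 * i for i in range(10_000)]


def pack(values):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


def delta_encode(values):
    """Keep the first value, then store each value as the gap from the previous one."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


raw = pack(timestamps)
plain = zlib.compress(raw, 9)
delta = zlib.compress(pack(delta_encode(timestamps)), 9)

print(f"raw column:       {len(raw)} bytes")
print(f"zlib only:        {len(raw) / len(plain):.1f}x smaller")
print(f"delta, then zlib: {len(raw) / len(delta):.1f}x smaller")
```

After delta encoding, the column is almost entirely the same small number repeated, which any general-purpose compressor handles extremely well.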

Pros:

  • Future-proofs instrumentation via OpenTelemetry: ClickStack removes the risk of vendor lock-in at the source. By standardizing on OpenTelemetry for ingestion, you control the "edge" of your observability pipeline.
  • Solves the high-cardinality aggregation problem: ClickHouse is designed to handle aggregations over high-cardinality data significantly better than alternatives, allowing teams to process high-cardinality OTel tags efficiently on disk or pre-aggregate them with Materialized Views.
  • Fast, efficient performance: ClickHouse's columnar design enables high data compression (often exceeding 15x) and fast queries using data skipping via sparse primary indexes.
  • Cheap, long-term retention on object storage: Running on ClickHouse Cloud separates storage and compute, allowing you to retain full-fidelity OpenTelemetry data for months or years on low-cost object storage.
  • Excellent cost-efficiency at scale: For internal observability workloads, this stack has been found to be up to 200x more cost-effective than Datadog.
  • Predictable, transparent pricing: Operates on a simple compute-plus-storage model, eliminating complex SKUs and punitive taxes like per-host charges.
  • Flexible & interoperable: Built on a fully open-source (Apache 2.0) foundation, allowing teams to integrate existing data pipelines using popular agents like Vector and Fluentd.
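
The "data skipping via sparse primary indexes" mentioned above can be sketched in a few lines. Rows are sorted by the primary key and grouped into granules, and the index keeps only the first key of each granule, so a range query binary-searches the small index and scans just the granules that could match. This is a simplified illustration, not ClickHouse's implementation; ClickHouse's MergeTree tables default to 8,192 rows per granule, while the sketch uses 4 so the result is easy to check by hand.

```python
from bisect import bisect_left, bisect_right

# ClickHouse defaults to 8,192 rows per granule; 4 keeps the example readable.
GRANULE_ROWS = 4


def build_sparse_index(sorted_keys, granule_rows=GRANULE_ROWS):
    """Keep one mark per granule: the first primary-key value in it."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_rows)]


def granules_to_read(index, lo, hi):
    """Return the granules that may hold keys in [lo, hi].

    Granule g starts at index[g] and ends at or before index[g + 1], so it can
    match only if index[g] <= hi and (it is the last granule or index[g + 1] >= lo).
    """
    first = max(bisect_left(index, lo) - 1, 0)
    stop = bisect_right(index, hi)  # exclusive upper bound on granule numbers
    return list(range(first, stop))


def range_query(sorted_keys, index, lo, hi, granule_rows=GRANULE_ROWS):
    """Scan only the candidate granules, then filter rows exactly."""
    hits = []
    for g in granules_to_read(index, lo, hi):
        block = sorted_keys[g * granule_rows:(g + 1) * granule_rows]
        hits.extend(k for k in block if lo <= k <= hi)
    return hits


keys = list(range(0, 40, 2))       # 20 sorted keys -> 5 granules
index = build_sparse_index(keys)   # [0, 8, 16, 24, 32]
print(granules_to_read(index, 10, 14))   # [1]
print(range_query(keys, index, 10, 14))  # [10, 12, 14]
```

The index holds one entry per granule rather than one per row, which is why it stays small enough to keep in memory even for very large tables.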

Cons:

  • Emerging UI and ecosystem: As a newer solution, the ecosystem of out-of-the-box integrations is still growing compared to established vendors.
  • Advanced features in development: Features like native UI-based RBAC and sophisticated alerting are still on the roadmap.
  • Best for SREs and developers: The platform is optimized for SREs and developers comfortable with SQL and OTel concepts rather than general BI workflows.

Best for:

  • Cost-conscious organizations operating at a large scale.
  • Engineering teams who want to standardize on OpenTelemetry to own their observability data.
  • Teams struggling with high-cardinality data in Elasticsearch or Prometheus.
  • Users looking to migrate from expensive logging solutions like Splunk or Datadog.
  • Organizations that want to unify their observability and business analytics in a single platform.

2. New Relic

As a long-standing leader in the APM space, New Relic offers a mature, all-in-one observability platform. It combines application, infrastructure, browser, and log monitoring into a single solution, providing teams with a unified view of their entire stack.

  • Key features: Comprehensive Application Performance Monitoring (APM 360), real-user monitoring (RUM), infrastructure monitoring, and an AI-powered alerting engine.
  • Pros: Established, feature-rich platform praised for its powerful APM capabilities. The alerting system is considered accurate and reliable.
  • Cons: High cost and a complex, user-based pricing model that penalizes growing teams. Some users also note challenges with dashboard configurations and Kubernetes integrations.
  • Best for: Enterprise teams that require a mature, consolidated platform with deep APM insights and are less constrained by budget.

3. Dynatrace

Dynatrace is an enterprise-focused observability platform known for its powerful AI capabilities and automated full-stack monitoring. Its core strength lies in the "OneAgent" technology, a single agent deployment that automatically discovers and maps all components and dependencies.

  • Key features: AI-powered engine (Davis), full-stack monitoring (RUM to code-level), automated topology mapping (Smartscape), and digital experience management.
  • Pros: Strong automation and AI-driven root cause analysis simplify complex problem detection. The unified OneAgent simplifies deployment.
  • Cons: High cost and complex pricing model. The "OneAgent" can be resource-intensive, and the platform has a steep learning curve.
  • Best for: Large enterprises with complex, dynamic cloud environments that need powerful, AI-driven automation.

4. Grafana Stack (Loki, Mimir, Tempo)

The Grafana LGTM (Loki, Grafana, Tempo, Mimir) stack is a popular open-source observability solution made up of distinct, specialized components. Grafana serves as the central visualization platform.

  • Key features: Loki (efficient log aggregation), Mimir (scalable metrics storage), Tempo (distributed tracing), and Grafana (visualization).
  • Pros: Cost-effective label-based indexing, open-source flexibility, and a unified visualization interface.
  • Cons: High operational complexity (managing multiple stateful systems), correlation happens at the application layer rather than the data layer, and data remains siloed.
  • Best for: Teams already committed to the Prometheus/Grafana ecosystem who prefer self-managed solutions and have the resources to handle complexity.

5. Splunk

Splunk is a powerful and long-standing platform known for its capabilities in searching, analyzing, and visualizing machine-generated data. It is a staple in many enterprise IT operations for log management and SIEM.

  • Key features: Powerful Search Processing Language (SPL), real-time visibility, high scalability, and extensive integrations.
  • Pros: Market leader in log management, strong security (SIEM) features, and highly customizable.
  • Cons: Prohibitive cost at scale (especially for ingest-heavy workloads), poor query performance for complex analytics compared to columnar stores, and a steep learning curve.
  • Best for: Large enterprises with significant budgets prioritizing deep log analysis and security use cases.

6. Chronosphere

Chronosphere is a cloud-native observability platform built for high-scale environments. It leverages the open-source M3 metrics engine to handle massive amounts of high-cardinality data.

  • Key features: Control Plane (shape/filter data before storage), scalable Prometheus backend (M3 based), and distributed tracing integration.
  • Pros: Effectively tames high cardinality via traffic shaping and offers cost predictability.
  • Cons: Relies on data dropping/aggregation to control costs (losing fidelity), metrics-heavy heritage with less mature logging, and enterprise-focused pricing.
  • Best for: Large, cloud-native organizations struggling with Prometheus scaling limits and needing strict control over metric growth.
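
Pre-storage aggregation cuts cost by reducing the number of stored series, and the fidelity loss noted above is the price. The generic sketch below (not Chronosphere's control-plane API; the labels and values are hypothetical) sums samples after dropping the pod label: four series become two, but the misbehaving pod is no longer visible in the stored data.

```python
from collections import defaultdict

# Hypothetical per-pod request counts: (labels, value).
samples = [
    ({"service": "api", "pod": "api-1"}, 40),
    ({"service": "api", "pod": "api-2"}, 35),
    ({"service": "api", "pod": "api-3"}, 900),  # the misbehaving pod
    ({"service": "web", "pod": "web-1"}, 50),
]


def drop_label(samples, label):
    """Sum values across series after removing one label, like a pre-storage rollup rule."""
    rolled_up = defaultdict(int)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        rolled_up[kept] += value
    return dict(rolled_up)


shaped = drop_label(samples, "pod")
print(len(samples), "->", len(shaped))  # 4 -> 2
print(shaped)  # {(('service', 'api'),): 975, (('service', 'web'),): 50}
```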

7. SigNoz

SigNoz is an open-source APM and observability platform that positions itself as a direct, self-hostable alternative to Datadog. It is powered by open-source ClickHouse.

  • Key features: Unified UI (metrics, traces, logs), native OpenTelemetry support, and a ClickHouse backend.
  • Pros: Open-source freedom (self-hostable), cost-effective compared to SaaS, and an all-in-one "Datadog-like" experience.
  • Cons: Operational overhead for self-hosting, and less mature features than giants like New Relic. Critically, the self-hosted version lacks native separation of storage and compute. This results in three key inefficiencies: no isolation between read and write workloads, limits on scaling to petabyte levels, and no cost-efficient long-term retention on object storage.
  • Best for: Engineering teams wanting a unified, open-source solution or needing to leverage ClickHouse's efficiency without building a UI.

8. Dash0

Like SigNoz, Dash0 is a newer entrant built on open-source ClickHouse. It positions itself as a serverless, OpenTelemetry-native observability data lake, focusing on "Observability as Code."

  • Key features: Serverless architecture, dashboarding via Perses/Grafana, and direct OTel ingestion.
  • Pros: Zero maintenance (no backend management), no indexing costs, and strong OTel alignment.
  • Cons: Per-event pricing can punish verbosity during incidents, "headless" visualization creates a disjointed workflow, and the platform is still in early stages. Additionally, it shares the architectural limitations of standard open-source ClickHouse: without native storage and compute separation, it lacks read/write isolation and cannot leverage low-cost object storage for efficient long-term retention.
  • Best for: Teams fully committed to OpenTelemetry who want a low-ops, serverless backend and are comfortable building their own visualizations.

Comparison summary table: finding the right fit

| Alternative | Pricing model | Architecture | Instrumentation & OTel | Best for high cardinality? | Unified datastore? | Cost at scale |
|---|---|---|---|---|---|---|
| ClickStack | Resource-based (compute + storage) | Unified columnar database | Native OTel (schema designed for OTel) | Yes, natively | Yes (logs, metrics, traces) | Very low & predictable |
| New Relic | User-based, ingest-based | Proprietary SaaS | Full OTel support; proprietary agent optional | No, can be costly | Yes | High |
| Dynatrace | Host-based / "Davis Data Units" | Proprietary SaaS / OneAgent | OTel compatible; OneAgent required for AI | No, can be costly | Yes | Very high |
| Grafana | Resource-based (self-hosted) | Siloed (Loki, Mimir, Tempo) | Prometheus & OTel native | Yes, but complex | No (separate backends) | Low (high ops cost) |
| Splunk | Ingest-based / "Splunk units" | Search index | OTel for APM; proprietary for logs | No, struggles at scale | Primarily logs/SIEM | Very high |
| Chronosphere | Usage-based / shaped | Cloud-native SaaS (M3) | OTel compatible; Prometheus native | Yes, via control plane | Primarily metrics/traces | High (enterprise) |
| SigNoz | Usage-based / free OSS | OSS ClickHouse | Native OTel | Yes, natively | Yes | Moderate (no cost-efficient long-term retention) |
| Dash0 | Usage-based | Serverless OTel data lake on OSS ClickHouse | Native OTel | Yes | Yes | Moderate (no cost-efficient long-term retention) |

Conclusion

The search for a Datadog alternative often begins with sticker shock, but the real solution is found in architecture. As data volumes and cardinality continue to explode, platforms built on a search index or siloed backends will inevitably pass their scaling costs on to you. The key advantage of a unified, columnar data engine is that it treats massive-scale analytics as a native feature, not a billing problem. ClickStack, powered by ClickHouse, represents this modern approach, offering a path to deep, powerful observability without the false choice between cost and comprehensive visibility.

Ready to see what performance at scale feels like? Start a trial with ClickHouse Cloud today, or read our case study to see how high-volume teams are reducing costs by moving to a columnar engine.
