Why Observability Costs Keep Exploding
Observability bills are the surprise expense that hits growing startups hardest. You start with a free Datadog tier, ship to production, and three months later you are staring at a $2,000/month invoice wondering where the money went.
The problem is volume-based pricing. Modern applications generate enormous amounts of telemetry data: logs from every request, metrics from every service, traces spanning multiple microservices. A modestly complex application with 10 services and 1,000 requests per minute generates 50 to 100 GB of logs per month, 500K to 1M metric time series, and millions of trace spans.
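To see where a figure like 50 to 100 GB comes from, here is a rough back-of-the-envelope sketch. The bytes-per-request values are assumptions chosen for illustration (total log output across all services for one request), not measurements.

```python
def monthly_log_gb(requests_per_minute: float, log_bytes_per_request: float,
                   days: int = 30) -> float:
    """Estimate monthly log volume in decimal GB (1 GB = 1e9 bytes)."""
    requests_per_month = requests_per_minute * 60 * 24 * days
    return requests_per_month * log_bytes_per_request / 1e9


# 1,000 requests per minute is 43.2 million requests in a 30-day month.
requests_per_month = 1_000 * 60 * 24 * 30

# Assumed log output per request, summed over every service it touches.
low_estimate = monthly_log_gb(1_000, 1_200)   # about 52 GB
high_estimate = monthly_log_gb(1_000, 2_300)  # about 99 GB
```

In other words, one to two kilobytes of log output per request is enough to land in that range, and verbose debug logging can push it much higher.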
Axiom, Datadog, and Grafana Cloud handle this volume with fundamentally different pricing models. Axiom charges by the volume of data ingested (serverless model). Datadog charges per host plus per feature (bundled platform). Grafana Cloud charges per metric, log, and trace volume (pay-as-you-go on open-source foundations). The right choice depends on your data volume, team size, and how much you value integrated tooling versus flexibility.
For a broader comparison that includes Sentry for error tracking, our guide on OpenTelemetry vs Datadog vs Sentry covers complementary tools.
Axiom: Serverless Observability for Modern Stacks
Axiom takes a fundamentally different approach: store everything, pay only for what you ingest.
Pricing Model
Free tier: 500 GB ingest per month, 30-day retention. This is shockingly generous and covers most startups through Series A. Team plan: $25/month base plus $2 per GB ingested beyond the free tier. No per-host charges, no per-metric charges, no feature gating. You pay for data in, not data stored or queried.
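As a minimal sketch, the Team plan arithmetic described above can be written as a small function. The constants are the figures quoted in this article, and the function name is purely illustrative; check current pricing before relying on the output.

```python
FREE_TIER_GB = 500.0       # monthly ingest allowance quoted above
TEAM_BASE_USD = 25.0       # Team plan base price quoted above
OVERAGE_USD_PER_GB = 2.0   # price per GB beyond the allowance quoted above


def axiom_team_cost(ingest_gb: float) -> float:
    """Estimated monthly Team plan cost for a given total ingest volume."""
    overage_gb = max(0.0, ingest_gb - FREE_TIER_GB)
    return TEAM_BASE_USD + overage_gb * OVERAGE_USD_PER_GB


within_allowance = axiom_team_cost(100)  # base price only
with_overage = axiom_team_cost(750)      # base price plus 250 GB of overage
```

Note that the input is total ingest (logs, traces, and any other events), not log volume alone.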
Strengths
Cost predictability: you know exactly what your bill will be based on ingest volume. No surprises from metric cardinality explosions or custom metric charges. The query language (APL, derived from Kusto) is powerful for ad-hoc analysis. Serverless-first architecture means zero infrastructure to manage. Native OpenTelemetry support lets you switch from Datadog or Grafana without changing instrumentation.
Weaknesses
Smaller feature set than Datadog. No APM with automatic service maps. No synthetic monitoring. No real-user monitoring (RUM). Alerting exists but is less sophisticated than Datadog's or Grafana's. The ecosystem of integrations is growing but smaller. If you need full-stack observability from a single vendor, Axiom cannot match Datadog's breadth.
Best For
Startups running on serverless or edge compute (Vercel, Cloudflare Workers, AWS Lambda) where per-host pricing does not make sense. Teams that want to log everything without worrying about cost. Organizations using OpenTelemetry as their instrumentation standard.
Datadog: The Full-Stack Platform
Datadog is the default choice for a reason: it does everything, and it does it well.
Pricing Model
- Infrastructure: $15/host/month (annually) or $23/host/month (monthly)
- APM: $31/host/month
- Log Management: $0.10 per GB ingested plus $1.70 per million log events indexed
- Custom Metrics: $0.05 per custom metric per month
- Real User Monitoring: $1.50 per 1,000 sessions
Each feature is a separate line item, and costs compound fast. A startup with 20 hosts, APM, logs, and custom metrics typically pays $1,500 to $4,000/month.
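To make the compounding concrete, here is a rough estimator built only from the list prices quoted above (annual infrastructure rate). The function and parameter names are hypothetical, and real invoices depend on plan, commitments, and discounts.

```python
def datadog_monthly_cost(hosts: int, log_gb: float, custom_metrics: int = 0,
                         indexed_log_events_millions: float = 0.0,
                         rum_sessions: int = 0) -> dict:
    """Break an estimated monthly bill into the line items listed above."""
    items = {
        "infrastructure": hosts * 15.0,
        "apm": hosts * 31.0,
        "log_ingest": log_gb * 0.10,
        "log_indexing": indexed_log_events_millions * 1.70,
        "custom_metrics": custom_metrics * 0.05,
        "rum": rum_sessions / 1_000 * 1.50,
    }
    items["total"] = sum(items.values())
    return items


# 5 hosts with APM and 20 GB of logs: roughly $232/month.
seed = datadog_monthly_cost(hosts=5, log_gb=20)

# 20 hosts, 100 GB of logs, 20,000 custom metrics: roughly $1,930/month.
series_a = datadog_monthly_cost(hosts=20, log_gb=100, custom_metrics=20_000)
```

In the second example, custom metrics alone account for about half the total, which is why cardinality deserves as much attention as host count.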
Strengths
The most comprehensive observability platform available. Automatic service maps, distributed tracing with flame graphs, log analytics with pattern detection, infrastructure monitoring with 750+ integrations, synthetic monitoring, real-user monitoring, security monitoring, CI/CD visibility, and database monitoring. Everything is correlated: click on an error in a log, see the trace, see the affected host metrics, see the deployment that caused it. This correlation saves hours of debugging time.
Weaknesses
Cost is the primary concern. Datadog bills add up quickly as you add features and scale hosts. The per-host model penalizes architectures with many small services (Kubernetes pods, serverless functions). Custom metric cardinality can cause unexpected bill spikes. You can easily hit $5,000 to $10,000/month at the growth stage. Vendor lock-in is significant because Datadog's own instrumentation libraries and agent are tied to its platform (though Datadog now supports OpenTelemetry).
Best For
Teams that want a single platform for all observability needs. Organizations with traditional server or container-based infrastructure. Companies where debugging speed justifies premium pricing. Enterprises that need security monitoring alongside observability.
Grafana Cloud: Open-Source Flexibility with Managed Convenience
Grafana Cloud builds on the Prometheus, Loki, and Tempo open-source projects, giving you flexibility with less operational burden.
Pricing Model
- Free tier: 10,000 metric series, 50 GB logs, 50 GB traces per month
- Pro: starts at $29/month with usage-based pricing beyond included volumes
- Metrics: $8 per 1,000 active series per month
- Logs: $0.50 per GB
- Traces: $0.50 per GB
The pricing is transparent and scales predictably, though it can get expensive at high cardinality.
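The usage-based part can be sketched as follows. This assumes the Pro plan's included volumes equal the free-tier allowances quoted above, which is an assumption made for illustration; the function name is hypothetical.

```python
INCLUDED_SERIES = 10_000   # assumed included on Pro (free-tier figure above)
INCLUDED_LOG_GB = 50.0     # assumed included on Pro
INCLUDED_TRACE_GB = 50.0   # assumed included on Pro


def grafana_cloud_pro_cost(active_series: int, log_gb: float,
                           trace_gb: float = 0.0) -> float:
    """Estimated monthly Pro cost: base price plus usage beyond included volumes."""
    billable_series = max(0, active_series - INCLUDED_SERIES)
    billable_log_gb = max(0.0, log_gb - INCLUDED_LOG_GB)
    billable_trace_gb = max(0.0, trace_gb - INCLUDED_TRACE_GB)
    return (29.0
            + billable_series / 1_000 * 8.0
            + billable_log_gb * 0.50
            + billable_trace_gb * 0.50)


mid_scale = grafana_cloud_pro_cost(active_series=20_000, log_gb=100)
large_scale = grafana_cloud_pro_cost(active_series=100_000, log_gb=500)
```

At the larger scale, metrics dominate the estimate (90,000 billable series), so trimming unused series or high-cardinality labels is the most direct lever.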
Strengths
No vendor lock-in. Grafana builds on open standards and open-source projects (Prometheus for metrics, OpenTelemetry for traces, Loki for logs). You can migrate between self-hosted and cloud, or switch to another provider, without changing instrumentation. Grafana dashboards are the most flexible and customizable in the industry. The community has thousands of pre-built dashboards for every technology. Alerting with Grafana Alerting is powerful and supports multiple notification channels.
Weaknesses
Less integrated than Datadog. Correlating logs, metrics, and traces requires more manual dashboard configuration. Automatic service maps and APM features are less polished. The learning curve is steeper, especially for PromQL (the Prometheus query language). Self-hosted Grafana stack (Prometheus + Loki + Tempo) requires significant ops expertise, though Grafana Cloud eliminates most of that burden.
Best For
Teams that value open standards and vendor portability. Infrastructure teams comfortable with Prometheus and PromQL. Organizations that need custom dashboards beyond what Datadog or Axiom offer. Budget-conscious teams that want to start with self-hosted and migrate to cloud as needed.
Cost Comparison at Different Scales
Here is what each platform costs at three common startup scales:
Seed Stage (5 hosts, 20 GB logs/month, 5,000 metric series)
- Axiom: Free (well within free tier)
- Datadog: $75/month infrastructure + $155/month APM + $2/month logs = ~$232/month
- Grafana Cloud: Free (within free tier limits)
Series A (20 hosts, 100 GB logs/month, 20,000 metric series)
- Axiom: Free to $25/month (log volume is well within the 500 GB free tier; $25 is the Team plan base)
- Datadog: $300 infrastructure + $620 APM + $10 logs + custom metrics = ~$1,200 to $2,000/month
- Grafana Cloud: $29 base + $80 metrics + $25 logs = ~$150 to $300/month
Series B (100 hosts, 500 GB logs/month, 100,000 metric series)
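- Axiom: $25 base + ~$500 overage = ~$525/month (the overage corresponds to about 250 GB beyond the 500 GB free tier, assuming traces and other events are ingested on top of logs)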
- Datadog: $1,500 infrastructure + $3,100 APM + $50 logs + $5,000 custom metrics = ~$8,000 to $12,000/month
- Grafana Cloud: $29 base + $720 metrics (90,000 series beyond the 10,000 included) + $225 logs = ~$1,000 to $2,000/month
The pattern is clear: at these three scales Axiom is cheapest (tied with Grafana Cloud at the seed stage, where both are free), Grafana Cloud is in the middle, and Datadog is the most expensive but most feature-complete. The question is whether Datadog's additional features justify a cost premium of 5x to 10x or more for your team.
For strategies on keeping your overall cloud bill under control, our guide on reducing cloud costs covers observability alongside compute and storage optimization.
OpenTelemetry: The Common Thread
Regardless of which platform you choose, instrument your application with OpenTelemetry (OTel). Here is why:
Vendor Portability
OTel is a CNCF project and the de facto standard for telemetry collection. All three platforms support OTel natively. If you instrument with OTel, you can switch from Datadog to Axiom to Grafana Cloud by changing a configuration file, not your application code. This is the single most important decision for avoiding vendor lock-in.
How It Works
Add the OTel SDK to your application (available for Python, Node.js, Go, Java, .NET, and more). Auto-instrumentation libraries for each language hook into HTTP requests, database queries, and framework operations with little or no code change. The SDK exports traces, metrics, and logs to an OTel Collector, which forwards them to your observability platform. The Collector can also sample, filter, and transform data before export, giving you control over cost.
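The filter-and-transform step is where most of the cost control happens. The sketch below is a plain-Python toy model of that idea, not the Collector's configuration syntax or API; the record shape and the attribute names are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical policy: drop noisy levels and strip bulky or sensitive attributes.
DROP_LEVELS = {"DEBUG", "TRACE"}
DROP_ATTRIBUTES = {"http.request.body", "user.email"}


def process(records: Iterable[Dict]) -> Iterator[Dict]:
    """Filter and transform log records before they are exported."""
    for record in records:
        if record.get("level", "").upper() in DROP_LEVELS:
            continue  # filter: never pay to ingest this record
        attributes = {
            key: value
            for key, value in record.get("attributes", {}).items()
            if key not in DROP_ATTRIBUTES  # transform: shrink what remains
        }
        yield {**record, "attributes": attributes}


records = [
    {"level": "debug", "message": "cache miss", "attributes": {}},
    {"level": "info", "message": "order placed",
     "attributes": {"order.id": "A1", "http.request.body": "{...}"}},
    {"level": "error", "message": "payment failed",
     "attributes": {"user.email": "a@example.com", "order.id": "A2"}},
]
exported = list(process(records))
```

Running the same rules in a central pipeline rather than in each service means you can tighten or relax them without redeploying application code.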
Sampling Strategy
You do not need to export every trace and every log. Head-based sampling (decide at trace start) keeps a random percentage of traces. Tail-based sampling (decide at trace end) keeps all traces with errors or high latency. A good sampling strategy reduces data volume by 80 to 90 percent while keeping all the interesting data. This is the most effective way to control observability costs regardless of platform.
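The two decisions can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular SDK or Collector implements sampling: the span dictionary shape (`trace_id`, `start_ms`, `end_ms`, `error`) is hypothetical, and the head sampler hashes the trace ID so the choice is pseudo-random but repeatable across services.

```python
import hashlib
from typing import Mapping, Sequence


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide at trace start, using only the trace ID and a keep rate in [0, 1]."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform over 0 .. 2**64 - 1
    return bucket < rate * 2**64


def tail_sample(spans: Sequence[Mapping], latency_threshold_ms: float = 1000.0,
                baseline_rate: float = 0.1) -> bool:
    """Decide at trace end: keep errors and slow traces, sample the rest."""
    if not spans:
        return False
    has_error = any(span.get("error", False) for span in spans)
    duration_ms = (max(span["end_ms"] for span in spans)
                   - min(span["start_ms"] for span in spans))
    if has_error or duration_ms >= latency_threshold_ms:
        return True
    return head_sample(spans[0]["trace_id"], baseline_rate)


failed_trace = [{"trace_id": "t1", "start_ms": 0, "end_ms": 40, "error": True}]
slow_trace = [{"trace_id": "t2", "start_ms": 0, "end_ms": 2500, "error": False}]
healthy_trace = [{"trace_id": "t3", "start_ms": 0, "end_ms": 35, "error": False}]
```

Here `failed_trace` and `slow_trace` are always kept, while `healthy_trace` is kept only at the baseline rate. Tail-based sampling needs every span of a trace buffered in one place before it can decide, which is why it typically runs in a collector rather than in the application.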
Implementation Recommendation
Start with OTel auto-instrumentation for your language/framework. Add custom spans for business-critical operations (payment processing, user signup, search queries). Export to whichever platform you choose today, knowing you can switch tomorrow.
Our Recommendation and Getting Started
Here is our opinionated recommendation based on company stage:
Pre-seed to Seed: Start with Axiom. The free tier is generous enough to cover you for months. The serverless pricing model aligns with early-stage budgets. You can always migrate later.
Series A: Evaluate whether you need Datadog's full feature set. If your engineering team is small (under 10) and you primarily debug via logs and basic metrics, stick with Axiom or Grafana Cloud. If you have complex distributed systems and need correlated observability across 15+ services, Datadog's integration value justifies the cost.
Series B and beyond: Most companies at this stage are on Datadog and feeling the cost. Consider a hybrid approach: Datadog for APM and tracing (where correlation is most valuable), Grafana Cloud for metrics and dashboards (where it is cheaper at scale), and Axiom for log storage (you pay only for ingest, with a large free allowance). Use OTel to send data to multiple destinations.
Regardless of platform, follow these practices from day one: instrument with OpenTelemetry, implement sampling from the start, set up cost alerts at 80 percent of your budget, and review your observability bill monthly. Our guide on setting up app monitoring covers the implementation details.
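The 80 percent alert is simple enough to script against whatever billing export your vendor provides. A minimal sketch follows; the function names are hypothetical, and the linear projection is an added assumption that spend accrues evenly through the month.

```python
import calendar
from datetime import date


def should_alert(spend_to_date: float, monthly_budget: float,
                 threshold: float = 0.8) -> bool:
    """True once month-to-date spend reaches the threshold share of the budget."""
    return spend_to_date >= threshold * monthly_budget


def projected_month_end_spend(spend_to_date: float, today: date) -> float:
    """Linear projection: assumes the daily spend so far continues unchanged."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


# Example: $1,650 spent by June 20 against a $2,000 monthly budget.
alert = should_alert(1_650, 2_000)
projection = projected_month_end_spend(1_650, date(2025, 6, 20))
```

In this example the alert fires because $1,650 exceeds the $1,600 threshold, and the projection suggests the month would finish well over budget if nothing changes.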
Need help choosing and implementing the right observability stack? Book a free strategy call and we will assess your infrastructure, data volume, and budget to recommend the best approach.