Technology·14 min read

Helicone vs Portkey vs LiteLLM: LLM Proxy Gateways Compared

Choosing the wrong LLM proxy gateway costs you months of migration pain and thousands in avoidable spend. Here is how Helicone, Portkey, and LiteLLM actually compare on the dimensions that matter.

Nate Laquis

Nate Laquis

Founder & CEO

Why LLM Proxy Gateways Are Now Table Stakes

Two years ago, most teams called OpenAI directly from their application code and moved on with their lives. That was fine when you had one model, one provider, and a predictable bill. Today, production LLM stacks look very different. You are probably calling OpenAI for some tasks, Anthropic for others, maybe Gemini or Mistral for cost-sensitive workloads, and you have prompt versioning, caching, fallback logic, and spend tracking bolted on with custom middleware that nobody wants to maintain.

An LLM proxy gateway centralizes all of that. It sits between your application and every model provider, giving you one integration point for routing, caching, rate limiting, observability, and cost controls. The three dominant players in this space are Helicone, Portkey, and LiteLLM. Each takes a fundamentally different approach to solving the same problem, and the differences matter more than most comparison articles admit.

Helicone leads with observability and analytics. Portkey leads with enterprise reliability and a managed gateway. LiteLLM leads with open-source flexibility and a unified API interface. The right choice depends on your team size, your deployment model (cloud vs. self-hosted), how many providers you use, and whether you need a proxy you control completely or a managed service you can set up in an afternoon.

Analytics dashboard displaying LLM proxy gateway metrics and cost tracking data

This comparison is based on hands-on evaluation across real production workloads. We are not summarizing marketing pages. We will cover architecture, pricing, feature depth, self-hosting viability, and the specific scenarios where each tool wins or falls short. If you are building an AI gateway architecture for your organization, this should save you weeks of evaluation time.

Architecture and Integration Models

The most important architectural difference between these three tools is how they intercept your LLM traffic. This decision affects latency, data privacy, vendor lock-in, and how much control you retain over your infrastructure.

Helicone

Helicone operates primarily as a logging proxy. You change your base URL from api.openai.com to oai.helicone.ai, add your Helicone API key as a header, and every request passes through their servers before reaching the LLM provider. The proxy adds roughly 30 to 80ms of overhead per request. Helicone also supports an async logging mode where you send requests directly to the provider and log them to Helicone after the fact, eliminating the latency penalty entirely. This is a smart design choice: teams that care about latency can still get full observability without putting Helicone in the critical path.

Integration is minimal. You change one URL and add one header. SDKs exist for Python, Node.js, and other languages, but they are thin wrappers. There is no proprietary SDK you must adopt, no custom request format. If you decide to leave Helicone, you revert the base URL and remove the header. Migration cost is near zero.

Portkey

Portkey operates as a full gateway proxy. All traffic routes through Portkey's infrastructure (or your self-hosted Portkey instance), and the gateway handles routing, retries, fallbacks, caching, and load balancing before forwarding requests to the provider. Portkey's gateway is more opinionated than Helicone's. You define routing configs (primary and fallback models, conditional routing rules, load balancing weights) in a JSON config object that the gateway interprets at request time.

The integration model is deeper. You use Portkey's SDK or REST API with their config format. The SDK wraps OpenAI's client, so the migration path is reasonable, but you are adopting Portkey-specific constructs (virtual keys, configs, metadata tags). Removing Portkey later means rewriting your routing and fallback logic, not just changing a URL.

LiteLLM

LiteLLM takes a fundamentally different approach. It is an open-source Python library that provides a unified interface (litellm.completion()) across 100+ LLM providers. You call one function with a model string like "claude-3-opus" or "gpt-4-turbo" and LiteLLM handles the provider-specific API translation. The proxy server component (LiteLLM Proxy) wraps this library in an OpenAI-compatible API server that you self-host.

This means LiteLLM runs in your infrastructure by default. No data leaves your network unless you are calling the LLM providers themselves. The proxy exposes an OpenAI-compatible endpoint, so any tool or library that works with OpenAI's API works with LiteLLM Proxy without changes. The trade-off is operational overhead: you deploy it, you scale it, you monitor it, you patch it.

Observability and Cost Tracking

Observability is where Helicone has built the deepest moat. If your primary pain point is "we have no idea how much we are spending on LLMs or which features consume the most tokens," Helicone is the strongest choice by a wide margin.

Helicone's Observability Stack

Helicone's dashboard gives you real-time visibility into every LLM request: cost per request, latency distribution, token counts (input and output), model usage breakdown, and error rates. You can slice data by custom properties (user ID, feature name, environment, team) that you attach as headers. The cost tracking is accurate to the penny because Helicone maintains a pricing table for every model and updates it when providers change their rates.

The session and trace views are particularly useful. You can group related requests into sessions (a multi-turn chat conversation, a RAG pipeline with retrieval and generation steps) and see the total cost and latency for the entire session. Prompt versioning lets you track which prompt template produced which results, making A/B testing straightforward. The alerting system notifies you when spend exceeds thresholds or error rates spike.

Portkey's Observability

Portkey's analytics are solid but oriented more toward operational monitoring than deep cost analytics. You get request logs, latency tracking, token usage, and cost estimates. The trace view groups requests by trace ID, which is useful for multi-step agent workflows. Portkey also tracks cache hit rates, fallback triggers, and routing decisions, giving you visibility into the gateway layer itself, not just the LLM calls.

Where Portkey falls short compared to Helicone is in the depth of cost analytics. Helicone lets you build custom dashboards, export data for BI tools, and run cost allocation reports by team or feature. Portkey's analytics are more focused on operational health than financial governance.

LiteLLM's Observability

LiteLLM's built-in observability is basic. The proxy server logs requests to a database and exposes a simple UI for viewing them. You get token counts, costs, and latency, but the dashboards are minimal. However, LiteLLM compensates through integrations. It can forward logs to Langfuse, Helicone, Lunary, or any OpenTelemetry-compatible backend. Many teams run LiteLLM as the proxy layer and Langfuse or Helicone as the observability layer, getting the best of both worlds.

For teams that already have a mature observability stack (Datadog, Grafana, custom dashboards), LiteLLM's approach of emitting structured logs and metrics that you route to your existing tools is actually an advantage. You do not need to adopt yet another dashboard. For teams without that infrastructure, the out-of-box experience is noticeably weaker than Helicone's. Understanding your LLM API pricing across providers is critical regardless of which proxy you choose.

Routing, Fallbacks, and Reliability

When your primary model provider goes down at 2am, your gateway's fallback logic determines whether your customers notice. This is where Portkey excels and where the architectural differences between these tools become most visible.

Server infrastructure representing reliable LLM proxy gateway with fallback routing

Portkey's Routing Engine

Portkey's routing is the most sophisticated of the three. You define routing configs that specify primary models, fallback chains, load balancing weights, conditional routing rules, and retry policies. A single config can say: "Send 80% of traffic to GPT-4 Turbo and 20% to Claude Sonnet. If GPT-4 Turbo fails, fall back to Claude Opus. Retry up to 3 times with exponential backoff. If all retries fail, return a cached response." This is all declarative JSON, no code required.

Portkey also supports conditional routing based on request metadata. You can route enterprise customers to more capable (and expensive) models while routing free-tier users to cost-efficient alternatives. The gateway evaluates these conditions at request time with sub-millisecond overhead. For organizations that need to implement AI model routing for cost optimization, Portkey's config-driven approach reduces the engineering effort significantly.

LiteLLM's Routing

LiteLLM supports fallbacks and load balancing through its router module. You configure a list of model deployments with priorities and weights, and the router handles failover and load distribution. The routing logic is less flexible than Portkey's (no conditional routing based on request properties out of the box), but it covers the core use cases: failover between providers, load balancing across multiple API keys for the same model, and rate limit-aware routing that shifts traffic away from providers approaching their rate limits.

One LiteLLM advantage: because it is open source and written in Python, you can extend the routing logic with custom code. If you need routing behavior that Portkey's declarative config cannot express (routing based on prompt content analysis, dynamic budget allocation, time-of-day routing), you write a Python function and plug it in. This flexibility comes at the cost of maintaining custom code, but for teams with specific routing requirements, it is a genuine advantage.

Helicone's Routing

Helicone's routing capabilities are minimal. It is primarily an observability and logging layer, not a routing engine. You can configure basic retries and rate limiting, but for sophisticated fallback chains, load balancing, and conditional routing, you need to build that logic in your application code or pair Helicone with another tool. This is not necessarily a weakness. If your primary need is observability and you already handle routing at the application layer, adding a routing engine you do not need just increases complexity.

Caching, Rate Limiting, and Cost Controls

Caching identical or semantically similar LLM requests is one of the highest-leverage cost optimizations available. The implementation quality varies significantly across these three tools.

Caching Comparison

Portkey offers both exact match caching and semantic caching. Exact match caching returns cached responses when the prompt is byte-for-byte identical. Semantic caching uses embedding similarity to match prompts that are worded differently but mean the same thing. Cache TTLs are configurable per request, and you can force cache bypass with a header when you need a fresh response. In production, teams report cache hit rates of 20 to 50% on customer-facing chatbot workloads, translating directly to cost savings.

Helicone supports caching through its proxy layer. You enable caching with a header (Helicone-Cache-Enabled: true) and set TTLs per request. Helicone's caching is primarily exact match, though they have been expanding semantic caching capabilities. The simplicity of the header-based activation is appealing: you can enable caching for specific request types without changing your code architecture.

LiteLLM supports caching with multiple backends: Redis, in-memory, and S3. You configure caching in the proxy config file, specifying which models and request types should be cached. LiteLLM's caching is exact match by default, but you can integrate semantic caching through custom cache key functions. Because you control the caching infrastructure (your Redis instance, your S3 bucket), you have full control over data residency and retention.

Rate Limiting

LiteLLM's rate limiting is the most granular of the three. You define rate limits per user, per API key, per model, or per team. Limits can be set on requests per minute, tokens per minute, or budget per day/month. The proxy tracks usage against these limits in real time and returns clear 429 responses when limits are exceeded. For platform companies that resell LLM access to customers, LiteLLM's per-key budgeting is essential.

Portkey provides rate limiting through its gateway with similar granularity: per-key, per-user, and per-model limits. Portkey also supports spend-based limits, cutting off a user or team when they hit a dollar-amount ceiling. Helicone's rate limiting is more basic, focused on request counts rather than token or spend-based limits.

Budget Controls

All three tools let you set budget alerts, but LiteLLM goes furthest with hard budget enforcement. You can set a monthly budget per API key, and the proxy will reject requests once the budget is exhausted. This is critical for teams that need to guarantee they will not exceed their LLM spend, not just get notified when they do. Portkey offers similar budget enforcement through its enterprise tier. Helicone focuses on budget visibility and alerting rather than hard enforcement.

Self-Hosting, Pricing, and Vendor Lock-In

For many engineering teams, the question is not which tool has the best dashboard. It is which tool you can run in your own infrastructure without sending sensitive data through a third party. This is where the three options diverge sharply.

Developer coding an LLM proxy gateway self-hosted deployment configuration

LiteLLM: Open Source First

LiteLLM is Apache 2.0 licensed. You clone the repo, configure your models in a YAML file, run the Docker container, and you have a fully functional LLM proxy running in your VPC. No data leaves your network. No usage-based fees to the LiteLLM team (unless you want their enterprise features like SSO, advanced analytics, and premium support). For regulated industries (healthcare, finance, government), this is often the deciding factor. The total cost of self-hosting LiteLLM is the compute for the proxy server (a single instance handles thousands of requests per second) plus your engineering time for setup and maintenance.

The trade-off is real: you are responsible for uptime, scaling, upgrades, and debugging. LiteLLM ships updates frequently, sometimes with breaking changes. You need someone on your team who understands the codebase well enough to troubleshoot issues. For teams with strong DevOps capabilities, this is a non-issue. For smaller teams without dedicated infrastructure engineers, it is a meaningful burden.

Helicone: Cloud-First with Self-Host Option

Helicone is open source (MIT licensed) and can be self-hosted, but the self-hosting experience is more complex than LiteLLM's. The full Helicone stack includes a proxy worker (Cloudflare Workers), a web dashboard (Next.js), and a database (ClickHouse + PostgreSQL). Getting all of that running in your infrastructure requires more effort than running a single Docker container. Most Helicone users opt for the cloud-hosted version.

Helicone's cloud pricing is straightforward: a free tier (up to 100K requests per month), a Pro tier at $70/month (up to 10M requests), and a custom enterprise tier. There are no per-request fees beyond the tier limits, which makes costs predictable. Compared to Portkey, Helicone's pricing is generally more affordable for observability-focused use cases.

Portkey: Managed Gateway

Portkey offers a cloud-hosted gateway and an enterprise self-hosted deployment. The cloud version is the primary offering, and most of Portkey's feature development targets it. Cloud pricing is based on log events: a free tier (10K logs/month), Growth at $49/month (up to 1M logs), and enterprise tiers with custom pricing. Portkey's self-hosted option is available for enterprise customers and requires a commercial license.

The lock-in profile differs across all three. LiteLLM has the lowest lock-in because it exposes an OpenAI-compatible API. Anything that talks to OpenAI talks to LiteLLM. Switching away means pointing your code at a different OpenAI-compatible endpoint. Helicone has low lock-in because it is primarily a logging layer. Portkey has moderate lock-in because its routing configs, virtual keys, and SDK-specific features become part of your architecture.

Which Tool Fits Your Team

After evaluating all three tools across production workloads, here is a direct recommendation based on common team profiles.

Choose Helicone If:

Your primary pain is visibility. You need to know where your LLM budget is going, which prompts perform well, and how costs break down by feature and user. You want minimal integration effort (one URL change), you do not need complex routing or fallback logic, and you value a polished analytics dashboard. Helicone is ideal for teams in the 100K to 10M requests per month range that need observability first and can handle routing in their application layer.

Choose Portkey If:

You need a production-grade gateway with sophisticated routing, fallbacks, and reliability features. You are calling multiple LLM providers and need declarative config-driven routing without writing custom infrastructure code. You want caching, load balancing, and fallbacks handled at the gateway layer. Portkey fits teams that need enterprise reliability and are comfortable with a managed service. It is the strongest choice for organizations running business-critical LLM workloads where downtime has real revenue impact.

Choose LiteLLM If:

You need to self-host everything. You are in a regulated industry, your security team will not approve sending prompts through a third-party proxy, or you need the flexibility to customize routing logic with code. You want an OpenAI-compatible proxy that works with your existing tools without SDK changes. LiteLLM is the right call for platform teams building internal LLM infrastructure and for companies that need per-user budgeting and rate limiting at the proxy layer.

The Hybrid Approach

Many production deployments combine two of these tools. LiteLLM as the proxy layer (self-hosted, OpenAI-compatible, handles routing and rate limiting) paired with Helicone or Langfuse for observability is a common and effective pattern. You get full infrastructure control from LiteLLM and rich analytics from a purpose-built observability tool. This avoids the compromise of picking one tool that is mediocre at something you need it to be great at.

Whichever direction you choose, the important thing is to get a proxy in place before your LLM spend reaches the point where the lack of visibility and control becomes painful. If you are already past that point and need help evaluating, deploying, or customizing an LLM gateway for your specific architecture, book a free strategy call and we will walk through the options together.

Need help building this?

Our team has launched 50+ products for startups and ambitious brands. Let's talk about your project.

LLM proxy gatewayHelicone vs Portkey vs LiteLLMAI gateway comparisonLLM observability toolsLLM cost optimization

Ready to build your product?

Book a free 15-minute strategy call. No pitch, just clarity on your next steps.

Get Started