What MCP Servers Actually Need from a Hosting Platform
MCP servers are not typical web services. They sit between an AI agent and whatever system the agent needs to interact with, which means they have a peculiar mix of requirements that most hosting guides gloss over. Before you pick a platform, you need to understand exactly what makes MCP hosting different from hosting a REST API or a static site.
First, latency matters more than throughput. An AI agent calling tools through MCP is waiting synchronously for each response before it can continue reasoning. If your MCP server adds 300ms of overhead per tool call, and the agent chains five calls to complete a task, you have added 1.5 seconds of dead time before the user sees any output. Agents that feel sluggish almost always trace back to slow tool execution, not slow model inference. You want your hosting platform to deliver sub-50ms overhead on the transport layer, which means edge deployment or regional placement close to your model provider.
Second, MCP connections are usually stateful. The Streamable HTTP transport lets the server assign a session ID (carried in the Mcp-Session-Id header) to maintain context between requests. Your hosting platform needs a way to store and retrieve session state quickly. Stateless serverless functions can work if you pair them with an external state store, but platforms that offer built-in persistent state (like Cloudflare Durable Objects or Fly.io persistent volumes) simplify the architecture considerably.
Third, tool execution sandboxing is a real concern. Your MCP server runs arbitrary tool handlers that might execute database queries, call external APIs, or process user-uploaded files. If a tool handler crashes, hangs, or consumes excessive memory, it should not take down other sessions. Platforms that run each request in an isolated context (V8 isolates, containers, microVMs) give you natural fault isolation. Platforms that run everything in a shared process require you to build your own safeguards.
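On a shared-process platform, one of the safeguards you end up building is a per-call time budget. Here is a minimal sketch, assuming async tool handlers. The run_tool helper and the 5-second default are hypothetical and not part of any MCP SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolHandler = Callable[..., Awaitable[Any]]


async def run_tool(handler: ToolHandler, *args: Any, timeout_s: float = 5.0) -> dict:
    """Run one tool handler with a time budget and turn failures into error results.

    This guards against hangs and uncaught exceptions only. It does not cap
    memory use and cannot interrupt blocking (non-async) code.
    """
    try:
        result = await asyncio.wait_for(handler(*args), timeout=timeout_s)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        return {"ok": False, "error": f"tool timed out after {timeout_s}s"}
    except Exception as exc:  # a crashing handler should not take down the server
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

A wrapper like this contains hangs and exceptions, but not memory exhaustion. For that you still need process-level or platform-level isolation.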
Finally, you need a deployment story that supports rapid iteration. MCP servers evolve quickly as you add tools, refine descriptions for better model accuracy, and respond to how agents actually use your server in production. Your hosting platform should support sub-minute deployments, instant rollbacks, and some form of traffic splitting so you can test new tool definitions without breaking existing agent integrations. If you are still deciding what your MCP server should look like, our guide to building custom MCP servers covers the design decisions in detail.
Cloudflare Workers: Edge-First MCP with Durable Objects
Cloudflare Workers is the platform most teams should evaluate first for MCP server hosting. The combination of edge deployment across 300+ locations, V8 isolate-based execution, and Durable Objects for stateful sessions covers the core MCP requirements with minimal architectural complexity.
Why the Edge Matters for MCP
When your MCP server runs on Cloudflare Workers, every tool call routes to the nearest edge location. For users in North America calling an agent backed by Anthropic's API (which runs in US regions), that means sub-20ms network overhead between the model provider and your MCP server. Compare that to a centrally hosted server in us-east-1 that adds 40 to 120ms for users on the West Coast or in Europe. That difference compounds across multi-step agent workflows.
Cloudflare has also published official MCP server templates and SDK integrations. The workers-mcp package handles Streamable HTTP transport setup, session management, and CORS configuration out of the box. You define your tools as methods on a class, deploy with "wrangler deploy," and your MCP server is live globally in under 30 seconds. The developer experience is genuinely excellent.
Durable Objects for Session State
The stateful nature of MCP sessions is the biggest headache with serverless hosting, and Durable Objects solve it cleanly. Each MCP session gets its own Durable Object instance, a single-threaded, persistent execution context that lives at the edge. The Durable Object maintains session state in memory between requests and can persist it to disk for durability. This means your MCP server can track conversation context, cache intermediate results across tool calls, and maintain per-session rate limit counters without touching an external database.
Durable Objects also provide a built-in WebSocket API, which is useful if you want to support real-time streaming from your MCP server. For long-running tool executions (like generating a report or running a data pipeline), you can stream progress updates to the client through the Durable Object's WebSocket rather than making the agent poll for status.
V8 Isolate Limitations You Need to Know
Workers run on V8 isolates, not Node.js. This means you cannot use Node.js native modules, the fs module, child_process, or any package that depends on C++ bindings. The compatibility layer (nodejs_compat) covers most of the standard library (crypto, streams, buffer, util), but if your MCP server needs to spawn subprocesses, read from the local filesystem, or use native database drivers like pg-native, Workers will not work without refactoring.
CPU time limits are another constraint. Workers get 30 seconds of CPU time per request by default on the paid plan (10ms on the free plan). CPU time counts only the time your code is actually executing, not time spent waiting on network calls. For most MCP tool handlers that call external APIs and return results, this is more than enough. But if your tools do heavy computation, like parsing large documents, running data transformations, or processing images, you can hit the CPU limit. The workaround is to offload heavy work to a backend service (running on Modal or Fly.io, for instance) and have your Workers-based MCP server call it via HTTP.
Memory is capped at 128MB per isolate. This is fine for most tool handlers, but it means you cannot load large datasets or ML models into memory. If your MCP server needs to do anything memory-intensive, Workers is not the right platform for that specific tool. You can still host the MCP server on Workers and route compute-heavy tool calls to a separate backend.
Modal: Container-Based MCP with GPU Access
Modal takes a fundamentally different approach to serverless compute. Instead of V8 isolates, it runs your code in purpose-built containers with full Linux environments. For MCP servers that need Python dependencies, GPU access, or heavy computation, Modal fills the gap that Cloudflare Workers leaves open.
Python-Native MCP Hosting
Modal is built for Python workloads, and the MCP Python SDK (using the FastMCP pattern) runs on it without modification. You define your MCP server as a Modal app, expose it over HTTP with one of Modal's web decorators (@asgi_app for a full ASGI application, or @web_endpoint for a single function, which newer Modal releases rename to @fastapi_endpoint), and deploy with "modal deploy." Your MCP server gets a public HTTPS endpoint, automatic scaling from zero to hundreds of containers, and a full Python environment with any pip dependencies you need.
The container-based model means you can install any package, including numpy, pandas, scipy, Pillow, ffmpeg, or specialized ML libraries. If your MCP server processes data (aggregating analytics, generating charts, transforming CSVs), Modal gives you a full Python runtime with no restrictions. This is a significant advantage over Workers for data-heavy MCP tools.
GPU Access for ML-Powered Tools
Modal is one of the few serverless platforms that offers GPU access, and this unlocks MCP tool patterns that are impossible elsewhere. Consider an MCP server for a design tool that offers an "enhance_image" tool backed by a super-resolution model, or a coding assistant with a "run_code_analysis" tool that uses a locally hosted code model for security scanning. On Modal, you can attach an NVIDIA T4 ($0.000164/sec), A10G ($0.000306/sec), or A100 ($0.001036/sec) to any function and scale to zero when idle.
The cold start penalty for GPU functions is real, though. Spinning up a container with a GPU attachment takes 5 to 15 seconds depending on the GPU type and your container image size. For MCP tools that need sub-second response times, this is a dealbreaker unless you keep at least one warm instance running (Modal calls this "keep_warm," renamed to "min_containers" in newer releases, and it costs you the per-second GPU rate continuously). For tools where users expect a brief wait (image generation, document analysis, model inference), the cold start is acceptable.
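To see what a continuously warm GPU instance costs, multiply the per-second rate by the seconds in a day. This sketch uses the GPU rates quoted in this article. The function name is illustrative, and you should check current Modal pricing before relying on the numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Per-second GPU rates as quoted in this article (may be out of date).
GPU_RATE_PER_SEC = {"T4": 0.000164, "A10G": 0.000306, "A100": 0.001036}


def warm_cost_per_day(gpu: str, instances: int = 1) -> float:
    """Daily cost of keeping `instances` GPU containers warm around the clock."""
    return GPU_RATE_PER_SEC[gpu] * SECONDS_PER_DAY * instances


for gpu in GPU_RATE_PER_SEC:
    print(f"{gpu}: ${warm_cost_per_day(gpu):.2f}/day")
```

At these rates a single warm T4 comes to about $14 per day and a warm A100 to about $90 per day, which is why scale-to-zero matters for GPU-backed tools with occasional traffic.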
Container Snapshots and Fast Cold Starts
Modal mitigates cold starts through container snapshots. When you deploy, Modal builds your container image, installs dependencies, and creates a snapshot. Subsequent cold starts restore from this snapshot rather than rebuilding from scratch. For a typical Python MCP server with moderate dependencies, cold starts run between 1 and 3 seconds. This is slower than Cloudflare Workers (under 5ms cold start) but faster than traditional container platforms like ECS or Cloud Run (10 to 30 seconds).
You can optimize cold starts further by keeping your container image small, pre-loading heavy imports in your Modal image definition (so they are included in the snapshot), and using Modal's "keep_warm" setting to maintain a minimum number of hot instances. For production MCP servers handling regular traffic, one or two warm instances eliminate cold starts for most requests, in exchange for paying the per-second container rate while those instances sit idle.
When Modal Makes Sense for MCP
Choose Modal when your MCP server is Python-based and depends on packages that will not run on V8 isolates. Choose it when your tools need GPU access for inference, image processing, or heavy computation. Choose it when your tool handlers are genuinely CPU or memory intensive, not just API proxy calls. If your MCP server mostly calls external APIs and returns formatted results, Modal is overkill and more expensive than Workers or Fly.io for that workload.
Fly.io: Full VM Control with Global Deployment
Fly.io occupies the middle ground between serverless functions and traditional cloud VMs. It runs your application in Firecracker microVMs across 35+ regions globally, giving you the deployment simplicity of a PaaS with the flexibility of a full Linux environment. For MCP servers that need persistent processes, long-running connections, or fine-grained system control, Fly.io is often the best fit.
Persistent Processes and Long-Lived Connections
Unlike Workers (which are request-scoped) and Modal (which scales to zero by default), Fly.io keeps your application running as a persistent process. This is a significant advantage for MCP servers that maintain in-memory state, connection pools to databases, or cached data across requests. Your MCP server starts up once, loads its configuration and connections, and handles all subsequent tool calls without cold start overhead.
Persistent processes also make it trivial to support WebSocket connections for real-time MCP streaming. On Fly.io, you deploy a standard Node.js or Python process that accepts both HTTP and WebSocket connections. There is no timeout on connection duration (unlike Lambda's 15-minute limit or Workers' WebSocket restrictions). For MCP servers that stream progress updates during long-running tool executions, this flexibility is essential.
Global Deployment and Anycast Routing
Fly.io uses anycast routing to direct traffic to the nearest healthy instance of your application. When you deploy to multiple regions (say, iad for US East, lax for US West, ams for Europe, and nrt for Asia), each tool call routes to the closest region automatically. You configure multi-region deployment in your fly.toml file with a list of regions, and Fly.io handles the rest.
The practical latency difference is meaningful. A single-region deployment in iad adds 80 to 120ms of network latency for users in Europe. A multi-region deployment on Fly.io brings that down to 15 to 30ms because the traffic hits the Amsterdam instance instead. For MCP servers backing customer-facing AI agents, shaving 100ms off every tool call translates directly to better user experience.
Persistent Volumes and SQLite
Fly.io supports persistent volumes attached to your microVMs. This opens up a deployment pattern that is surprisingly effective for MCP servers: using SQLite as your session store and configuration database. Instead of provisioning a managed Postgres or Redis instance for session state, you store everything in a SQLite database on a persistent volume. Reads are sub-millisecond (no network hop), writes are fast enough for MCP session traffic, and backups go to S3 or Tigris via Litestream replication.
This pattern keeps your infrastructure simple and cheap. A single Fly.io machine with a 1GB persistent volume can handle hundreds of concurrent MCP sessions with SQLite as the state store. You avoid the cost of a managed database ($15 to $50/month for small instances on most providers) and the latency of network database calls. For MCP servers that do not need horizontal write scaling, SQLite on Fly.io is a genuinely underrated approach. For more on how remote vs. local patterns affect your MCP architecture, see our MCP remote servers production guide.
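The SQLite session store pattern can be sketched with nothing but the Python standard library. The class name, table layout, and one-hour TTL below are assumptions for illustration, not part of any MCP SDK.

```python
import json
import sqlite3
import time
from typing import Optional


class SqliteSessionStore:
    """Hypothetical session store keyed by MCP session ID."""

    def __init__(self, path: str, ttl_s: float = 3600.0) -> None:
        self.ttl_s = ttl_s
        self.db = sqlite3.connect(path)
        # WAL lets readers proceed while a write is in progress (file-backed databases).
        self.db.execute("PRAGMA journal_mode=WAL")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "id TEXT PRIMARY KEY, state TEXT NOT NULL, updated_at REAL NOT NULL)"
        )
        self.db.commit()

    def save(self, session_id: str, state: dict) -> None:
        """Insert or overwrite the JSON-serialized state for one session."""
        self.db.execute(
            "INSERT OR REPLACE INTO sessions (id, state, updated_at) VALUES (?, ?, ?)",
            (session_id, json.dumps(state), time.time()),
        )
        self.db.commit()

    def load(self, session_id: str) -> Optional[dict]:
        """Return the stored state, or None if the session is unknown or expired."""
        row = self.db.execute(
            "SELECT state, updated_at FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        if row is None or time.time() - row[1] > self.ttl_s:
            return None
        return json.loads(row[0])
```

In a deployment you would point the path at a file on the mounted volume, for example SqliteSessionStore("/data/sessions.db"), where /data is a hypothetical mount point.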
When Fly.io Wins
Choose Fly.io when you need persistent processes (connection pools, in-memory caches, background jobs). Choose it when you want full control over the runtime environment without managing Kubernetes or bare metal. Choose it when your MCP server runs a standard Node.js or Python application that you want to deploy globally with minimal configuration changes. Fly.io is also the best option if you need to run sidecar processes alongside your MCP server, like a local vector database (Qdrant, Chroma) for retrieval-augmented tool execution.
Pricing Comparison at Real-World Scale
Pricing for MCP server hosting is tricky to compare because each platform charges differently. Cloudflare bills per request and CPU time. Modal bills per container-second with GPU add-ons. Fly.io bills for always-on machines and bandwidth. The right comparison depends on your actual usage pattern, so let us model three realistic scenarios.
Scenario 1: Low Volume, 10,000 Tool Calls per Month
This is a startup with a handful of internal AI agents or a product in early beta. On Cloudflare Workers, 10,000 requests cost effectively nothing. The free tier covers 100,000 requests per day, and even on the $5/month Workers Paid plan, the compute cost for 10,000 short-lived tool handlers is under $0.10. Add Durable Objects for session state at $0.15 per million requests (about $0.0015 for 10,000 requests) and you are looking at just over $5/month total, nearly all of it the base plan fee.
On Modal, the free tier gives you $30/month of compute credits. A standard container running for the duration of 10,000 tool calls (assuming 200ms average execution time) consumes about 33 minutes of compute. At $0.000463/sec for a standard container, that is roughly $0.92/month. Well within the free tier. If you need a GPU, costs go up significantly, but for API-proxy-style MCP tools, Modal stays cheap at low volume.
On Fly.io, the smallest machine (shared-cpu-1x, 256MB RAM) costs about $1.94/month if it runs 24/7. You can scale to zero with Fly Machines (no cost when stopped), but then you pay a cold start penalty of 2 to 5 seconds per wake-up. For 10,000 tool calls with bursty traffic, keeping one small machine warm at $1.94/month is the pragmatic choice.
Scenario 2: Medium Volume, 500,000 Tool Calls per Month
This is a production product with active users and multiple agent workflows. On Cloudflare Workers ($5/month base), 500,000 requests with an average of 10ms CPU time each costs about $0.75 in CPU charges plus the base plan fee. Durable Object requests add roughly $0.075. Total: around $5.83/month. Cloudflare is absurdly cheap at this scale for compute-light workloads.
On Modal, 500,000 tool calls at 200ms average execution equals roughly 27.8 hours of container time. At $0.000463/sec, that is about $46/month. If your tools are faster (50ms average), costs drop to about $11.50/month. Modal pricing scales linearly with execution time, so efficiency in your tool handlers directly reduces your bill.
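Because Modal bills by execution time, the estimate reduces to one multiplication. This sketch uses the standard-container rate quoted in this article. The function name is illustrative, and it ignores cold-start time, idle warm instances, and free credits.

```python
MODAL_RATE_PER_SEC = 0.000463  # standard-container rate as quoted in this article


def modal_monthly_cost(
    tool_calls: int, avg_exec_s: float, rate_per_sec: float = MODAL_RATE_PER_SEC
) -> float:
    """Estimated monthly compute cost: calls x seconds per call x per-second rate."""
    return tool_calls * avg_exec_s * rate_per_sec


for calls, exec_s in [(10_000, 0.2), (500_000, 0.2), (500_000, 0.05)]:
    cost = modal_monthly_cost(calls, exec_s)
    print(f"{calls:>9,} calls at {exec_s * 1000:.0f}ms: ${cost:.2f}/month")
```

Running it reproduces the figures above: roughly $0.93 for 10,000 calls at 200ms, $46.30 for 500,000 calls at 200ms, and $11.58 for 500,000 calls at 50ms.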
On Fly.io, a dedicated-cpu-1x machine (2GB RAM) runs about $31/month and handles 500,000 tool calls without breaking a sweat. Add a second machine in another region for global coverage and you are at $62/month. This is more expensive than Workers for simple workloads but gives you a full runtime environment, persistent processes, and multi-region coverage.
Scenario 3: High Volume, 5 Million Tool Calls per Month
At this scale, Cloudflare Workers remains the cost leader for compute-light MCP servers, coming in around $12 to $18/month depending on CPU time per request. Modal costs scale to $460/month or more depending on execution duration. Fly.io with a cluster of dedicated machines across three regions runs roughly $180 to $250/month, but you get persistent connections, full runtime control, and predictable performance.
The takeaway: Cloudflare Workers wins on pure cost for MCP servers that primarily proxy API calls and do light data formatting. Modal is cost-competitive only at low volume or when you need its unique capabilities (GPUs, heavy Python dependencies). In the scenarios above, Fly.io costs more than Workers at every volume and becomes cheaper than Modal as call volume grows, and it gives you operational flexibility that serverless platforms cannot match.
Cold Starts, Latency Benchmarks, and Security Sandboxing
Numbers matter more than marketing claims when you are picking a hosting platform for production MCP servers. Here is what we have measured across real deployments and what you should expect.
Cold Start Benchmarks
Cloudflare Workers: under 5ms cold start. V8 isolates spin up almost instantly because there is no container to boot, no runtime to initialize, and no filesystem to mount. Your MCP server is warm on the first request. This is the gold standard for cold start performance and the reason Workers dominates for latency-sensitive MCP hosting.
Modal: 1 to 3 seconds for standard CPU containers, 5 to 15 seconds for GPU-attached containers. The snapshot-based restore is faster than building from scratch, but it is still orders of magnitude slower than Workers. For MCP servers with steady traffic, the "keep_warm" setting eliminates cold starts, but you pay for idle compute. For bursty or low-traffic MCP servers, cold starts are a real concern.
Fly.io: 2 to 5 seconds when scaling from zero (Fly Machines). If your machine is already running (which is the typical production configuration), there is no cold start at all. Fly.io's persistent process model means your MCP server stays warm as long as the machine is running. The tradeoff is that you pay for idle time, but for production workloads, this is usually the right call.
Request Latency Overhead
Beyond cold starts, each platform adds baseline overhead to every tool call. Cloudflare Workers adds 1 to 5ms of platform overhead on top of your handler execution time. Fly.io adds 2 to 8ms for local requests (same region) and 15 to 30ms for anycast-routed requests. Modal adds 10 to 25ms of platform overhead per request, which is higher because of the container routing layer.
For a typical MCP tool call that queries a database and returns formatted results (50ms handler time), adding those overheads gives total response times of roughly 51 to 55ms on Workers, 52 to 58ms on Fly.io within a region (65 to 80ms when anycast-routed), and 60 to 75ms on Modal. The difference is small for a single call, but agents that chain 5 to 10 tool calls per task will feel the cumulative gap.
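To see how the per-request overhead accumulates over a chain of tool calls, here is a small sketch. It assumes the overhead ranges quoted above, a fixed 50ms handler time, and strictly sequential calls. The names are illustrative.

```python
from typing import Tuple

# (min_ms, max_ms) platform overhead per request, as quoted in this article.
OVERHEAD_MS = {
    "Cloudflare Workers": (1, 5),
    "Fly.io (same region)": (2, 8),
    "Fly.io (anycast-routed)": (15, 30),
    "Modal": (10, 25),
}


def chain_latency_ms(
    platform: str, calls: int, handler_ms: float = 50.0
) -> Tuple[float, float]:
    """Best-case and worst-case total time for `calls` sequential tool calls."""
    low, high = OVERHEAD_MS[platform]
    return calls * (handler_ms + low), calls * (handler_ms + high)


for name in OVERHEAD_MS:
    best, worst = chain_latency_ms(name, calls=10)
    print(f"{name}: {best:.0f} to {worst:.0f}ms for 10 chained calls")
```

Over ten chained calls, the gap between the fastest and slowest platform grows from a few milliseconds per call to as much as a quarter of a second per task.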
Security Sandboxing for Tool Execution
MCP tool handlers run code that interacts with external systems, processes user input, and potentially executes dynamic operations. Sandboxing prevents a buggy or malicious tool handler from affecting other sessions or compromising the host environment.
Cloudflare Workers provide the strongest default isolation between tenants. Each Worker runs in a V8 isolate with no memory shared with other isolates, no filesystem access, and no ability to affect other isolates. An uncaught exception in a tool handler returns an error for that request, and other sessions keep running. One caveat: requests to the same Worker can be served by the same isolate, so do not treat global variables as per-session state. This isolation model was designed for multi-tenant security (Cloudflare runs millions of customers' code on shared infrastructure), so it is battle-tested.
Modal runs each function invocation in its own container with its own filesystem, memory space, and process tree. Containers are ephemeral and destroyed after execution. This provides strong isolation, though the overhead is higher than V8 isolates. Modal also supports custom container images, so you can lock down the environment by removing unnecessary packages and restricting network access.
Fly.io provides VM-level isolation through Firecracker microVMs. Each application gets its own microVM with a dedicated kernel, filesystem, and network namespace. This is stronger isolation than containers but weaker than V8 isolates in one specific way: multiple requests to the same Fly.io machine share a process, so a memory leak in one tool handler can eventually degrade performance for other requests on the same machine. Mitigate this with health checks that restart machines when memory usage exceeds a threshold, and by running multiple small machines rather than one large one.
Deployment Patterns: CI/CD, Versioning, and Rollback
A production MCP server is software that changes frequently. You will add tools, update descriptions to improve model accuracy, fix bugs in handlers, and adjust rate limits based on real usage. Your deployment pipeline needs to support rapid, safe iteration without breaking existing agent integrations.
CI/CD for MCP Servers
All three platforms integrate with GitHub Actions, which is the simplest way to automate MCP deployments. For Cloudflare Workers, the wrangler-action GitHub Action deploys on every push to main. For Modal, "modal deploy" runs in a GitHub Actions workflow with your MODAL_TOKEN_ID and MODAL_TOKEN_SECRET as repository secrets. For Fly.io, the superfly/flyctl-actions action handles deployment with a FLY_API_TOKEN secret.
A good MCP deployment pipeline includes four stages. First, run unit tests for every tool handler (Vitest for TypeScript, pytest for Python). Second, run integration tests that verify the full MCP protocol flow, including tool discovery, auth, and execution. Third, deploy to a staging environment and run a suite of natural-language test prompts against it using your AI client. Fourth, deploy to production with a canary or traffic-split strategy if your platform supports it.
For teams shipping MCP servers as part of a larger product, the MCP deployment should be decoupled from the main application deployment. MCP tool descriptions and handler logic change on a different cadence than your core product. Deploying them independently lets you iterate on agent behavior without risking regressions in your main app.
Versioning Strategies
MCP negotiates a protocol version when a client connects, but it has no built-in mechanism for versioning the tools your server exposes, so you need to handle that at the deployment layer. The simplest approach is URL-based versioning: deploy each major version at a separate URL (mcp.yourapp.com/v1, mcp.yourapp.com/v2) and let clients pin to a specific version. When you release a breaking change (removing a tool, changing a tool's input schema), bump the major version and deploy to a new URL. Keep the old version running until all clients have migrated.
On Cloudflare Workers, you can use different Worker scripts for each version, routed through custom domains or path-based routing. On Fly.io, deploy separate apps (myapp-mcp-v1, myapp-mcp-v2) or use internal routing rules. On Modal, deploy each version as a separate Modal app with its own endpoint.
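Whichever platform does the routing, the core of URL-based versioning is a lookup from path prefix to a frozen set of tools. A minimal sketch with hypothetical tools, where v2 makes a breaking change to the search input:

```python
from typing import Callable, Dict, Tuple

Tool = Callable[..., object]


def search_v1(q: str) -> dict:
    return {"version": 1, "query": q}


def search_v2(query: str, limit: int = 10) -> dict:
    # Breaking change: the input field was renamed and a limit was added.
    return {"version": 2, "query": query, "limit": limit}


# Each major version keeps its own tool registry until clients have migrated.
VERSIONS: Dict[str, Dict[str, Tool]] = {
    "v1": {"search": search_v1},
    "v2": {"search": search_v2},
}


def resolve(path: str) -> Tuple[str, Dict[str, Tool]]:
    """Map a request path such as '/v2' or '/v2/mcp' to that version's tools."""
    segments = [s for s in path.split("/") if s]
    version = segments[0] if segments else ""
    if version not in VERSIONS:
        raise LookupError(f"unknown MCP server version: {version!r}")
    return version, VERSIONS[version]
```

Retiring a version then means deleting one registry entry once its traffic has dropped to zero.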
For non-breaking changes (adding new tools, updating descriptions, fixing handler bugs), deploy in place. MCP clients discover tools dynamically on connection, so new tools appear automatically without client-side changes. This is one of MCP's best features: you can enhance your server's capabilities without coordinating with every client.
Rollback Procedures
Cloudflare Workers supports instant rollbacks through the dashboard or CLI. Run "wrangler rollback" to revert to the previous deployment. Workers keeps the last 10 deployments available for rollback. This is the fastest rollback story of the three platforms.
Fly.io supports rollback by redeploying a previous Docker image. Run "fly deploy --image registry.fly.io/myapp:previous-tag" to roll back. Fly.io also supports blue-green deployments, where you spin up new machines with the updated code, verify they are healthy, and then shift traffic from old to new machines. If something goes wrong, shift traffic back.
Modal does not have a built-in rollback command, but you can achieve the same result by redeploying a previous git commit. Since Modal builds from your source code on each deploy, checking out a previous commit and running "modal deploy" effectively rolls back. For faster rollbacks, tag known-good deployments in your git history and keep a rollback script that checks out the tag and deploys.
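A rollback script for this approach can be very small. The sketch below assumes a git tag marking a known-good release and a single entrypoint file. The tag name and mcp_server.py are hypothetical, and dry-run is the default so nothing executes by accident.

```python
import subprocess
from typing import List


def rollback_commands(tag: str, entrypoint: str) -> List[List[str]]:
    """Commands that check out a known-good tag and redeploy it to Modal."""
    return [["git", "checkout", tag], ["modal", "deploy", entrypoint]]


def roll_back(tag: str, entrypoint: str, dry_run: bool = True) -> List[List[str]]:
    commands = rollback_commands(tag, entrypoint)
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    roll_back("known-good", "mcp_server.py")  # pass dry_run=False to execute
```

Note that checking out a tag leaves the working tree in a detached HEAD state, so run this from a clean CI checkout rather than a developer's working copy.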
Regardless of platform, always monitor your MCP server for 15 to 30 minutes after a deployment. Watch for increased error rates, slower response times, and changes in tool selection patterns (which can indicate that updated descriptions are confusing the model). If anything looks off, roll back first and investigate second. The cost of running an old version for a few hours is always less than the cost of a broken agent experience in production. For a deeper look at how to structure MCP servers for your specific product, check out our guide to building an MCP server for your product.
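The "roll back first" rule is easier to follow when the trigger is written down ahead of time. Here is a sketch of one possible check that compares a pre-deploy window against a post-deploy window. The thresholds are placeholder assumptions to tune for your own traffic.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Aggregated metrics for one observation window."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(
    before: Window,
    after: Window,
    max_error_increase: float = 0.02,  # assumed: 2 percentage points
    max_latency_ratio: float = 1.5,    # assumed: p95 may grow at most 50%
) -> bool:
    """True if the post-deploy window regressed on error rate or p95 latency."""
    error_regressed = after.error_rate - before.error_rate > max_error_increase
    latency_regressed = after.p95_latency_ms > before.p95_latency_ms * max_latency_ratio
    return error_regressed or latency_regressed
```

This covers error rates and latency only. Shifts in tool selection patterns are harder to threshold and usually still need a human looking at the logs.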
Picking the Right Platform and Getting Started
After deploying MCP servers across all three platforms for clients ranging from early-stage startups to enterprise SaaS companies, the decision tree is surprisingly straightforward.
Choose Cloudflare Workers if your MCP server is written in TypeScript, your tool handlers are lightweight (API calls, data formatting, CRUD operations), and latency is your top priority. Workers gives you the best cold start performance, the lowest cost at any scale, and the simplest deployment story. Most MCP servers fall into this category, which is why we recommend Workers as the default starting point.
Choose Modal if your MCP server is Python-based and depends on packages that require a full Linux environment, or if any of your tools need GPU access for inference, image processing, or computation. Modal is the only serverless platform that makes GPU-powered MCP tools practical. The cold start tradeoff is worth it when the alternative is managing your own GPU cluster.
Choose Fly.io if you need persistent processes, long-lived WebSocket connections, full control over your runtime, or the ability to run sidecar services alongside your MCP server. Fly.io is also the best choice when you want multi-region deployment with a traditional application model rather than a serverless functions model. Teams that are already comfortable with Docker and want a "deploy my container globally" experience will feel right at home.
One pattern we use frequently for complex products: split your MCP server across platforms. Host the main MCP endpoint on Cloudflare Workers for speed and low cost. Route compute-heavy tool calls to Modal functions via HTTP. Store persistent state on Fly.io with SQLite. Each platform handles what it does best, and the MCP server coordinates between them. This is more complex to operate, but for products where some tools need sub-50ms response times and others need 30 seconds of GPU compute, it is the right architecture.
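The coordinating piece of that split architecture is a router that knows which tools run locally and which are forwarded. This sketch injects the remote call as a function so the transport stays out of the picture. The class, the tool names, and the URL are all hypothetical.

```python
from typing import Any, Callable, Dict

Handler = Callable[[dict], Any]
RemoteCall = Callable[[str, dict], Any]  # e.g. a helper that POSTs JSON to a URL


class ToolRouter:
    """Dispatch lightweight tools in-process and heavy tools to a remote backend."""

    def __init__(self, remote_call: RemoteCall) -> None:
        self._local: Dict[str, Handler] = {}
        self._remote: Dict[str, str] = {}
        self._remote_call = remote_call

    def register_local(self, name: str, handler: Handler) -> None:
        self._local[name] = handler

    def register_remote(self, name: str, url: str) -> None:
        self._remote[name] = url

    def call(self, name: str, args: dict) -> Any:
        if name in self._local:
            return self._local[name](args)
        if name in self._remote:
            return self._remote_call(self._remote[name], args)
        raise KeyError(f"unknown tool: {name}")
```

Usage might look like router.register_remote("enhance_image", "https://gpu-backend.example/enhance"), with the remote_call function supplied by whatever HTTP client your platform supports.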
Whatever platform you choose, start with a single tool, deploy it, and test it with a real AI agent before building out the full server. MCP hosting problems surface quickly when you are running actual agent workflows, and catching them early saves you from expensive re-architecture later.
If your team is building AI agent tooling and needs help choosing the right hosting stack, designing your MCP server architecture, or deploying to production, we have done this across dozens of products. Book a free strategy call and we will help you map out the hosting strategy that fits your use case, your budget, and your team's operational capacity.