
How to Build and Deploy Custom MCP Servers for AI Products

MCP servers are the connective tissue between AI agents and the real world. This guide walks through building, testing, and deploying custom MCP servers from scratch.

Nate Laquis

Founder & CEO

Why Custom MCP Servers Are the New API Layer

Every AI product eventually hits the same wall: the LLM needs to do something in the real world. It needs to read from your database, call your internal API, update a record in Salesforce, or trigger a deployment pipeline. Before MCP, every one of those integrations was a bespoke piece of glue code. You would wire up function calling schemas, build custom tool executors, handle auth on a per-tool basis, and pray that the next model version did not break your parameter parsing. It worked, but it did not scale.

MCP changed the game by giving us a universal protocol for tool connectivity. But here is the thing most teams get wrong: they treat MCP as a configuration problem rather than an engineering one. They grab an off-the-shelf MCP server, plug it in, and wonder why their agent is unreliable. The reality is that production-grade AI products need custom MCP servers built specifically for their domain, their data model, and their security requirements.

If you have read our MCP overview for CTOs, you understand the protocol architecture. This guide goes deeper. We are going to build MCP servers from scratch in TypeScript and Python, define tools and resources that LLMs can actually use reliably, implement proper authentication, deploy to Cloudflare Workers and AWS Lambda, test everything with MCP Inspector, and publish to registries so other developers can discover your servers.

Developer writing custom MCP server code in a modern IDE with multiple files open

This is not a conceptual overview. It is a hands-on custom MCP server development guide for engineers who are building AI products and need their agents to interact with proprietary systems. By the end, you will have a clear blueprint for building MCP servers that are reliable, secure, and ready for production traffic.

Building an MCP Server in TypeScript

TypeScript is the most popular language for MCP server development, and for good reason. The official @modelcontextprotocol/sdk package gives you type-safe tool definitions, built-in transport handling, and first-class support for both stdio and Streamable HTTP. If your team already works in Node.js or builds with frameworks like Next.js, TypeScript is the natural choice.

Project Setup and Dependencies

Start by initializing a new project and installing the MCP SDK. You need @modelcontextprotocol/sdk as your core dependency and zod for input validation schemas. The SDK uses Zod internally for schema validation, so you get runtime type checking for free. Your package.json should specify "type": "module" since the SDK uses ES modules. Add a tsconfig.json targeting ES2022 with module resolution set to "bundler" or "nodenext" depending on your deployment target.
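For reference, the resulting package.json might look like this (the project name and version ranges are illustrative):

```json
{
  "name": "acme-mcp-server",
  "version": "0.1.0",
  "type": "module",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "zod": "^3.23.0"
  },
  "devDependencies": {
    "typescript": "^5.5.0",
    "@types/node": "^22.0.0"
  }
}
```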

The entry point for every MCP server is the McpServer class. You instantiate it with a name, version string, and optional configuration. The server object is what you attach tools, resources, and prompts to. Then you connect it to a transport. For local development and testing, StdioServerTransport is the fastest way to get running. For production, you will want StreamableHTTPServerTransport, which we will cover in the deployment section.

Defining Your First Tool

Tools are the core primitive. Each tool has a name, a description, a Zod schema for inputs, and an async handler function that executes the operation. The name should be snake_case and descriptive: "search_customers" not "search" or "customerSearch." The description is critical because this is what the LLM reads to decide when to use the tool. Write it like you are explaining the function to a junior developer who has never seen your codebase. Include what it does, when to use it, what it returns, and any important constraints.

Your Zod schema defines every parameter. Use z.string(), z.number(), z.enum(), and z.boolean() for flat structures. Avoid deeply nested z.object() schemas. LLMs produce significantly fewer errors with flat parameter lists. If a parameter accepts only certain values, use z.enum() instead of z.string(). This guides the model toward valid inputs and eliminates an entire class of runtime errors.

Handling Tool Execution

The handler function receives the validated input and returns a result object with a "content" array. Each content item has a type (usually "text") and a body. For structured data, serialize it as JSON inside the text field. Always handle errors gracefully inside the handler. Do not let exceptions bubble up unhandled. Return an error message in the content with isError set to true. This tells the LLM that something went wrong and it should try a different approach rather than retrying blindly.

One pattern that dramatically improves agent reliability: include actionable context in your responses. Instead of returning just { "status": "not_found" }, return something like "No customer found with email john@example.com. Try searching by name or account ID instead." This gives the LLM explicit guidance on what to do next, reducing wasted reasoning cycles.
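Both habits can be captured in a couple of small helpers. This is a sketch; the names and wording are our own:

```typescript
type TextContent = { type: "text"; text: string };
type ToolResult = { content: TextContent[]; isError?: boolean };

// Wrap a handler so exceptions become structured, LLM-readable errors
// instead of bubbling up unhandled.
function withErrorHandling<A>(
  handler: (args: A) => Promise<ToolResult>
): (args: A) => Promise<ToolResult> {
  return async (args) => {
    try {
      return await handler(args);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return {
        content: [
          { type: "text", text: `Tool failed: ${message}. Do not retry with the same arguments.` },
        ],
        isError: true,
      };
    }
  };
}

// Not-found responses should tell the model what to try next.
function notFound(what: string, suggestion: string): ToolResult {
  return {
    content: [{ type: "text", text: `${what} not found. ${suggestion}` }],
    isError: true,
  };
}
```

For example, `notFound("Customer with email john@example.com", "Try searching by name or account ID instead.")` produces exactly the kind of actionable response described above.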

Building an MCP Server in Python

Python is the second officially supported language for MCP development, and it is the better choice if your backend is already Python-based (Django, FastAPI, Flask) or if your MCP server needs to interact with data science and ML libraries. The mcp package on PyPI provides the same capabilities as the TypeScript SDK with a Pythonic API.

Setup with the FastMCP Pattern

The Python SDK introduced FastMCP, a high-level API inspired by FastAPI's decorator pattern. Instead of manually registering tools on a server object, you decorate plain Python functions with @mcp.tool(). The SDK automatically extracts the function name, docstring, and type hints to generate the tool definition. This is genuinely elegant. Your tool definitions stay close to the implementation, and Python type hints (str, int, Optional, Literal) map directly to JSON Schema types.

Install the SDK with pip install mcp or, better, use uv for faster dependency resolution. Create a server.py file, instantiate FastMCP with a name, and start decorating functions. The docstring becomes the tool description, so write it carefully. Parameters with type hints become the input schema. Default values mark parameters as optional. Literal types map to enums. It all just works.

When Python Makes More Sense

Choose Python when your MCP server needs to call internal Python services, run pandas or numpy operations on data before returning results, interact with ML models via PyTorch or Hugging Face, or interface with Python-heavy infrastructure like Airflow, Celery, or SQLAlchemy. The overhead of calling Python from a TypeScript MCP server (via HTTP or subprocess) adds latency and complexity. If the logic is in Python, keep the MCP server in Python too.

Async Handlers and Performance

The Python SDK supports both sync and async handlers. For any I/O-bound work (database queries, HTTP requests, file reads), use async handlers with asyncio. The SDK runs on anyio, so it works with both asyncio and trio. If your tool handler makes three independent API calls, use asyncio.gather() to run them concurrently. This can cut response times by 60-70% compared to sequential execution, which matters when your agent is chaining multiple tool calls in a single reasoning loop.
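A sketch of the concurrent pattern, with sleeps standing in for real I/O:

```python
import asyncio


# Simulated I/O-bound lookups; in a real handler these would be HTTP or DB calls.
async def fetch_profile(cid: str) -> dict:
    await asyncio.sleep(0.05)
    return {"customer": cid}


async def fetch_orders(cid: str) -> list:
    await asyncio.sleep(0.05)
    return ["order_1", "order_2"]


async def fetch_tickets(cid: str) -> list:
    await asyncio.sleep(0.05)
    return ["ticket_9"]


async def customer_overview(cid: str) -> dict:
    # Three independent calls run concurrently: roughly one sleep of
    # latency in total instead of three sequential ones.
    profile, orders, tickets = await asyncio.gather(
        fetch_profile(cid), fetch_orders(cid), fetch_tickets(cid)
    )
    return {"profile": profile, "orders": orders, "tickets": tickets}
```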

For CPU-bound work (data processing, calculations), async will not help. Consider offloading heavy computation to a background task queue and returning a status URL that the agent can poll. This keeps your MCP server responsive and prevents one expensive tool call from blocking all others.

Resources, Prompts, and Advanced Tool Patterns

Tools get all the attention, but MCP servers expose two other primitives that are equally important for building reliable AI products: resources and prompts. Getting these right is the difference between an agent that fumbles through tasks and one that executes with precision.

Resources: Giving the Agent Context

Resources are read-only data endpoints that the agent can pull into its context window. Unlike tools, resources do not perform actions. They provide information. A resource might expose a customer's profile, a list of recent orders, a configuration file, or documentation for your API. Resources use URI templates (like "customers://{customer_id}/profile") so the agent can request specific data.

The key design decision with resources is granularity. Do not expose a single "get_everything" resource that dumps thousands of tokens into context. Break resources into focused, composable pieces. A customer resource should return the profile. A separate resource returns their order history. Another returns their support ticket history. Let the agent request only what it needs for the current task. This keeps context window usage efficient and prevents the LLM from getting distracted by irrelevant information.
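The SDK resolves URI templates for you; purely to illustrate the mechanics, the matching amounts to something like this (this helper is not the SDK API):

```typescript
// Illustrative only: the MCP SDKs resolve resource URI templates for you.
function matchResource(template: string, uri: string): Record<string, string> | null {
  // Turn "customers://{customer_id}/profile" into a named-group regex.
  const pattern = template.replace(/\{(\w+)\}/g, "(?<$1>[^/]+)");
  const match = uri.match(new RegExp(`^${pattern}$`));
  return match?.groups ? { ...match.groups } : null;
}
```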

Prompts: Guiding Agent Behavior

MCP prompts are server-defined templates that structure how the agent approaches specific tasks. If you have a "generate_report" workflow that requires pulling data from three resources and formatting it in a specific way, encode that as a prompt. The agent (or the user) selects the prompt, the server fills in the template with context, and the LLM follows the structured guidance.

Prompts are underused by most teams, but they are incredibly powerful for complex workflows. They reduce the burden on system-level prompt engineering by embedding domain-specific instructions directly in the MCP server. This means the same generic AI client can execute sophisticated domain workflows just by selecting the right MCP prompt.

Multi-Step Tool Patterns

Some operations require multiple steps that should execute as a unit. Consider a "transfer_funds" workflow: validate the source account, check the balance, validate the destination, execute the transfer, and record the transaction. You have two options. First, expose each step as a separate tool and let the agent orchestrate them. This is flexible but risky because the agent might skip a validation step. Second, expose a single "transfer_funds" tool that handles all steps internally and returns a comprehensive result. For anything involving data integrity, financial operations, or irreversible actions, the second approach is safer. Let the MCP server own the business logic and use the agent for user interaction and decision-making.
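A sketch of the second approach, with an in-memory map standing in for real account storage:

```typescript
// In-memory stand-in for real account storage.
const balances = new Map<string, number>([["acct_a", 500], ["acct_b", 100]]);
const ledger: string[] = [];

type TransferResult = { ok: boolean; message: string };

// All validation and execution happen inside one tool call, so the agent
// cannot skip a step: validate source, check balance, validate
// destination, execute, record.
function transferFunds(from: string, to: string, amount: number): TransferResult {
  if (!balances.has(from)) return { ok: false, message: `Unknown source account ${from}.` };
  if (!balances.has(to)) return { ok: false, message: `Unknown destination account ${to}.` };
  if (amount <= 0) return { ok: false, message: "Amount must be positive." };
  const available = balances.get(from)!;
  if (available < amount) {
    return { ok: false, message: `Insufficient funds: ${available} available, ${amount} requested.` };
  }
  balances.set(from, available - amount);
  balances.set(to, balances.get(to)! + amount);
  ledger.push(`${from} -> ${to}: ${amount}`);
  return { ok: true, message: `Transferred ${amount} from ${from} to ${to}.` };
}
```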

Code on a monitor showing MCP server tool and resource definitions in a TypeScript project

Authentication, Authorization, and Security

Shipping an MCP server without proper auth is like deploying a public API with authentication switched off. It will work in development, and it will be a disaster in production. MCP servers are attack surfaces. They give AI agents the ability to read data and perform actions in your systems. Lock them down properly from day one.

OAuth 2.0 for Remote Servers

The MCP specification defines OAuth 2.0 as the standard authentication mechanism for remote (HTTP-based) servers. Your MCP server acts as both a resource server and, optionally, an authorization server. The flow works like this: the MCP client initiates a connection, your server responds with a 401 and an OAuth discovery URL, the client walks the user through the OAuth flow (browser redirect, consent screen, callback), obtains an access token, and includes it in subsequent requests. The official SDK provides middleware helpers for validating Bearer tokens on incoming requests.
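The 401-plus-discovery handshake can be sketched as a small gate function (the metadata URL and the token check are illustrative stand-ins for a real OAuth setup):

```typescript
type AuthCheck =
  | { ok: true; token: string }
  | { ok: false; status: 401; headers: Record<string, string> };

// Illustrative Bearer-token gate. In production, validation would verify
// a signed access token, and the discovery URL would point at your real
// OAuth metadata endpoint.
function checkAuth(headers: Record<string, string>, validTokens: Set<string>): AuthCheck {
  const auth = headers["authorization"] ?? "";
  const token = auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : "";
  if (token && validTokens.has(token)) return { ok: true, token };
  return {
    ok: false,
    status: 401,
    headers: {
      // Points the client at OAuth discovery metadata (path illustrative).
      "WWW-Authenticate":
        'Bearer resource_metadata="https://mcp.example.com/.well-known/oauth-protected-resource"',
    },
  };
}
```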

For internal tools where you control both the client and server, API keys are simpler. Pass the key as a header in the Streamable HTTP transport configuration. Validate it on every request. Rotate keys regularly and never embed them in client-side code.

Per-Tool Authorization

Not every authenticated user should have access to every tool. Implement per-tool authorization by checking the user's roles or permissions before executing the tool handler. A support agent might have access to "search_tickets" and "update_ticket_status" but not "delete_customer" or "issue_refund_over_500." Store permission mappings in your database or identity provider. Check them on every tool call. This is not optional for any system handling customer data or financial operations.
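A sketch of that permission check (role names are illustrative; production mappings would live in your database or identity provider):

```typescript
// Role -> allowed tools. In production this mapping would be loaded from
// your database or identity provider, not hardcoded.
const toolPermissions: Record<string, Set<string>> = {
  support_agent: new Set(["search_tickets", "update_ticket_status"]),
  admin: new Set([
    "search_tickets",
    "update_ticket_status",
    "delete_customer",
    "issue_refund_over_500",
  ]),
};

function canCallTool(role: string, tool: string): boolean {
  return toolPermissions[role]?.has(tool) ?? false;
}

// Run before every handler invocation, not once per session.
function authorize(role: string, tool: string): void {
  if (!canCallTool(role, tool)) {
    throw new Error(`Role "${role}" is not permitted to call "${tool}".`);
  }
}
```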

Input Validation and Rate Limiting

Zod schemas (TypeScript) and type hints (Python) validate the structure of inputs, but you also need business logic validation. A tool that accepts a date range should reject ranges spanning more than 90 days. A search tool should enforce a maximum result count. A mutation tool should validate that the referenced entity exists before modifying it. These checks prevent agents from making expensive or destructive mistakes.
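For example, a date-range check might look like this (the limits and messages are illustrative):

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns null when valid, or an agent-readable error message.
function validateDateRange(startIso: string, endIso: string, maxDays = 90): string | null {
  const start = Date.parse(startIso);
  const end = Date.parse(endIso);
  if (Number.isNaN(start) || Number.isNaN(end)) return "Dates must be ISO 8601, e.g. 2025-01-31.";
  if (end < start) return "End date must not be before start date.";
  const days = (end - start) / MS_PER_DAY;
  if (days > maxDays) {
    return `Range spans ${Math.ceil(days)} days; maximum is ${maxDays}. Narrow the range and retry.`;
  }
  return null;
}
```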

Rate limiting is equally critical. An agent stuck in a retry loop can fire hundreds of tool calls per minute. Implement per-session rate limits (e.g., 60 tool calls per minute per session) and per-tool limits for expensive operations (e.g., 5 report generations per hour). Return clear rate-limit error messages so the agent understands why the call failed and can back off gracefully. Consider using libraries like Arcjet or Unkey for API-level rate limiting if you do not want to build it yourself. Our guide on building agentic apps with MCP covers the client-side patterns for handling these limits in agent loops.
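A per-session sliding-window limiter can be sketched in a few lines (the numbers are the illustrative limits from above):

```typescript
// Sliding-window limiter keyed by session ID: 60 calls per minute by default.
class RateLimiter {
  private calls = new Map<string, number[]>();
  constructor(private limit = 60, private windowMs = 60_000) {}

  check(sessionId: string, now = Date.now()): { allowed: boolean; retryAfterMs: number } {
    // Keep only calls that are still inside the window.
    const recent = (this.calls.get(sessionId) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      // Tell the agent exactly how long to back off.
      return { allowed: false, retryAfterMs: this.windowMs - (now - recent[0]) };
    }
    recent.push(now);
    this.calls.set(sessionId, recent);
    return { allowed: true, retryAfterMs: 0 };
  }
}
```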

Deploying MCP Servers to Cloudflare Workers and AWS Lambda

Building the MCP server is half the job. Deploying it so it is fast, scalable, and cost-effective is the other half. The two strongest options today are Cloudflare Workers and AWS Lambda, and each has distinct advantages depending on your use case.

Cloudflare Workers: The Edge-First Option

Cloudflare Workers is the simplest path to deploying a production MCP server. Workers run at the edge across 300+ locations globally, which means sub-50ms latency for most users. The platform supports the Streamable HTTP transport natively, and Cloudflare has published official MCP server templates that handle transport setup, session management, and Durable Objects for stateful connections.

The deployment flow is straightforward. Write your MCP server using the TypeScript SDK, configure wrangler.toml with your worker settings, and run "wrangler deploy." Your server is live at a workers.dev URL within seconds. For custom domains, add a route in your Cloudflare DNS settings. Workers support environment variables and secrets natively, so API keys and database credentials stay out of your code.

Cloudflare's Durable Objects solve one of the trickiest problems with serverless MCP deployment: session state. MCP connections are stateful (the client and server maintain a session), but serverless functions are stateless by default. Durable Objects give you a persistent, single-threaded execution context tied to a session ID. Each MCP session gets its own Durable Object that maintains state across requests. This is the cleanest solution for stateful MCP on serverless infrastructure.
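A wrangler.toml for such a setup might look roughly like this (names are illustrative; see Cloudflare's official MCP templates for a complete configuration):

```toml
# wrangler.toml (illustrative names)
name = "acme-mcp"
main = "src/index.ts"
compatibility_date = "2025-01-01"

# One Durable Object instance per MCP session holds connection state.
[[durable_objects.bindings]]
name = "MCP_SESSION"
class_name = "McpSession"

[[migrations]]
tag = "v1"
new_classes = ["McpSession"]
```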

The constraint is runtime compatibility. Workers use the V8 runtime, not Node.js. If your MCP server depends on Node.js-specific APIs (fs, child_process, native modules), you will need to refactor or use Cloudflare's Node.js compatibility layer, which covers most common APIs but not all.

AWS Lambda: The Enterprise Option

AWS Lambda is the right choice when your MCP server needs to access resources inside a VPC (RDS databases, ElastiCache, internal APIs), when you need fine-grained IAM permissions, or when your organization is already deep in the AWS ecosystem. Lambda supports both TypeScript and Python MCP servers without modification.

For HTTP transport, front your Lambda with an API Gateway (HTTP API, not REST API, for lower latency and cost). Configure a single route that proxies all requests to your Lambda function. The Lambda handler initializes the MCP server, creates a StreamableHTTPServerTransport, and routes incoming requests. Since Lambda functions are stateless, you need external state management for MCP sessions. DynamoDB is the natural choice: store session state keyed by session ID with a TTL for automatic cleanup.
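The session-store contract can be sketched like this, with an in-memory map standing in for DynamoDB (each method would map to a GetItem or PutItem call, with `expiresAt` as the table's TTL attribute):

```typescript
type Session = { id: string; state: Record<string, unknown>; expiresAt: number };

// In-memory stand-in: in production, put() and get() would be DynamoDB
// PutItem/GetItem calls, and `expiresAt` would drive the table's TTL cleanup.
class SessionStore {
  private items = new Map<string, Session>();
  constructor(private ttlMs = 30 * 60 * 1000) {}

  put(id: string, state: Record<string, unknown>, now = Date.now()): void {
    this.items.set(id, { id, state, expiresAt: now + this.ttlMs });
  }

  get(id: string, now = Date.now()): Session | null {
    const s = this.items.get(id);
    if (!s || s.expiresAt <= now) return null; // expired = already TTL-deleted
    return s;
  }
}
```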

Data center server infrastructure used for deploying production MCP servers at scale

Cold starts are the main concern with Lambda for MCP. A cold start adds 500ms to 2 seconds to the first request in a session. For MCP servers that handle frequent, latency-sensitive tool calls, this is painful. Mitigate it by using provisioned concurrency (keeps warm instances ready) or Lambda SnapStart (for Java/Python, reduces cold start to under 200ms). For TypeScript, keep your bundle size small and avoid heavy dependencies that slow initialization.

Which to Choose

Use Cloudflare Workers when latency matters, your server is TypeScript, and you do not need VPC access. Use AWS Lambda when you need VPC connectivity, your backend is on AWS, or you need Python with heavy dependencies (numpy, pandas, ML libraries). Both are cost-effective for MCP workloads because tool calls are short-lived requests, exactly what serverless is designed for.

Testing with MCP Inspector and Debugging Strategies

You cannot ship an MCP server you have not tested, and testing MCP servers is different from testing regular APIs. The consumer is an LLM, not a human, which means edge cases show up in unexpected places. A typo in a tool description might cause the model to never select that tool. A missing enum value might cause the agent to hallucinate a parameter. You need tools and strategies specifically designed for MCP testing.

MCP Inspector: Your Primary Testing Tool

MCP Inspector is the official debugging tool for MCP servers. It provides a web-based UI where you can connect to any MCP server, browse its tools, resources, and prompts, and execute tool calls manually. Think of it as Postman for MCP. Run it with "npx @modelcontextprotocol/inspector" and point it at your server. For stdio servers, it spawns the process automatically. For HTTP servers, provide the URL.

The Inspector shows you exactly what the client sees: tool names, descriptions, input schemas, and raw JSON-RPC messages. This is invaluable for catching problems early. Check that every tool description reads clearly. Verify that input schemas match your expectations. Execute each tool with valid and invalid inputs. Confirm that error responses include helpful messages. If something looks wrong in the Inspector, it will definitely confuse an LLM.

Automated Testing Strategies

Beyond the Inspector, build automated tests for your MCP server. The simplest approach is unit testing your tool handlers directly. They are just functions that accept validated input and return content arrays. Test them with standard testing frameworks (Vitest for TypeScript, pytest for Python). Mock external dependencies (database, APIs) and verify that handlers return correct results for normal inputs, edge cases, and error conditions.

Integration tests should verify the full MCP protocol flow. Instantiate your server in-process, connect a test client, and run tool calls through the actual MCP transport. The TypeScript SDK provides a Client class you can use for this. Verify that tool discovery works, that auth rejects invalid tokens, that rate limits trigger correctly, and that concurrent tool calls do not interfere with each other.

Testing with Real AI Clients

Automated tests catch implementation bugs, but they do not tell you whether an LLM will use your tools correctly. For that, you need to test with real AI clients. Connect your MCP server to Claude Desktop or Cursor and give the AI tasks that require using your tools. Watch for common problems: the model selecting the wrong tool, providing invalid parameters, misinterpreting results, or getting stuck in loops. These issues almost always trace back to unclear tool descriptions or confusing response formats, not bugs in the code itself.

Build a test suite of 20 to 30 natural-language tasks that cover your tool surface area. Run them periodically, especially after updating tool descriptions or adding new tools. Track success rates over time. If a particular tool falls below 90% accuracy, revisit its description and schema. This is the closest thing to a CI pipeline for AI-tool integration, and it catches regressions that no other testing approach will find.

Publishing to MCP Registries and What Comes Next

Building a great MCP server is valuable on its own, but publishing it to a registry multiplies that value by making it discoverable to every AI client and developer in the ecosystem. MCP registries are still maturing, but the infrastructure is solidifying fast and early publishers get a significant distribution advantage.

Registry Options Today

The primary registries today are the official MCP Server Registry (maintained by the MCP project), Smithery (the largest community registry with over 3,000 servers), and platform-specific registries like Cloudflare's MCP catalog and Anthropic's featured integrations. Each registry has its own submission process, but they all require the same basics: a valid MCP server manifest, clear documentation, a public repository, and passing validation tests.

Your server manifest (typically mcp.json or declared in your package.json) describes the server name, version, transport type, available tools and resources, required authentication, and configuration options. Keep this accurate and up to date. Registries use it to generate discovery listings, and AI clients use it to determine compatibility before connecting.

Documentation That Gets Adoption

The MCP servers that get the most adoption share a few documentation traits. They include a clear one-paragraph description of what the server does and who it is for. They list every tool with its description, parameters, and example inputs and outputs. They provide quick-start instructions for both stdio (local development) and HTTP (production) transports. They document authentication requirements with step-by-step setup instructions. And they include at least three real-world usage examples showing natural-language prompts and the resulting tool calls.

Treat your MCP server documentation like API documentation. Developers (and increasingly, AI agents) will evaluate your server based on how quickly they can understand what it does and get it working. Clear docs convert browsers into users.

Versioning and Backward Compatibility

Once your MCP server has users, you cannot break them. Follow semantic versioning strictly. Adding new tools is a minor version bump. Changing a tool's input schema or behavior is a major version bump. Removing a tool is a major version bump. Registries track versions, and clients can pin to specific versions, but only if you version correctly. Include a CHANGELOG that documents every change, especially breaking ones.
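Those rules are mechanical enough to encode. A sketch (change categories are our own labels):

```typescript
type Change = "add_tool" | "change_tool_schema" | "change_tool_behavior" | "remove_tool" | "fix_bug";

// Encodes the versioning rules above: anything that can break an existing
// client forces a major bump; purely additive changes are minor.
function requiredBump(changes: Change[]): "major" | "minor" | "patch" {
  const breaking: Change[] = ["change_tool_schema", "change_tool_behavior", "remove_tool"];
  if (changes.some((c) => breaking.includes(c))) return "major";
  if (changes.includes("add_tool")) return "minor";
  return "patch";
}
```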

What Comes Next for MCP

The MCP ecosystem is evolving rapidly. Expect to see better tooling for monitoring MCP server performance in production (request latency, error rates, tool selection accuracy). Agent-to-agent communication via protocols like A2A will complement MCP by enabling multi-agent architectures where specialized agents each expose their own MCP servers. And the line between MCP servers and traditional APIs will continue to blur as more frameworks add native MCP support.

The teams that invest in custom MCP server development now are building a durable competitive advantage. Your proprietary tools, data, and workflows become accessible to every AI agent in the ecosystem. That is not just a technical integration. It is a distribution strategy.

If you are building an AI product and need custom MCP servers that connect your agents to proprietary systems, our team has shipped MCP infrastructure for companies across fintech, healthcare, and enterprise SaaS. Book a free strategy call and we will map out the MCP architecture for your product.


custom MCP server development guide, Model Context Protocol, MCP server TypeScript, AI tool integration, MCP deployment
