---
title: "How to Build an AI Agentic App Using MCP and Tool Use in 2026"
author: "Nate Laquis"
author_role: "Founder & CEO"
date: "2029-05-16"
category: "How to Build"
tags:
  - build AI agentic app
  - MCP tool use
  - AI agent development
  - agentic AI patterns
  - Claude Agent SDK
excerpt: "Forty percent of enterprise apps are expected to feature AI agents by the end of 2026. Here is how to actually build one using MCP, tool calling, and production-grade agentic patterns."
reading_time: "16 min read"
canonical_url: "https://kanopylabs.com/blog/how-to-build-an-ai-agentic-app-with-mcp"
---

# How to Build an AI Agentic App Using MCP and Tool Use in 2026

## Why 2026 Is the Year of AI Agentic Apps

Gartner's projection that 40% of enterprise applications will feature AI agents by the end of 2026 is not a forecast anymore. It is a deployment schedule. Companies across fintech, healthcare, logistics, and SaaS are shipping agents that book meetings, process claims, triage support tickets, and orchestrate multi-step workflows without human intervention. The gap between companies that figure out agentic AI and those that do not is widening fast.

The reason this is happening now comes down to three things converging. First, LLMs got reliable enough at tool calling that you can trust them to pick the right function from 20+ options with over 95% accuracy. Second, [MCP (Model Context Protocol)](/blog/model-context-protocol-mcp-guide) became the universal standard for connecting agents to external tools and data. Third, agent SDKs from Anthropic, OpenAI, and the open-source community matured to the point where you can build a production agent in days, not months.

This guide covers the full stack: MCP server implementation, tool definition schemas, multi-step reasoning chains, context window management, error handling, and production deployment. Just the patterns that work in production right now.

![Developer writing code for an AI agentic application on a laptop screen](https://images.unsplash.com/photo-1461749280684-dccba630e2f6?w=800&q=80)

## MCP Server Implementation: The Foundation of Every Agentic App

Every agentic app starts with MCP servers. An MCP server is the bridge between your AI agent and the real world: your database, your APIs, your third-party SaaS tools. Without MCP, you are writing custom glue code for every integration. With MCP, you write one server per capability and any MCP-compatible agent can use it.

### Setting Up Your First MCP Server

The official MCP SDK is available in TypeScript (@modelcontextprotocol/sdk) and Python (mcp). For most teams building web applications, TypeScript is the natural choice. Your MCP server exposes three primitives: tools (functions the agent can call), resources (data the agent can read), and prompts (templates that guide the agent for specific tasks).

Start by identifying the five to ten core operations your agent needs. If you are building a customer support agent, those might be: search_tickets, get_customer_profile, update_ticket_status, send_reply, escalate_to_human, check_subscription, issue_refund. Each operation becomes an MCP tool with a name, description, and JSON Schema input definition.
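
To make this concrete, here is a minimal sketch of one of those tools registered with the TypeScript SDK. It assumes zod for input validation and a placeholder findCustomer helper standing in for your own data layer; the exact registration API varies slightly between SDK versions.

```ts
// mcp-server.ts — minimal MCP server exposing one support tool over stdio (sketch).
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "support-tools", version: "1.0.0" });

// Register a tool: name, description, input schema, handler.
server.tool(
  "get_customer_profile",
  "Looks up a customer by email or account ID and returns their profile and subscription status.",
  {
    customer_email: z.string().email().optional().describe("Customer email address"),
    account_id: z.string().optional().describe("Internal account ID"),
  },
  async ({ customer_email, account_id }) => {
    const profile = await findCustomer({ customer_email, account_id });
    return { content: [{ type: "text", text: JSON.stringify(profile) }] };
  }
);

// Placeholder for your own data-access layer.
async function findCustomer(query: { customer_email?: string; account_id?: string }) {
  return { id: query.account_id ?? "cus_123", email: query.customer_email, subscription: "active" };
}

// stdio transport: the MCP client spawns this process and talks over stdin/stdout.
const transport = new StdioServerTransport();
await server.connect(transport);
```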

### Transport Choices: stdio vs. Streamable HTTP

MCP supports two transports. Stdio is simplest: the client spawns the server as a subprocess and communicates over standard I/O. This works for local development and desktop apps like Claude Desktop. Streamable HTTP (which superseded SSE transport) is what you want for production. It runs your MCP server as a standard HTTP service you can deploy, scale, and load-balance. For cloud-hosted agents, Streamable HTTP is the only practical option.
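
For production, the same server runs behind Streamable HTTP instead. The sketch below follows the SDK's stateless pattern with Express; buildServer is assumed to be a factory that registers your tools the same way as the stdio example, and auth is omitted here (covered in the next section).

```ts
// http-server.ts — expose an MCP server over Streamable HTTP in stateless mode (sketch).
import express from "express";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { buildServer } from "./mcp-server.js"; // assumed factory that registers your tools

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  // Stateless: one server + transport per request, no session IDs to track.
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });

  res.on("close", () => {
    transport.close();
    server.close();
  });

  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000, () => console.log("MCP server listening on :3000"));
```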

### Authentication and Security

Production MCP servers need proper auth. The spec supports OAuth 2.0 for remote servers. Validate tokens on every request, scope tool access to user permissions, and never expose admin operations without authorization. Rate limiting matters too: an agent stuck in a retry loop can hammer your API, so enforce per-session and per-tool limits.
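
A sketch of what that looks like in front of the /mcp endpoint, with a placeholder verifyToken standing in for your real OAuth or JWT validation and an in-memory counter as a stand-in for proper rate limiting:

```ts
// auth.ts — bearer-token validation plus a crude per-session rate limit (sketch).
import type { Request, Response, NextFunction } from "express";

const callCounts = new Map<string, { count: number; windowStart: number }>();
const LIMIT = 60;          // max tool calls per window
const WINDOW_MS = 60_000;  // 1-minute window

export async function mcpAuth(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace(/^Bearer /, "");
  const session = token && (await verifyToken(token)); // placeholder verification
  if (!session) return res.status(401).json({ error: "invalid_token" });

  // Per-session rate limiting to stop an agent stuck in a retry loop.
  const now = Date.now();
  const entry = callCounts.get(session.id) ?? { count: 0, windowStart: now };
  if (now - entry.windowStart > WINDOW_MS) {
    entry.count = 0;
    entry.windowStart = now;
  }
  if (++entry.count > LIMIT) return res.status(429).json({ error: "rate_limited" });
  callCounts.set(session.id, entry);

  next();
}

// Placeholder: validate the OAuth access token and return the session it belongs to.
async function verifyToken(token: string): Promise<{ id: string } | null> {
  return token ? { id: token.slice(0, 12) } : null;
}
```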

## Tool Definition Schemas That Actually Work

The quality of your tool definitions directly determines whether your agent works or fails. A poorly described tool is worse than no tool at all because the agent will call it at the wrong time, with the wrong parameters, and produce confusing results.

### Writing Effective Tool Descriptions

Your tool description needs to answer three questions for the LLM: what does this tool do, when should you use it, and what does each parameter mean? Be specific and opinionated. "Searches the customer database by email, name, or account ID. Use this when the user asks about a specific customer or you need to look up account details before performing an action. Returns the top 5 matching customer records with their subscription status." That description gives the model clear context for selection.

Compare that with a vague description like "Searches for customers." The agent has no idea when to pick this over other search tools, what parameters to provide, or what results to expect.

### Input Schema Design

Keep input schemas flat. LLMs handle simple parameter lists much better than deeply nested objects. Use descriptive parameter names that match natural language. "customer_email" is better than "email" when you have multiple tools that accept email addresses. Mark required vs. optional parameters explicitly. Provide enum values when the set of valid options is known, as this dramatically reduces invalid tool calls.
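
Putting those rules together, the tool definition the agent sees might look like this (names and values are illustrative):

```json
{
  "name": "update_ticket_status",
  "description": "Updates the status of an existing support ticket. Use this after you have confirmed the ticket ID and the intended change with the user.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "ticket_id": { "type": "string", "description": "The ticket ID, e.g. TKT-4821" },
      "status": {
        "type": "string",
        "enum": ["open", "pending", "resolved", "closed"],
        "description": "The new status to set"
      },
      "resolution_note": {
        "type": "string",
        "description": "Optional note explaining the status change"
      }
    },
    "required": ["ticket_id", "status"]
  }
}
```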

### Output Schema Best Practices

Return structured data, not raw dumps. If a database query returns 50 columns, pick the 8 that the agent actually needs. Truncate long text fields. Include a total_count alongside paginated results so the agent knows if it needs to fetch more. Always include a clear error message when something fails, along with a suggestion for what the agent should try next. One pattern that works well: return a "next_actions" field that hints at logical follow-up tools, helping the agent chain tools together without relying solely on LLM reasoning.
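
As an illustration, a customer search tool following these practices might return a result shaped like this (field names are examples, not a required schema):

```json
{
  "customers": [
    { "id": "cus_8412", "name": "Jane Doe", "email": "jane@example.com", "subscription_status": "active" }
  ],
  "total_count": 37,
  "page": 1,
  "next_actions": ["get_customer_profile", "search_tickets"],
  "error": null
}
```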

## Multi-Step Agent Reasoning Chains and the Agentic Loop

A single tool call is not an agent. An agent is defined by the agentic loop: the model receives a task, reasons about what to do, calls a tool, observes the result, decides what to do next, and repeats until the task is complete. This loop is where the real complexity lives.

### The Core Agentic Loop

Every framework implements this cycle: (1) the user provides a goal, (2) the LLM selects the next action, (3) the runtime executes the tool call, (4) the result is appended to context, (5) the LLM evaluates whether the goal is met, (6) if not, loop to step 2. Simple tasks take 2 to 3 loops. Complex tasks (investigating incidents, generating reports) can take 10 to 15.
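
Here is a bare-bones sketch of that loop in TypeScript, assuming the Anthropic SDK; the model ID, iteration cap, and the executeTool dispatcher that routes calls to your MCP servers are all placeholders:

```ts
// agent-loop.ts — a bare-bones agentic loop (sketch).
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const MAX_ITERATIONS = 25;

export async function runAgent(goal: string, tools: Anthropic.Tool[]) {
  // (1) the user provides a goal
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: goal }];

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    // (2) the LLM selects the next action
    const response = await anthropic.messages.create({
      model: "claude-sonnet-4-20250514", // assumed model ID; use whatever your account has
      max_tokens: 2048,
      tools,
      messages,
    });
    messages.push({ role: "assistant", content: response.content });

    // (5)/(6) if the model did not ask for a tool, it considers the goal met
    if (response.stop_reason !== "tool_use") {
      return response.content.filter((block) => block.type === "text");
    }

    // (3) execute every requested tool call, (4) append results to context
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const result = await executeTool(block.name, block.input); // your dispatch layer
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: JSON.stringify(result),
        });
      }
    }
    messages.push({ role: "user", content: toolResults });
  }
  throw new Error("Agent exceeded maximum iterations");
}

// Placeholder: route a tool call to the matching MCP server or local function.
async function executeTool(name: string, input: unknown): Promise<unknown> {
  throw new Error(`No handler registered for tool ${name}`);
}
```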

### Planning vs. ReAct Patterns

The ReAct pattern (Reasoning + Acting) is the most common: the model thinks step by step, picks one action, executes it, observes the result, and reasons again. Claude Agent SDK and OpenAI Agents SDK use this by default. It is simple, debuggable, and works for most use cases.

The planning pattern is more sophisticated. The agent generates a full plan upfront, then executes each step, revising if something unexpected happens. LangGraph supports this natively through its graph-based state machine. Planning works better for long workflows where you want user approval before execution begins.

![Network infrastructure representing multi-step AI agent reasoning chains and data flow](https://images.unsplash.com/photo-1558494949-ef010cbdcc31?w=800&q=80)

### When to Use Parallel Tool Calls

Most agent SDKs support parallel tool calling, where the model requests multiple tool calls in a single turn. This is a significant performance optimization. If the agent needs to look up a customer AND check their subscription AND fetch recent tickets, those three calls can happen simultaneously instead of sequentially. Claude 4 and GPT-4.1 both support parallel tool calls. Use them aggressively for independent data fetches. Avoid them for operations with dependencies (do not update a record and then read it in parallel).

## Claude Agent SDK vs. OpenAI Agents SDK vs. LangGraph

Choosing the right SDK shapes your entire development experience. Each option has a distinct philosophy, and picking the wrong one will cost you weeks of refactoring.

### Claude Agent SDK

Anthropic's Claude Agent SDK is the most opinionated of the three. It manages the entire agentic loop for you: tool execution, context management, error recovery, and multi-turn conversations. You define your tools (or point it at MCP servers), give it a system prompt, and it handles the rest. The SDK supports permission hooks natively, letting you inspect, approve, or block tool calls on every turn. It also has built-in support for subagents, making multi-agent architectures straightforward.

The biggest strength is simplicity. You can go from zero to a working agent in under 100 lines of code. The tradeoff is less control over the execution flow. If you need custom logic between tool calls (approval gates, human-in-the-loop checkpoints, custom retry logic), you are working against the SDK's abstractions rather than with them. Best for: teams that want to ship fast, use Claude as their primary model, and do not need complex orchestration.

### OpenAI Agents SDK

OpenAI's Agents SDK (the production successor to its experimental Swarm project) provides Agent, Runner, and Handoff primitives with built-in tracing for debugging multi-step executions. Its Responses API integration gives you streaming, tool calling, and structured output in a single endpoint. If your stack is already on OpenAI (GPT-4.1, embeddings, fine-tuned models), it integrates smoothly. The weakness is that it is built around OpenAI's models first, so swapping in Claude or Gemini takes extra work. Best for: teams committed to OpenAI who want tight integration with GPT-4.1 and o3.

### LangGraph

LangGraph is the power tool. It models agents as directed graphs where nodes are computation steps (LLM calls, tool calls, custom functions) and edges define the flow between them. You get full control over execution order, branching, looping, and state management. LangGraph is model-agnostic, working equally well with Claude, GPT-4, Gemini, and open-source models.

The tradeoff is complexity. Building a simple agent in LangGraph takes significantly more code than Claude Agent SDK or OpenAI Agents SDK. But for [complex agentic workflows](/blog/agentic-ai-workflows-guide) with conditional branching, parallel execution paths, human-in-the-loop approvals, and persistent state, LangGraph is the right choice. It also has the best story for long-running agents (workflows that span hours or days) through its built-in checkpointing and state persistence. Best for: teams building complex, multi-model agents that need fine-grained control over execution flow.

## Context Window Management and Cost Control

Context window management is the unglamorous problem that kills agents in production. Every tool call result gets appended to the conversation context. After 10 tool calls, you might have 15,000 tokens of tool results consuming your context window. After 20 calls, you are pushing 30,000+ tokens. At that point, performance degrades, costs spike, and the model starts forgetting earlier parts of the conversation.

### Strategies for Managing Context

The first strategy is aggressive summarization. After every 5 to 8 tool calls, have a secondary LLM call (or use a smaller, cheaper model like Haiku) to summarize the results so far into a concise paragraph. Replace the individual tool results with this summary. This keeps the context window lean while preserving the important information.

The second strategy is selective inclusion. Not every tool result needs to stay in context. If the agent searched for a customer and found them, the search results are no longer needed once the customer profile is loaded. Build logic that prunes tool results that have been "consumed" by subsequent actions.

The third strategy is tiered context. Keep the system prompt and current task description at the top (always in context). Keep the last 3 to 5 tool call/result pairs in full detail. Summarize everything older. This gives the agent enough recent context to reason while keeping total token count manageable.
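
A rough sketch of the tiered approach, with a placeholder summarize call standing in for a cheap model like Haiku:

```ts
// compact-context.ts — tiered context: keep recent turns verbatim, summarize the rest (sketch).
type Turn = { role: "assistant" | "user"; content: string };

const KEEP_RECENT = 5; // last N tool call/result pairs stay in full detail

export async function compactContext(turns: Turn[]): Promise<Turn[]> {
  // System prompt and task description live outside this list, so they are never compacted.
  if (turns.length <= KEEP_RECENT * 2) return turns;

  const older = turns.slice(0, -KEEP_RECENT * 2);
  const recent = turns.slice(-KEEP_RECENT * 2);

  // Collapse older tool traffic into one short summary turn.
  const summary = await summarize(older.map((t) => t.content).join("\n"));
  return [{ role: "user", content: `Summary of earlier steps: ${summary}` }, ...recent];
}

// Placeholder: call a small, cheap model to produce a concise paragraph.
async function summarize(text: string): Promise<string> {
  return text.slice(0, 500); // stand-in so the sketch runs without an API key
}
```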

### Cost Math You Need to Know

A typical agentic interaction with 8 tool calls costs roughly $0.03 to $0.08 with Claude Sonnet 4, depending on context length. That seems cheap until you multiply by thousands of daily users. At 10,000 agent sessions per day with an average cost of $0.05, you are spending $500/day or $15,000/month on LLM API costs alone. Context window management is not just a technical concern. It is a cost control mechanism. Reducing average context length by 40% through summarization and pruning can save $6,000/month at that scale. For more on managing these costs, see our guide on [AI agent SDKs](/blog/ai-agent-sdks-claude-openai-langgraph).

![Team collaborating on AI agent architecture and deployment strategy in an office](https://images.unsplash.com/photo-1504384308090-c894fdcc538d?w=800&q=80)

## Error Handling, Fallbacks, and Guardrails

Agents fail. Tools time out, APIs return 500 errors, the LLM hallucinates a tool that does not exist, or the model gets stuck in an infinite loop calling the same tool with the same bad parameters. Production agents need robust error handling at every layer.

### Tool-Level Error Handling

Every MCP tool should return structured errors, not just stack traces. When a tool fails, the response should include an error code, a human-readable message, and a suggestion for the agent. For example: "Error: customer_not_found. No customer exists with email john@example.com. Try searching by name or account ID instead." This gives the agent enough context to self-correct on the next loop iteration.

Set timeout limits on every tool call. A database query that takes 30 seconds is not going to take less time if the agent retries it. Default to 10-second timeouts for most tools, 30 seconds for tools that involve heavy computation or external API calls, and a 60-second hard cap on everything else.
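
One way to implement both ideas is a small wrapper that races every tool call against a timeout and converts failures into the structured error shape described above; the error codes and defaults here are illustrative:

```ts
// tool-errors.ts — structured tool errors plus a timeout wrapper (sketch).
export type ToolError = {
  error: string;        // machine-readable code, e.g. "customer_not_found"
  message: string;      // human-readable explanation
  suggestion?: string;  // what the agent should try next
};

export async function withTimeout<T>(
  work: Promise<T>,
  ms = 10_000,
  toolName = "tool"
): Promise<T | ToolError> {
  try {
    return await Promise.race<T>([
      work,
      new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error("timeout")), ms)
      ),
    ]);
  } catch (err) {
    return {
      error: err instanceof Error && err.message === "timeout" ? "tool_timeout" : "tool_failed",
      message: `${toolName} failed: ${err instanceof Error ? err.message : String(err)}`,
      suggestion: "Retry with different parameters or escalate to a human.",
    };
  }
}
```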

### Loop-Level Guardrails

Set a hard maximum on the number of agentic loop iterations; 25 is a reasonable default. If the agent has not completed the task in 25 tool calls, something is wrong. Return a graceful failure message to the user rather than letting the agent burn through tokens indefinitely. Also monitor for repetition: if the agent calls the same tool with the same parameters three times in a row, break the loop and ask for human input.
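
The iteration cap is just a counter (the loop sketch earlier uses MAX_ITERATIONS for this). The repetition check can be a small helper like the one below, which flags when the same tool is called with identical arguments three times in a row:

```ts
// guardrails.ts — per-session repetition check for the agentic loop (sketch).
export function makeRepetitionGuard(limit = 3) {
  const history: string[] = [];
  return function isStuck(toolName: string, input: unknown): boolean {
    const signature = `${toolName}:${JSON.stringify(input)}`;
    history.push(signature);
    const recent = history.slice(-limit);
    // True when the last `limit` calls were all identical: break and ask a human.
    return recent.length === limit && recent.every((s) => s === signature);
  };
}
```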

### Model-Level Fallbacks

Build a fallback chain for model failures. If Claude Opus 4 returns a rate limit error, fall back to Sonnet 4. If that fails, fall back to GPT-4.1. MCP keeps your tool definitions model-agnostic, making this straightforward. The cost differences also make fallback chains a smart optimization: use Opus for complex reasoning, Sonnet for tool-calling sequences, and Haiku for summarization.
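
A fallback chain can be as simple as trying each model in order and moving on when one throws; the model IDs and the callModel dispatcher below are placeholders:

```ts
// fallback.ts — model fallback chain (sketch).
const FALLBACK_CHAIN = ["claude-opus-4", "claude-sonnet-4", "gpt-4.1"]; // illustrative IDs

export async function completeWithFallback(prompt: string): Promise<string> {
  let lastError: unknown;
  for (const model of FALLBACK_CHAIN) {
    try {
      return await callModel(model, prompt);
    } catch (err) {
      lastError = err; // rate limit or provider outage: try the next model
    }
  }
  throw lastError;
}

// Placeholder: dispatch to the right provider SDK for the given model ID.
async function callModel(model: string, prompt: string): Promise<string> {
  throw new Error(`No provider configured for ${model}`);
}
```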

### Human-in-the-Loop Checkpoints

For high-stakes operations (processing refunds, deleting data, sending external emails), add a confirmation step where the agent presents its intended action and waits for approval. Claude Agent SDK supports this through its tool permission hooks. LangGraph supports it through interrupt nodes. Build these checkpoints in from day one.

## Production Deployment Patterns

Getting an agent working in development is the easy part. Running it reliably in production, at scale, with real users, is where most teams struggle.

### Architecture for Scale

Deploy your MCP servers as independent microservices. Your agent runtime (the agentic loop manager) should be a separate service connecting to MCP servers over Streamable HTTP. This lets you scale each component independently. A database MCP server might need 10 instances while an email MCP server only needs 2.

Use a message queue (Redis Streams, SQS, or Kafka) between the user-facing API and the agent runtime. Agent tasks take 10 to 60 seconds, far too long to hold a synchronous HTTP request open. Accept the task, return a job ID, and deliver results via WebSocket when the run completes.

### Observability and Tracing

You cannot debug an agent without tracing. Every loop iteration should produce a trace event: the model's reasoning, the tool selected, input parameters, the tool result, and the model's interpretation. Tools like Langfuse, Braintrust, and Arize Phoenix provide purpose-built LLM tracing. OpenTelemetry works too if you prefer your existing stack.

Track six key metrics: task completion rate, average loop iterations per task, latency per session, cost per session, tool error rate by name, and fallback trigger rate. These tell you if your agent is working, how efficiently, and where it is failing.

### Testing Agentic Applications

Traditional unit tests do not work well for agents because LLM outputs are non-deterministic. Use evaluation-based testing instead. Define 50 to 100 test scenarios with expected outcomes, run each multiple times, and measure pass rates. Aim for 90%+ consistency on critical paths. LLM evaluation frameworks (Braintrust, Promptfoo, DeepEval) automate this. Mock your MCP servers during testing with deterministic implementations that return predefined responses, letting you test reasoning and tool selection without relying on live APIs.
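
A minimal harness for this style of testing can be a plain loop over scenarios, with runAgent standing in for your agent runtime wired to mocked MCP servers:

```ts
// evals.ts — evaluation-based testing: run each scenario several times and measure pass rate (sketch).
type Scenario = { name: string; goal: string; passes: (output: string) => boolean };

const RUNS_PER_SCENARIO = 5;
const TARGET_PASS_RATE = 0.9;

export async function runEvals(scenarios: Scenario[]) {
  for (const scenario of scenarios) {
    let passed = 0;
    for (let i = 0; i < RUNS_PER_SCENARIO; i++) {
      const output = await runAgent(scenario.goal); // agent wired to mocked tools
      if (scenario.passes(output)) passed++;
    }
    const rate = passed / RUNS_PER_SCENARIO;
    console.log(
      `${scenario.name}: ${(rate * 100).toFixed(0)}% ${rate >= TARGET_PASS_RATE ? "OK" : "BELOW TARGET"}`
    );
  }
}

// Placeholder: your agent runtime with deterministic mock MCP servers behind it.
async function runAgent(goal: string): Promise<string> {
  return `completed: ${goal}`;
}
```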

## Real-World Use Cases and What to Build First

If you are reading this and wondering where to start, here are the use cases where AI agents are delivering the most value right now.

### Customer Support Agents

This is the highest-ROI starting point for most companies. An agent that can look up customer accounts, check order status, process simple refunds, and escalate complex issues to humans. The MCP server setup is straightforward: one server for your CRM, one for your order management system, one for your ticketing platform. Companies like Klarna and Intercom are already running agents that handle 60 to 80% of support volume autonomously.

### Internal Operations Agents

Agents that help employees navigate internal tools. "What is the status of invoice #4521?" "Create a PTO request for next Friday." These agents connect to internal databases, HR systems, and BI tools via MCP servers, reducing the time employees spend context-switching between apps.

### Developer Productivity Agents

Agents that assist with code review, incident response, and deployment. Connect MCP servers for GitHub, your CI/CD pipeline, and your monitoring stack (Datadog, Sentry). The agent can triage alerts, pull relevant logs, identify the likely root cause, and draft a fix. Claude Code and Cursor are early examples, but you can build agents tailored to your specific stack.

### Data Analysis Agents

Agents that answer business questions by querying databases and generating reports. "What was our churn rate last month compared to the previous quarter?" The agent translates natural language into SQL, executes queries, and presents results with commentary. This democratizes data access across the organization.

### Start Small, Iterate Fast

Do not try to build a general-purpose agent that does everything. Pick one use case, define 5 to 10 tools, build an MCP server for each integration, and test it with real users. Get the first use case to 90%+ reliability before expanding. Horizontal expansion is much easier once your core agentic loop is proven.

If you are ready to build your first AI agentic application and want a team that has done this before, [book a free strategy call](/get-started) with us. We will help you pick the right architecture, choose the right SDK, and avoid the mistakes we have seen trip up dozens of teams.

---

*Originally published on [Kanopy Labs](https://kanopylabs.com/blog/how-to-build-an-ai-agentic-app-with-mcp)*
