What Gemini Managed Agents Actually Are
Google's Managed Agents API is the hosted runtime layer that sits on top of the Agent Development Kit (ADK). You define your agent logic, tools, and orchestration patterns in the ADK, then deploy the whole thing as a managed service on Vertex AI. Google handles scaling, session persistence, authentication, and model versioning. You focus on the agent's behavior, not the infrastructure underneath it.
This matters because most teams that build agents with open-source frameworks end up spending 40 to 60 percent of their engineering time on infrastructure: load balancing tool calls, managing conversation state across sessions, handling retries when the LLM times out, and keeping multiple agent instances in sync. Managed Agents eliminates that entire category of work.
The API exposes a straightforward interface. You create an agent resource with a model configuration (Gemini 2.5 Pro, Gemini 2.5 Flash, or custom fine-tuned variants), attach tool definitions, set orchestration rules, and deploy. From that point, every request goes through Google's infrastructure with built-in observability, automatic failover, and usage-based billing.
If you have been following the comparison between Claude Agent SDK, OpenAI Agents SDK, and Google ADK, think of Managed Agents as Google's answer to a question the other two SDKs leave unanswered: how do you run agents in production without managing servers?
Agent Architecture and Core Concepts
Managed Agents are built around four primitives: the Agent, Tools, Sessions, and Orchestration. Understanding how these interact is the difference between a working prototype and a production system that falls over at 100 concurrent users.
The Agent Resource
An agent resource is a versioned configuration object stored on Vertex AI. It includes the model ID (e.g., gemini-2.5-pro), the system instruction that defines the agent's persona and constraints, a list of attached tools, and orchestration settings. You can create agents programmatically through the REST API or the Python/Node.js client libraries. Each agent gets a unique resource name that you reference when sending requests.
Tools: Functions, Extensions, and Data Stores
Tools in the Managed Agents API fall into three categories. Function tools are custom code you write and host, exposed to the agent via typed schemas. The agent decides when to call them, and your code executes the logic. Extension tools connect to Google-managed services like Google Search, Code Interpreter, or Vertex AI Search without any code on your end. Data store tools let the agent query structured and unstructured data in BigQuery, Cloud SQL, or Vertex AI Search indexes. The power here is combining all three in a single agent. A customer support agent can search your knowledge base (data store tool), look up order details (function tool), and run Google Search for product comparisons (extension tool), all in one conversation turn.
Sessions and State
Unlike the stateless request-response pattern you get with raw Gemini API calls, Managed Agents maintain session state. A session tracks the full conversation history, tool call results, and any custom metadata you attach. Sessions persist across requests, so your agent can reference earlier parts of the conversation without you stuffing the entire history into every prompt. Sessions expire after a configurable TTL (default 24 hours), and you can store up to 1 million tokens of context per session.
Orchestration Modes
Google offers two orchestration modes. Single-agent mode is what you would expect: one agent handles the full request, calling tools as needed. Multi-agent mode lets you define a root agent that delegates to specialist sub-agents, each with their own tools and system instructions. The root agent decides which sub-agent handles each part of the request and synthesizes their responses. This maps closely to how we build multi-agent AI systems in practice.
Tool Use and Multi-Turn Conversations
Tool use is where Managed Agents earn their keep. The API handles the full tool-use loop automatically: Gemini decides which tool to call, the runtime executes it, the result feeds back to the model, and the model decides whether to call another tool or respond. You never write loop logic yourself.
Defining Function Tools
Function tools use OpenAPI-style schemas. You define the function name, description, and parameter types. The description matters more than most developers realize. Gemini uses it to decide when to call the tool, so vague descriptions like "gets data" lead to unreliable tool selection. Be specific: "Retrieves the shipping status for a given order ID, returning the carrier name, tracking number, and estimated delivery date." That precision improves tool call accuracy by 20 to 30 percent in our testing.
Parallel and Sequential Tool Calls
Gemini 2.5 Pro supports parallel function calling. If the agent needs data from two independent sources, it calls both tools simultaneously rather than waiting for the first to complete before calling the second. This cuts latency for multi-tool requests roughly in half. You control this behavior with the tool configuration: set parallel calling to automatic (the model decides), forced, or disabled.
Multi-Turn Conversation Flow
In a multi-turn conversation, the agent maintains context across turns. Turn one: the user asks about their account balance. The agent calls the account lookup tool. Turn two: the user asks to transfer funds. The agent already knows the account ID from the previous turn, so it calls the transfer tool directly without asking for the account number again. Turn three: the user asks for a confirmation. The agent references the transfer result from turn two. This stateful behavior is automatic when you use sessions. Without Managed Agents, you would build this state management yourself, and it would break in subtle ways when sessions expire or context windows fill up.
Grounding with Google Search
One unique advantage: you can enable Google Search grounding on any managed agent. The agent automatically searches the web when it lacks the information to answer a question, and it cites its sources in the response. For customer-facing agents that need to provide accurate, up-to-date information, this is genuinely useful and not something you get from Claude or OpenAI agent SDKs without building a search integration yourself.
Vertex AI Integration and Deployment
Deploying a managed agent on Vertex AI takes less time than most teams expect. The real work is in configuring it correctly for production traffic.
Setting Up Your First Agent
The deployment flow follows four steps. First, create a Vertex AI project and enable the Agent Engine API. Second, define your agent configuration: model, system instruction, tools, and orchestration mode. Third, deploy the agent with a single API call or through the Google Cloud Console. Fourth, send requests to the agent endpoint. The entire process takes about 30 minutes for a simple agent, assuming your tools are already built.
Infrastructure You Do Not Manage
Google handles autoscaling, load balancing, model hosting, and session storage. Your agent scales from zero to thousands of concurrent sessions without configuration changes. There is no cold start penalty for managed agents since Google keeps model instances warm. Failover is automatic: if the underlying infrastructure has an issue, requests route to healthy instances without interruption.
IAM and Security
Managed Agents integrate with Google Cloud IAM. You control who can invoke the agent, who can modify its configuration, and which Google Cloud resources the agent can access through its service account. For enterprise deployments, this means your agent's permissions follow the same governance model as the rest of your cloud infrastructure. VPC Service Controls can restrict the agent to your private network, and Customer-Managed Encryption Keys (CMEK) encrypt session data with your own keys.
Monitoring and Observability
Every agent interaction generates traces in Cloud Trace and logs in Cloud Logging. You can track latency per tool call, success/failure rates, token usage per session, and model response quality metrics. Set up Cloud Monitoring alerts for error rates above your threshold or latency spikes that indicate tool performance degradation. This level of observability comes built in. With self-hosted frameworks, you would spend weeks integrating OpenTelemetry and building custom dashboards.
Pricing: Managed Agents vs Claude Agent SDK vs OpenAI Agents SDK
Pricing for agent workloads is more nuanced than simple per-token costs. You need to account for the model cost, the infrastructure cost, and the tool execution cost. Here is how Managed Agents compare to the alternatives.
Gemini Managed Agents Cost Breakdown
The model cost depends on which Gemini variant you use. Gemini 2.5 Pro runs approximately $1.25 per million input tokens and $10.00 per million output tokens at the time of writing (with Vertex AI pricing). Gemini 2.5 Flash is significantly cheaper at roughly $0.15 per million input tokens and $0.60 per million output tokens. For most agent workloads, Flash handles the job well, and Pro is reserved for tasks requiring deeper reasoning. There is no additional infrastructure charge for the managed runtime itself. You pay for model tokens, tool executions (if using Google-hosted extensions), and session storage beyond the free tier.
Claude Agent SDK Costs
Claude Sonnet runs $3 per million input tokens and $15 per million output tokens. Claude Haiku is $0.25/$1.25. The SDK itself is free, but you host the agent runtime yourself, meaning you pay for compute (EC2, Cloud Run, or similar) on top of model costs. A typical production agent on AWS adds $200 to $800 per month in compute and infrastructure costs depending on traffic volume.
OpenAI Agents SDK Costs
GPT-4o runs $2.50 per million input tokens and $10.00 per million output tokens. GPT-4o-mini is $0.15/$0.60. Like Claude, you self-host the runtime. Same infrastructure overhead applies.
Real-World Cost Comparison
For a customer support agent handling 10,000 conversations per month with an average of 5 turns per conversation and 2 tool calls per turn, here is what you would pay approximately. Gemini 2.5 Flash on Managed Agents: $150 to $300 per month (model only, no infrastructure cost). Claude Sonnet self-hosted: $400 to $800 per month (model) plus $300 to $500 per month (infrastructure). GPT-4o self-hosted: $350 to $700 per month (model) plus $300 to $500 per month (infrastructure). The managed infrastructure savings alone make Gemini Managed Agents 30 to 50 percent cheaper at this scale. At higher volumes, the gap widens because you never hit infrastructure scaling costs.
Production Use Cases and When to Choose Managed Agents
Not every agent workload belongs on Managed Agents. Here is where it excels and where the alternatives win.
Strong Use Cases for Managed Agents
- Customer support agents on GCP: If your backend runs on Google Cloud, Managed Agents give you native access to BigQuery, Cloud SQL, and Firestore as data sources. No API gateway layer, no custom auth flow. The agent queries your data directly through its service account.
- Multimodal agents: Gemini handles images, audio, video, and text in a single model call. An insurance claims agent can accept photos of vehicle damage, analyze them, cross-reference policy terms, and generate a claim estimate without switching models or chaining separate services.
- Internal knowledge assistants: Deploy an agent that searches your company's documents in Vertex AI Search, answers questions with citations, and escalates to human support when confidence is low. The built-in grounding and citation features handle the hard parts.
- Rapid prototyping with production path: Start with a simple agent in the Cloud Console, iterate on the system prompt and tools, then expose the same endpoint to production traffic. No re-architecture required.
When to Choose Claude Agent SDK Instead
- Complex multi-step reasoning: Claude Opus and Sonnet still outperform Gemini on tasks requiring 8+ sequential reasoning steps with conditional branching. If your agent needs to analyze a legal contract, extract 20 specific clauses, cross-reference them, and generate a risk assessment, Claude handles this more reliably.
- Code generation agents: Claude leads on code benchmarks (SWE-Bench, HumanEval). For agents that write, review, or modify code, Claude Agent SDK produces better results.
- MCP ecosystem: If you need standardized connections to dozens of external services through the Model Context Protocol, Claude's native MCP support has a deeper ecosystem than Google's equivalent.
When to Choose OpenAI Agents SDK Instead
- Multi-agent handoff patterns: OpenAI's handoff abstraction is cleaner than Google's multi-agent mode for workflows where a triage agent routes to specialists. If your use case fits the "router plus specialists" pattern, OpenAI's SDK gets you there faster.
- Largest community: More tutorials, more examples, more Stack Overflow answers. For small teams without deep AI expertise, the community support advantage is real.
Production Best Practices for Managed Agents
We have deployed managed agents for clients across fintech, healthcare, and e-commerce. These are the lessons that saved them from production incidents.
Design Tool Descriptions Like API Documentation
The single biggest factor in agent reliability is tool description quality. Write descriptions that specify exactly what the tool does, what inputs it expects, what it returns, and when it should (and should not) be used. Include example values in parameter descriptions. Bad: "amount: number". Good: "amount: The transfer amount in USD as a decimal, e.g., 150.00. Must be between 0.01 and 50000.00." This level of detail cuts tool call errors by half.
Set Up Evaluation Before You Scale
Build a test suite of 50 to 100 representative conversations before you expose the agent to real users. Each test case should include the user messages, expected tool calls, and expected responses. Run these tests on every configuration change. Vertex AI's evaluation tools integrate with Managed Agents, letting you track accuracy, latency, and cost across test runs. Teams that skip evaluation end up debugging production incidents instead of shipping features.
Use Flash for Routing, Pro for Reasoning
A cost-effective pattern: use Gemini 2.5 Flash as your root agent to handle routing and simple queries, then delegate complex reasoning tasks to a sub-agent running Gemini 2.5 Pro. Flash handles 70 to 80 percent of requests at one-eighth the cost. Only the hard problems hit the expensive model. This hybrid approach keeps your average cost per conversation under $0.02 while maintaining quality on complex tasks.
Handle Tool Failures Gracefully
Tools fail. APIs time out. Databases return unexpected schemas. Your agent needs fallback behavior for every tool. Configure retry policies at the tool level (not just the agent level), set reasonable timeouts (5 seconds for database lookups, 15 seconds for external APIs), and write system instructions that tell the agent what to do when a tool fails: "If the order lookup tool returns an error, apologize and ask the customer to provide their order number again. Do not guess order details."
Monitor Token Usage Per Session
Long conversations accumulate tokens quickly. A 20-turn conversation with multiple tool calls can hit 50,000 to 100,000 tokens. Set session-level token budgets and configure the agent to summarize and reset context when approaching the limit. Without this, a small percentage of power users will generate disproportionate costs. We have seen single sessions cost $5 or more when left unchecked.
Building agents is the easy part. Running them reliably at scale, keeping costs predictable, and maintaining quality as your user base grows is where most teams struggle. If you want a team that has done this before to help you ship your first managed agent, or migrate an existing agent system to Google's managed infrastructure, book a free strategy call and we will map out the fastest path to production.
Need help building this?
Our team has launched 50+ products for startups and ambitious brands. Let's talk about your project.