“MCP isn't just a protocol—it's the infrastructure layer that separates demo agents from production systems.”
Everyone wants AI agents. The problem? Most projects fail not because of the AI, but because the architecture falls apart at scale.
I've spent the last year building production AI agents with Claude and the Model Context Protocol (MCP). The difference between a demo that works once and a system that runs reliably every day comes down to three things: how you architect your integrations, how you handle security, and how you manage context at scale.
This guide walks through building production-grade AI agents with Claude's MCP—not the simplified version, but the patterns that actually work when you're handling real data and real consequences.
What MCP Actually Solves
Before you implement anything, understand what problem you're solving.
Before MCP, developers had to build a custom connector for each pairing of application and data source, what Anthropic described as the "N×M" integration problem. Want your agent to talk to GitHub, Slack, and your database? That's three separate implementations, three separate authentication schemes, three separate error handlers.
The Model Context Protocol is an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools. The architecture is straightforward: developers can either expose their data through MCP servers or build AI applications (MCP clients) that connect to these servers.
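To make that concrete, here's a minimal sketch of the server side using the official MCP TypeScript SDK. The ticket-server name and get_ticket tool are illustrative, not a real package:
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// A hypothetical server exposing one tool over stdio
const server = new McpServer({ name: "ticket-server", version: "1.0.0" });

server.tool(
  "get_ticket", // Any MCP client, Claude included, discovers this by name
  { ticketId: z.string() },
  async ({ ticketId }) => ({
    content: [{ type: "text", text: `Ticket ${ticketId}: status=open` }],
  })
);

await server.connect(new StdioServerTransport());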
The real power is standardization. Implement MCP once in your agent and you unlock an ecosystem of pre-built integrations. Claude now has a directory with over 75 connectors powered by MCP, and Anthropic recently launched Tool Search and Programmatic Tool Calling capabilities in the API to help production-scale MCP deployments handle thousands of tools efficiently.
Architecture: How to Structure Your Production Agent
Production agents are different from toys. You need to think about context management, tool discovery, and graceful degradation when integrations fail.
The Core Architecture Pattern
Here's the pattern I use for every production agent:
import Anthropic from "@anthropic-ai/sdk";
interface AgentConfig {
mcpServers: {
[key: string]: {
command: string;
args: string[];
      env?: Record<string, string | undefined>; // accepts process.env values directly
};
};
allowedTools: string[];
contextLimit: number;
}
class ProductionAgent {
private client: Anthropic;
private config: AgentConfig;
  // Protected (not private) so the subclasses later in this guide can prune it
  protected conversationHistory: Array<{
role: "user" | "assistant";
content: string;
}> = [];
constructor(config: AgentConfig) {
this.client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
this.config = config;
}
async executeTask(userMessage: string): Promise<string> {
// Add to conversation history
this.conversationHistory.push({
role: "user",
content: userMessage,
});
// Check context usage
const estimatedTokens = this.estimateTokens();
if (estimatedTokens > this.config.contextLimit * 0.8) {
// Implement context management strategy
this.pruneConversationHistory();
}
// Call Claude with MCP tools
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 4096,
system: `You are a production AI agent. You have access to external tools via MCP.
When using tools:
1. Always validate inputs before calling tools
2. Handle errors gracefully
3. Provide clear reasoning for tool selection
4. Never assume tool success—always check results`,
messages: this.conversationHistory,
});
const assistantMessage =
response.content[0].type === "text" ? response.content[0].text : "";
this.conversationHistory.push({
role: "assistant",
content: assistantMessage,
});
return assistantMessage;
}
private estimateTokens(): number {
// Rough estimation: ~4 chars per token
const historyText = this.conversationHistory
.map((m) => m.content)
.join("");
return Math.ceil(historyText.length / 4);
}
  private pruneConversationHistory(): void {
    // Drop the oldest messages, then trim any leading assistant turn so the
    // kept history still alternates starting with a "user" message, as the
    // Messages API requires
    if (this.conversationHistory.length > 10) {
      this.conversationHistory = this.conversationHistory.slice(-10);
      while (
        this.conversationHistory.length > 0 &&
        this.conversationHistory[0].role !== "user"
      ) {
        this.conversationHistory.shift();
      }
    }
  }
}
This pattern handles the fundamentals: conversation management, token tracking, and graceful context degradation.
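The ~4 characters per token heuristic is a reasonable guardrail, but when you need accuracy the Anthropic SDK exposes a token-counting endpoint. A minimal sketch, assuming the same client and conversation history as above:
  // Drop-in alternative to estimateTokens() that asks the API for an exact count
  private async countTokensExact(): Promise<number> {
    const result = await this.client.messages.countTokens({
      model: "claude-3-5-sonnet-20241022",
      messages: this.conversationHistory,
    });
    return result.input_tokens;
  }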
Tool Discovery at Scale
When you have many MCP tools configured, tool definitions can consume a significant portion of your context window. MCP tool search solves this by dynamically loading tools on-demand instead of preloading all of them. It activates when your MCP tool descriptions would consume more than 10% of the context window. When triggered, MCP tools are marked with defer_loading: true rather than loaded into context upfront.
This is critical when you have hundreds of tools. Don't load everything upfront. Use tool search:
const agentConfig = {
mcpServers: {
github: {
command: "npx",
args: ["-y", "@modelcontextprotocol/server-github"],
env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN },
},
postgres: {
command: "npx",
args: ["-y", "@modelcontextprotocol/server-postgres"],
env: { DATABASE_URL: process.env.DATABASE_URL },
},
},
toolSearch: {
enabled: true,
threshold: 0.1, // Activate when tools exceed 10% of context
},
};
With tool search enabled, Claude will search for relevant tools when needed rather than loading all definitions upfront. This reduces context consumption and improves response latency.
Security: The Production Requirement
Security isn't optional in production. It's the difference between a working system and a liability.
Three Security Layers
Layer 1: Tool Permissions
Never give agents access to all tools. MCP tools require explicit permission before Claude can use them. Without permission, Claude will see that tools are available but won't be able to call them.
Use allowedTools to restrict access:
const allowedTools = [
"mcp__github__list_issues", // Only specific GitHub tools
"mcp__github__create_issue",
"mcp__slack__send_message", // Limited Slack access
// NOT mcp__github__delete_repo (too dangerous)
];
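Enforce the allow-list in your own dispatch path too, not just in configuration. A minimal sketch of a method you could add to ProductionAgent, assuming the AgentConfig interface from earlier:
  // Guard every dispatch so a misconfigured tool fails loudly, not silently
  private assertToolAllowed(toolName: string): void {
    if (!this.config.allowedTools.includes(toolName)) {
      throw new Error(`Tool ${toolName} is not in the allow-list`);
    }
  }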
Layer 2: Credential Management
Don't scatter credentials across configuration files. As organizations add more MCP servers to extend Claude's capabilities, credentials tend to sprawl across config files and systems, which creates security vulnerabilities and makes compliance auditing extremely difficult.
Use environment variables and secrets management:
// ✓ Correct: Use environment variables
const mcpConfig = {
postgres: {
command: "npx",
args: ["@modelcontextprotocol/server-postgres"],
env: {
DATABASE_URL: process.env.DATABASE_URL, // From secure vault
DATABASE_PASSWORD: process.env.DATABASE_PASSWORD,
},
},
};
// ✗ Wrong: Hardcoded credentials
const badConfig = {
postgres: {
env: {
DATABASE_URL: "postgres://user:password123@localhost/db",
},
},
};
Layer 3: Tool Audit Logging
Every tool call should be logged and auditable. This is non-negotiable for production:
interface AuditEntry {
  timestamp: string;
  userId: string;
  tool: string;
  input: Record<string, unknown>;
  result: string;
  status: "success" | "error";
}

class AuditedAgent extends ProductionAgent {
  private auditLog: AuditEntry[] = [];
async executeToolCall(
toolName: string,
input: Record<string, unknown>
): Promise<string> {
try {
const result = await this.callTool(toolName, input);
this.auditLog.push({
timestamp: new Date().toISOString(),
userId: this.getCurrentUserId(),
tool: toolName,
input,
result: JSON.stringify(result).substring(0, 500), // Truncate for storage
status: "success",
});
return result;
} catch (error) {
this.auditLog.push({
timestamp: new Date().toISOString(),
userId: this.getCurrentUserId(),
tool: toolName,
input,
result: error instanceof Error ? error.message : "Unknown error",
status: "error",
});
throw error;
}
}
async flushAuditLog(): Promise<void> {
// Send to audit system (Datadog, Splunk, etc.)
if (this.auditLog.length > 0) {
await this.sendToAuditSystem(this.auditLog);
this.auditLog = [];
}
}
private getCurrentUserId(): string {
// Implementation depends on your auth system
return process.env.USER_ID || "unknown";
}
  private async sendToAuditSystem(logs: AuditEntry[]): Promise<void> {
// Send to your audit backend
}
private async callTool(
toolName: string,
input: Record<string, unknown>
): Promise<string> {
    // Implementation: dispatch to your connected MCP client and return the raw result
    return "";
}
}
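Flush on an interval as well as per task, so a crash loses at most a few seconds of audit trail. A sketch, assuming an AgentConfig like the agentConfig defined earlier:
// Periodic flush: a crash loses at most ~10 seconds of audit entries
const auditedAgent = new AuditedAgent(agentConfig);
setInterval(() => {
  auditedAgent.flushAuditLog().catch((err) => {
    console.error("Audit flush failed:", err);
  });
}, 10_000);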
Context Management: The Hard Problem
This is where most agents fail in production. You start with a 200K token context window. Then you add tool definitions, conversation history, system prompts, and retrieved context. Say you have fifty tools at roughly 800 tokens per definition: that's 40K tokens, a fifth of the window, gone before the first user message. Suddenly you're out of space.
The Three-Tier Strategy
Tier 1: Conversation Pruning
Keep recent conversation history, archive old exchanges:
class ContextManagedAgent extends ProductionAgent {
  private maxConversationTurns = 20;

  pruneConversation(): void {
    // Only prune once history extends beyond the first exchange plus the window
    if (this.conversationHistory.length > this.maxConversationTurns + 2) {
      // Keep the first exchange (for context) and the last N turns, trimming
      // any leading assistant message so roles still alternate correctly
      const first = this.conversationHistory.slice(0, 2);
      let recent = this.conversationHistory.slice(-this.maxConversationTurns);
      while (recent.length > 0 && recent[0].role !== "user") {
        recent = recent.slice(1);
      }
      this.conversationHistory = [...first, ...recent];
    }
  }
}
Tier 2: Tool Summarization
When tools return large outputs, summarize before adding to context:
async summarizeToolOutput(
toolName: string,
output: string
): Promise<string> {
if (output.length > 5000) {
// Use Claude to summarize
const summary = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 500,
messages: [
{
role: "user",
content: `Summarize this output in 100 words max:\n\n${output}`,
},
],
});
return summary.content[0].type === "text" ? summary.content[0].text : "";
}
return output;
}
Tier 3: Selective Tool Loading
Use code execution to interact with tools instead of issuing direct tool calls; it scales better. The agent writes a short program that calls tools programmatically, so only the final result, not every tool definition and intermediate payload, lands in the context window.
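A sketch of that idea using the MCP TypeScript client SDK; the server choice and repo parameters are illustrative:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "agent-script", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-github"],
  })
);

// Generated code calls the tool directly; only the final, filtered result
// needs to re-enter the model's context window
const issues = await client.callTool({
  name: "list_issues",
  arguments: { owner: "anthropics", repo: "anthropic-sdk-js" },
});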
Deployment Patterns
Local Development
For development and testing:
# Start with local MCP servers (stdio is the default transport)
claude mcp add filesystem -- npx -y @modelcontextprotocol/server-filesystem /Users/me/projects
claude mcp add postgres -- npx -y @modelcontextprotocol/server-postgres postgresql://localhost/dev

# Verify they registered and are connecting
claude mcp list
Production Cloud Deployment
For production, use HTTP transport with remote MCP servers:
const productionConfig = {
mcpServers: {
database: {
type: "http",
url: "https://mcp-postgres.company.com/mcp",
headers: {
Authorization: `Bearer ${process.env.MCP_API_TOKEN}`,
},
},
github: {
type: "http",
url: "https://mcp-github.company.com/mcp",
headers: {
Authorization: `Bearer ${process.env.GITHUB_MCP_TOKEN}`,
},
},
},
};
HTTP is the recommended transport for remote MCP servers and the most widely supported option for cloud-based services.
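If you use Claude Code against the same servers, remote HTTP endpoints register through the CLI as well. A sketch, assuming the company URLs above:
# Register a remote HTTP server with an auth header
claude mcp add --transport http database https://mcp-postgres.company.com/mcp \
  --header "Authorization: Bearer ${MCP_API_TOKEN}"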
Real-World Example: Building a Code Review Agent
Let me show you a complete example—a production agent that reviews pull requests:
class CodeReviewAgent extends AuditedAgent {
  async reviewPullRequest(
    repoOwner: string,
    repoName: string,
    prNumber: number
  ): Promise<string> {
const userMessage = `Review pull request #${prNumber} in ${repoOwner}/${repoName}.
Check:
1. Code quality and style
2. Test coverage
3. Security concerns
4. Performance implications
Provide a structured review with specific suggestions.`;
// Execute with MCP tools available:
// - mcp__github__get_pull_request
// - mcp__github__get_pull_request_diff
// - mcp__github__list_pull_request_files
// - mcp__slack__send_message (to notify reviewers)
const response = await this.executeTask(userMessage);
// Log the review
await this.flushAuditLog();
return response;
}
}
// Usage
const agent = new CodeReviewAgent({
mcpServers: {
github: {
command: "npx",
args: ["-y", "@modelcontextprotocol/server-github"],
env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN },
},
slack: {
command: "npx",
args: ["-y", "@modelcontextprotocol/server-slack"],
env: { SLACK_BOT_TOKEN: process.env.SLACK_BOT_TOKEN },
},
},
allowedTools: [
"mcp__github__get_pull_request",
"mcp__github__get_pull_request_diff",
"mcp__github__list_pull_request_files",
"mcp__slack__send_message",
],
contextLimit: 150000,
});
// Run the review
agent.reviewPullRequest("anthropics", "anthropic-sdk-js", 42).then((review) => {
console.log("Review complete:", review);
});
Monitoring and Observability
Production systems fail silently. You need visibility.
Key Metrics to Track
- Tool Success Rate - What percentage of tool calls succeed?
- Context Usage - Are you hitting context limits?
- Latency - How long are requests taking?
- Error Patterns - What fails most often?
class ObservableAgent extends AuditedAgent {
  private metrics = {
    toolCalls: 0,
    toolSuccesses: 0,
    toolErrors: 0,
    avgLatencyMs: 0,
  };

  async executeToolCall(
    toolName: string,
    input: Record<string, unknown>
  ): Promise<string> {
    const startTime = Date.now();
    try {
      const result = await super.executeToolCall(toolName, input);
      this.metrics.toolSuccesses++;
      return result;
    } catch (error) {
      this.metrics.toolErrors++;
      throw error;
    } finally {
      const latency = Date.now() - startTime;
      this.metrics.toolCalls++;
      // Incremental running mean, not a naive two-point average
      this.metrics.avgLatencyMs +=
        (latency - this.metrics.avgLatencyMs) / this.metrics.toolCalls;
      // Send to monitoring system
      await this.sendMetrics();
    }
  }

  private async sendMetrics(): Promise<void> {
    // Send to Datadog, New Relic, etc.
    console.log("Metrics:", this.metrics);
  }
}
Common Pitfalls and How to Avoid Them
Pitfall 1: Loading All Tools Upfront
Don't do this. Use tool search. It's built for this exact problem.
Pitfall 2: Trusting Tool Outputs Blindly
Always validate. A tool can return data that looks correct but isn't:
interface UserRecord {
  id: string;
  email: string;
}

async getUser(userId: string): Promise<UserRecord> {
  const raw = await this.callTool("get_user", { id: userId });
  const user = JSON.parse(raw) as Partial<UserRecord>;
  // ✓ Validate the structure before trusting it
  if (!user.id || !user.email) {
    throw new Error("Invalid user response structure");
  }
  return user as UserRecord;
}
Pitfall 3: Ignoring Error Handling
Tools fail. Networks fail. Databases go down. Plan for it:
async callToolWithRetry(
toolName: string,
input: Record<string, unknown>,
maxRetries: number = 3
): Promise<string> {
for (let i = 0; i < maxRetries; i++) {
try {
return await this.callTool(toolName, input);
} catch (error) {
if (i === maxRetries - 1) {
throw new Error(
`Tool ${toolName} failed after ${maxRetries} retries: ${error}`
);
}
// Exponential backoff
await new Promise((resolve) =>
setTimeout(resolve, Math.pow(2, i) * 1000)
);
}
}
return "";
}
Next Steps
You now have the architecture for building production-ready AI agents with Claude and MCP. The implementation is straightforward, but the details matter.
Start with a single MCP server. Get security and monitoring right. Then scale up.
For more advanced patterns, check out Building Production-Ready AI Agent Swarms: From Architecture to Deployment to see how to orchestrate multiple agents. If you're comparing Claude to other models, Claude vs OpenAI GPT for Building AI Agents: A Developer's Complete Comparison covers the practical differences.
You can also explore Building AI Agents That Actually Work for foundational patterns that complement this MCP-focused guide.
The official MCP documentation has the complete specification and examples. Official SDKs exist for all major programming languages, with 97M+ combined monthly downloads across the Python and TypeScript packages alone, so you have solid tools to build with.
MCP isn't magic. It's plumbing. But great plumbing lets you build great systems.
Ready to ship production agents? Get in touch—I help teams architect and deploy AI systems that actually work.
