“The gap between a working MCP prototype and a production-ready server isn't about the protocol—it's about architecture, security, and observability.”
I've built MCP servers that handle thousands of requests daily. I've also seen them fail spectacularly in production because they were designed as demos, not systems.
The difference isn't the protocol.
Anthropic's Model Context Protocol (MCP) is an open standard for connecting AI assistants to the systems where data lives. The real challenge is what happens after you wire up your first tool.
Production-ready MCP servers need architecture. They need security. They need observability. This guide covers all three.
Understanding MCP's Role in Your Architecture
Before diving into production patterns, let's be clear about what MCP actually is.
Developers can either expose their data through MCP servers or build AI applications (MCP clients) that connect to these servers. Code execution with MCP enables agents to use context more efficiently by loading tools on demand, filtering data before it reaches the model, and executing complex logic in a single step. This is the real power—not just exposing tools, but doing it efficiently at scale.
But here's what most teams miss: MCP is just the transport layer. Your architecture decisions happen above and below it.
If you're new to MCP concepts, I'd recommend reviewing MCP vs Traditional APIs: When to Choose Model Context Protocol for AI Integration to understand when MCP makes sense versus other integration approaches.
The Core Architecture Pattern
I've found one pattern that consistently works for production MCP servers: focused services with clear boundaries.
Each MCP server should do one thing well. A database server handles queries. A file server handles storage. An API server handles external integrations. A notification server handles alerts. Don't try to build a mega-server that does everything.
// ❌ Anti-pattern: Monolithic server
class MegaMCPServer {
  async handleTool(name: string, args: any) {
    if (name === "query_db") { /* database logic */ }
    else if (name === "read_file") { /* file logic */ }
    else if (name === "call_api") { /* API logic */ }
    else if (name === "send_email") { /* email logic */ }
  }
}

// ✅ Pattern: Focused server
class DatabaseMCPServer {
  async handleTool(name: string, args: any) {
    if (name === "query") { /* database query */ }
    else if (name === "insert") { /* database insert */ }
    else if (name === "update") { /* database update */ }
  }
}
This matters for deployment, scaling, and security. Each server has one reason to change. Each server has one set of permissions it needs. Each server can be updated independently.
Security: The Non-Negotiable Layer
Security researchers have identified multiple outstanding security issues with MCP, including prompt injection, tool-permission flaws where combining tools can exfiltrate files, and lookalike tools that can silently replace trusted ones.
I'm not sharing this to scare you. I'm sharing it because the solution is straightforward: apply fundamental security controls consistently.
Authentication and Authorization
MCP servers MUST NOT accept any tokens that were not explicitly issued for the MCP server.
This means OAuth 2.0 or equivalent. Not optional. Not "we'll add it later."
import { Router } from "express";
import jwt from "jsonwebtoken";

const router = Router();

// Middleware: Verify JWT token
router.use((req, res, next) => {
  const token = req.headers.authorization?.split(" ")[1];
  if (!token) {
    return res.status(401).json({ error: "Missing token" });
  }
  try {
    const decoded = jwt.verify(token, process.env.JWT_SECRET!);
    req.user = decoded;
    next();
  } catch (err) {
    res.status(401).json({ error: "Invalid token" });
  }
});

// Tool handler with user context
router.post("/tool", async (req, res) => {
  const { toolName, args } = req.body;
  const userId = req.user.id;
  // Check if user has permission for this tool
  if (!hasPermission(userId, toolName)) {
    return res.status(403).json({ error: "Forbidden" });
  }
  // Execute tool with user context and return the result
  const result = await executeTool(toolName, args, userId);
  res.json({ result });
});
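The handler above calls a hasPermission() helper that is left undefined. Here's a minimal in-memory sketch of what it might look like; the role names, the hard-coded grant table, and the role lookup are all hypothetical—in production this would consult your real authorization system (a roles table, a policy engine, etc.).

```typescript
// Hypothetical role-to-tool grant table. In production, load this from your
// authz system rather than hard-coding it.
const toolGrants = new Map<string, Set<string>>([
  ["viewer", new Set(["query"])],
  ["editor", new Set(["query", "insert", "update"])],
]);

// Stubbed role lookup—swap in your real user store.
function roleOf(userId: string): string {
  return userId.startsWith("admin-") ? "editor" : "viewer";
}

// Returns true only if the user's role explicitly grants the tool.
// Unknown roles and unknown tools are denied by default.
export function hasPermission(userId: string, toolName: string): boolean {
  const grants = toolGrants.get(roleOf(userId));
  return grants !== undefined && grants.has(toolName);
}
```

The important property is deny-by-default: a missing role or an unlisted tool fails closed rather than open.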
Input Validation and Sanitization
Attackers can exploit unescaped input or poorly filtered parameters to inject code, particularly in setups where tools accept direct user input through prompts and forward it into shells, interpreters, or system commands. Without strict input sanitization, even a single endpoint creates significant security risk.
import { z } from "zod";

// Define strict input schemas
const QuerySchema = z.object({
  table: z.string().regex(/^[a-zA-Z_][a-zA-Z0-9_]*$/), // Identifier characters only
  limit: z.number().int().min(1).max(1000),
  offset: z.number().int().min(0),
  filters: z.record(z.string(), z.any()).optional(),
});

// Validate before execution
async function handleQuery(input: unknown) {
  const parsed = QuerySchema.parse(input); // Throws if invalid
  // Now we know the input matches the schema
  const query = buildQuery(parsed);
  return executeQuery(query);
}
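Schema validation handles shape, but the shell-forwarding risk described above needs a separate defense: never build a shell string from user input. A minimal sketch, assuming a hypothetical allowlist of binaries a tool may invoke—execFileSync passes arguments directly to the program without a shell, so metacharacters like ";" or "$(...)" stay inert:

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical allowlist: the only binaries any tool is permitted to run.
const ALLOWED_BINARIES = new Set(["git", "ls", "echo"]);

export function runCommand(binary: string, args: string[]): string {
  if (!ALLOWED_BINARIES.has(binary)) {
    throw new Error(`Binary not allowed: ${binary}`);
  }
  // No shell involved: args are passed straight to the program,
  // so shell metacharacters in user input are treated as literal text.
  return execFileSync(binary, args, { encoding: "utf8" });
}
```

Contrast this with exec(`${binary} ${userInput}`), where a single ";" in the input becomes a second command.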
Least Privilege Access
MCP servers may have been granted excessive permissions to the service/resource they are accessing. For example, an MCP server that is part of an AI sales application connecting to an enterprise data store should have access scoped to the sales data and not allowed to access all the files in the store. No resource should have permissions in excess of what is required for it to execute the tasks it was intended for.
This applies to database credentials, API keys, file system access—everything.
// ❌ Bad: Server has admin access to everything
const db = await connect({
  user: "admin",
  password: process.env.DB_ADMIN_PASSWORD,
  database: "production",
});

// ✅ Good: Server has scoped access
const db = await connect({
  user: "mcp_sales_readonly",
  password: process.env.DB_MCP_PASSWORD,
  database: "production",
});
// Only sales data is accessible
// Only SELECT queries are allowed
// No DROP, DELETE, or ALTER permissions
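The database role's grants are the real enforcement layer, but a cheap in-process check adds defense in depth: reject anything that isn't a single read-only statement before it ever reaches the database. A rough sketch (the keyword list and checks are illustrative, not a complete SQL parser):

```typescript
// Defense-in-depth companion to the scoped database role above. This is a
// coarse filter, not a substitute for database-level permissions.
const FORBIDDEN = /\b(insert|update|delete|drop|alter|truncate|grant)\b/i;

export function assertReadOnly(sql: string): void {
  const trimmed = sql.trim();
  // Must start with SELECT
  if (!/^select\b/i.test(trimmed)) {
    throw new Error("Only SELECT statements are allowed");
  }
  // Reject statement chaining ("SELECT 1; DROP TABLE ...")
  const semi = trimmed.indexOf(";");
  if (semi !== -1 && semi !== trimmed.length - 1) {
    throw new Error("Multiple statements are not allowed");
  }
  // Belt-and-suspenders keyword check
  if (FORBIDDEN.test(trimmed)) {
    throw new Error("Statement contains a forbidden keyword");
  }
}
```

If this check ever fires, it means the database role was the only thing standing between you and a destructive query—worth an alert, not just a rejection.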
Deployment Patterns
I've deployed MCP servers to Kubernetes, Vercel, and Cloudflare. Each has tradeoffs.
Kubernetes Deployment
For teams that need scale and control:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-database-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-database-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: mcp-database-server
    spec:
      containers:
        - name: mcp-server
          image: my-mcp-server:v1.2.3
          ports:
            - containerPort: 8080
          # Resource limits prevent runaway processes
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          # Health checks catch failures early
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          # Secrets from secure storage
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: database-url
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: jwt-secret
Serverless Deployment
For lower-traffic servers or development:
// Vercel Edge Function
export default async function handler(request: Request) {
  if (request.method !== "POST") {
    return Response.json({ error: "Method not allowed" }, { status: 405 });
  }
  const token = request.headers.get("authorization")?.split(" ")[1];
  if (!token || !(await verifyToken(token))) {
    return Response.json({ error: "Unauthorized" }, { status: 401 });
  }
  const { toolName, args } = await request.json();
  try {
    const result = await executeTool(toolName, args);
    return Response.json({ result });
  } catch (error) {
    const message = error instanceof Error ? error.message : "Internal error";
    return Response.json({ error: message }, { status: 500 });
  }
}
Configuration and Environment Management
Production servers need configuration that changes between environments without code changes.
import { z } from "zod";

const ConfigSchema = z.object({
  // Server config
  port: z.number().default(8080),
  environment: z.enum(["development", "staging", "production"]),
  // Database config
  databaseUrl: z.string().url(),
  databasePoolSize: z.number().int().min(1).max(100).default(10),
  // Security config
  jwtSecret: z.string().min(32),
  tokenExpiry: z.number().int().positive().default(3600),
  rateLimitPerMinute: z.number().int().positive().default(1000),
  // Observability config
  logLevel: z.enum(["debug", "info", "warn", "error"]).default("info"),
  sentryDsn: z.string().url().optional(),
});

// Fail fast at startup: parse throws if anything is missing or malformed.
// Environment variables are strings, so numeric fields are parsed explicitly.
const config = ConfigSchema.parse({
  port: parseInt(process.env.PORT || "8080"),
  environment: process.env.NODE_ENV,
  databaseUrl: process.env.DATABASE_URL,
  databasePoolSize: parseInt(process.env.DATABASE_POOL_SIZE || "10"),
  jwtSecret: process.env.JWT_SECRET,
  tokenExpiry: parseInt(process.env.TOKEN_EXPIRY || "3600"),
  rateLimitPerMinute: parseInt(process.env.RATE_LIMIT || "1000"),
  logLevel: process.env.LOG_LEVEL || "info",
  sentryDsn: process.env.SENTRY_DSN,
});
Observability: The Silent Killer
You can't fix what you can't see.
Implement logging and monitoring across your AI application, including MCP clients and servers, and ship those logs to a central SIEM so anomalous activity can be detected.
Structured Logging
import winston from "winston";

const logger = winston.createLogger({
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: "error.log", level: "error" }),
    new winston.transports.File({ filename: "combined.log" }),
  ],
});

// Every tool execution gets logged
logger.info("tool_executed", {
  toolName: "query_database",
  userId: "user-123",
  duration: 245, // ms
  status: "success",
  inputHash: hash(args), // Don't log sensitive data
  timestamp: new Date().toISOString(),
});

// Errors get detailed context
logger.error("tool_failed", {
  toolName: "query_database",
  userId: "user-123",
  error: error.message,
  stack: error.stack,
  timestamp: new Date().toISOString(),
});
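The log entry above records hash(args) instead of the raw arguments. A minimal sketch of that helper, assuming a hypothetical hash() function over Node's crypto module—a stable digest lets you correlate repeated inputs across log lines without ever writing possibly sensitive arguments to disk:

```typescript
import { createHash } from "node:crypto";

// Digest arbitrary tool arguments for logging. Note: JSON.stringify is not
// canonical across key orders—fine for correlating logs, but use a canonical
// serializer if you need exact cross-process stability.
export function hash(value: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(value))
    .digest("hex")
    .slice(0, 16); // a short prefix is enough for log correlation
}
```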
Metrics and Monitoring
Track the metrics that matter:
- Tool execution time (p50, p95, p99)
- Error rate by tool
- Authentication failures
- Rate limit violations
- Resource usage (CPU, memory, connections)
import prom from "prom-client";

const toolExecutionDuration = new prom.Histogram({
  name: "mcp_tool_execution_duration_ms",
  help: "Tool execution duration in milliseconds",
  labelNames: ["tool_name", "status"],
  buckets: [10, 50, 100, 500, 1000, 5000],
});

const authFailures = new prom.Counter({
  name: "mcp_auth_failures_total",
  help: "Total authentication failures",
  labelNames: ["reason"],
});

// In your tool handler
const start = Date.now();
try {
  const result = await executeTool(toolName, args);
  toolExecutionDuration.labels(toolName, "success").observe(Date.now() - start);
  return result;
} catch (error) {
  toolExecutionDuration.labels(toolName, "error").observe(Date.now() - start);
  throw error;
}
Testing and Validation
Production MCP servers need comprehensive testing.
import { describe, it, expect, beforeAll, afterAll } from "vitest";

describe("MCP Server", () => {
  let server: MCPServer;
  let client: MCPClient;

  beforeAll(async () => {
    server = new MCPServer(testConfig);
    client = new MCPClient(testConfig);
    await server.start();
    await client.connect();
  });

  afterAll(async () => {
    await client.disconnect();
    await server.stop();
  });

  describe("Authentication", () => {
    it("should reject requests without token", async () => {
      const result = await client.callTool("query", {}, { token: null });
      expect(result.error).toBeDefined();
    });

    it("should reject invalid tokens", async () => {
      const result = await client.callTool("query", {}, { token: "invalid" });
      expect(result.error).toBeDefined();
    });
  });

  describe("Authorization", () => {
    it("should enforce tool permissions", async () => {
      const userToken = generateToken({ userId: "user-123", role: "viewer" });
      const result = await client.callTool("delete_data", {}, { token: userToken });
      expect(result.error).toContain("Forbidden");
    });
  });

  describe("Input Validation", () => {
    it("should reject invalid input", async () => {
      const token = generateToken({ userId: "admin", role: "admin" });
      const result = await client.callTool("query", { limit: -1 }, { token });
      expect(result.error).toBeDefined();
    });
  });
});
Scaling Considerations
As your MCP servers handle more traffic, you'll need to think about:
Connection Pooling
Developers routinely build agents with access to hundreds or thousands of tools across dozens of MCP servers. This means your server might handle thousands of concurrent connections.
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Maximum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Reuse connections instead of creating new ones
const client = await pool.connect();
try {
  const result = await client.query("SELECT * FROM users");
  return result.rows;
} finally {
  client.release();
}
Rate Limiting
import rateLimit from "express-rate-limit";
import RedisStore from "rate-limit-redis";
import { createClient } from "redis";

const redisClient = createClient();
await redisClient.connect();

const limiter = rateLimit({
  store: new RedisStore({
    // rate-limit-redis v3+ takes a sendCommand function instead of a client
    sendCommand: (...args: string[]) => redisClient.sendCommand(args),
    prefix: "mcp_rate_limit:",
  }),
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute
  keyGenerator: (req) => req.user.id, // Per-user limits
});

app.use(limiter);
Real-World Integration Example
Here's how these patterns come together in a real GitHub MCP server:
import express from "express";
// Official MCP TypeScript SDK. The tool registration below is simplified for
// illustration—check the SDK docs for exact signatures.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { Octokit } from "@octokit/rest";

const app = express();
const mcp = new McpServer({ name: "github-server", version: "1.0.0" });

// 1. Authentication middleware
app.use(verifyJWT);

// 2. Rate limiting
app.use(rateLimiter);

// 3. Tool: List repositories
mcp.tool(
  "list_repos",
  {
    description: "List user's repositories",
    inputSchema: {
      type: "object",
      properties: {
        limit: { type: "number", maximum: 100 },
      },
    },
  },
  async (input, { userId, permissions }) => {
    // 4. Authorization check
    if (!permissions.includes("read:repos")) {
      throw new Error("Insufficient permissions");
    }
    // 5. Input validation already done by schema
    const octokit = new Octokit({ auth: getToken(userId) });
    // 6. Instrumentation
    logger.info("list_repos", { userId, limit: input.limit });
    const { data } = await octokit.repos.listForAuthenticatedUser({
      per_page: input.limit,
    });
    return { repos: data };
  }
);

app.listen(8080);
Monitoring and Incident Response
Production isn't just about deployment—it's about what happens after.
Set up alerts for:
- Error rate exceeding 1%
- p99 latency exceeding 1 second
- Authentication failures exceeding 10 per minute
- Resource usage (CPU over 80%, memory over 85%)
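In production these thresholds belong in your alerting system (Prometheus rules, Alertmanager, etc.), but encoding them in code makes the logic unit-testable. A toy sketch of the thresholds above; the snapshot shape and alert names are hypothetical:

```typescript
// Hypothetical point-in-time health snapshot for one MCP server.
interface HealthSnapshot {
  errorRate: number;          // fraction of failed requests, 0..1
  p99LatencyMs: number;       // 99th-percentile latency
  authFailuresPerMin: number;
  cpuUtilization: number;     // 0..1
  memUtilization: number;     // 0..1
}

// Evaluate the alert thresholds listed above against a snapshot.
export function firingAlerts(s: HealthSnapshot): string[] {
  const alerts: string[] = [];
  if (s.errorRate > 0.01) alerts.push("error_rate_above_1_percent");
  if (s.p99LatencyMs > 1000) alerts.push("p99_latency_above_1s");
  if (s.authFailuresPerMin > 10) alerts.push("auth_failure_spike");
  if (s.cpuUtilization > 0.8) alerts.push("cpu_above_80_percent");
  if (s.memUtilization > 0.85) alerts.push("memory_above_85_percent");
  return alerts;
}
```

Keeping the thresholds in one testable function also gives you a single place to tune them as your traffic profile changes.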
When incidents happen, you need a playbook:
- Detect: Monitoring catches the issue
- Alert: Team gets notified immediately
- Investigate: Logs and metrics tell the story
- Mitigate: Kill the deployment, roll back, or scale
- Resolve: Fix the root cause
- Learn: Post-mortem and prevent recurrence
The Path Forward
Building production-ready MCP servers isn't complicated. It's just a series of deliberate choices:
- Choose focused services over monoliths
- Choose authentication and authorization over convenience
- Choose structured logging over hope
- Choose testing over surprises
If you're building multi-agent systems, you'll also want to understand how MCP fits into the larger orchestration picture. Check out Building Production-Ready AI Agent Swarms: From Architecture to Deployment to see how MCP servers integrate into agent workflows.
For a deeper dive into MCP's role in your AI architecture, Anthropic's MCP Protocol: The Game-Changer Making Claude AI Agents Actually Useful covers the protocol from first principles.
And if you're deciding between MCP and traditional API approaches, Building Production-Ready AI Agents with Claude: From Prototype to Enterprise Deployment walks through the integration decisions you'll face.
The gap between a demo and production isn't about the tools. It's about discipline. Apply these patterns consistently, and you'll build MCP servers that scale, stay secure, and actually work when it matters.
Ready to build? Get in touch if you're working on production MCP deployments and want to discuss architecture patterns that work at scale.
