“The gap between a working MCP prototype and a production-ready server isn't about the protocol—it's about architecture, security, and observability.”
I've built MCP servers that handle thousands of requests daily. I've also seen them fail spectacularly in production because they were designed as demos, not systems.
The difference isn't the protocol.
Anthropic's Model Context Protocol (MCP) is an open standard for connecting AI assistants to the systems where data lives. The real challenge is what happens after you wire up your first tool.
Production-ready MCP servers need architecture. They need security. They need observability. This guide covers all three.
Understanding MCP's Role in Your Architecture
Before diving into production patterns, let's be clear about what MCP actually is.
Developers can either expose their data through MCP servers or build AI applications (MCP clients) that connect to these servers. Code execution with MCP enables agents to use context more efficiently by loading tools on demand, filtering data before it reaches the model, and executing complex logic in a single step. This is the real power—not just exposing tools, but doing it efficiently at scale.
But here's what most teams miss: MCP is just the transport layer. Your architecture decisions happen above and below it.
If you're new to MCP concepts, I'd recommend reviewing MCP vs Traditional APIs: When to Choose Model Context Protocol for AI Integration to understand when MCP makes sense versus other integration approaches.
The Core Architecture Pattern
I've found one pattern that consistently works for production MCP servers: focused services with clear boundaries.
Each MCP server should do one thing well. A database server handles queries. A file server handles storage. An API server handles external integrations. A notification server handles alerts. Don't try to build a mega-server that does everything.
// ❌ Anti-pattern: Monolithic server
class MegaMCPServer {
  async handleTool(name: string, args: any) {
    if (name === "query_db") { /* database logic */ }
    else if (name === "read_file") { /* file logic */ }
    else if (name === "call_api") { /* API logic */ }
    else if (name === "send_email") { /* email logic */ }
  }
}

// ✅ Pattern: Focused server
class DatabaseMCPServer {
  async handleTool(name: string, args: any) {
    if (name === "query") { /* database query */ }
    else if (name === "insert") { /* database insert */ }
    else if (name === "update") { /* database update */ }
  }
}
This matters for deployment, scaling, and security. Each server has one reason to change. Each server has one set of permissions it needs. Each server can be updated independently.
Security: The Non-Negotiable Layer
Security researchers have identified multiple outstanding security issues with MCP, including prompt injection, tool-permission flaws where combining tools can exfiltrate files, and lookalike tools that can silently replace trusted ones.
I'm not sharing this to scare you. I'm sharing it because the solution is straightforward: apply fundamental security controls consistently.
Authentication and Authorization
MCP servers MUST NOT accept any tokens that were not explicitly issued for the MCP server.
This means OAuth 2.0 or equivalent. Not optional. Not "we'll add it later."
import { Router } from "express";
import jwt from "jsonwebtoken";

const router = Router();

// Middleware: Verify JWT token
router.use((req, res, next) => {
  const token = req.headers.authorization?.split(" ")[1];
  if (!token) {
    return res.status(401).json({ error: "Missing token" });
  }
  try {
    const decoded = jwt.verify(token, process.env.JWT_SECRET!);
    req.user = decoded;
    next();
  } catch (err) {
    res.status(401).json({ error: "Invalid token" });
  }
});

// Tool handler with user context
router.post("/tool", async (req, res) => {
  const { toolName, args } = req.body;
  const userId = req.user.id;
  // Check if user has permission for this tool
  if (!hasPermission(userId, toolName)) {
    return res.status(403).json({ error: "Forbidden" });
  }
  // Execute tool with user context and return the result
  const result = await executeTool(toolName, args, userId);
  res.json({ result });
});
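The handler above calls a hasPermission() helper that is left undefined. Here's a minimal in-memory sketch of what it might look like; the role names, the hard-coded grant table, and the role lookup are all hypothetical—in production this would consult your real authorization system (a roles table, a policy engine, etc.).

```typescript
// Hypothetical role-to-tool grant table. In production, load this from your
// authz system rather than hard-coding it.
const toolGrants = new Map<string, Set<string>>([
  ["viewer", new Set(["query"])],
  ["editor", new Set(["query", "insert", "update"])],
]);

// Stubbed role lookup—swap in your real user store.
function roleOf(userId: string): string {
  return userId.startsWith("admin-") ? "editor" : "viewer";
}

// Returns true only if the user's role explicitly grants the tool.
// Unknown roles and unknown tools are denied by default.
export function hasPermission(userId: string, toolName: string): boolean {
  const grants = toolGrants.get(roleOf(userId));
  return grants !== undefined && grants.has(toolName);
}
```

The important property is deny-by-default: a missing role or an unlisted tool fails closed rather than open.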
Input Validation and Sanitization
Attackers can exploit unescaped input or poorly filtered parameters to inject code, particularly in setups where tools accept direct user input through prompts and forward it into shells, interpreters, or system commands. Without strict input sanitization, even a single endpoint creates significant security risk.
import { z } from "zod";

// Define strict input schemas
const QuerySchema = z.object({
  table: z.string().regex(/^[a-zA-Z_][a-zA-Z0-9_]*$/), // Identifier characters only
  limit: z.number().int().min(1).max(1000),
  offset: z.number().int().min(0),
  filters: z.record(z.string(), z.any()).optional(),
});

// Validate before execution
async function handleQuery(input: unknown) {
  const parsed = QuerySchema.parse(input); // Throws if invalid
  // Now we know the input matches the schema
  const query = buildQuery(parsed);
  return executeQuery(query);
}
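Schema validation handles shape, but the shell-forwarding risk described above needs a separate defense: never build a shell string from user input. A minimal sketch, assuming a hypothetical allowlist of binaries a tool may invoke—execFileSync passes arguments directly to the program without a shell, so metacharacters like ";" or "$(...)" stay inert:

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical allowlist: the only binaries any tool is permitted to run.
const ALLOWED_BINARIES = new Set(["git", "ls", "echo"]);

export function runCommand(binary: string, args: string[]): string {
  if (!ALLOWED_BINARIES.has(binary)) {
    throw new Error(`Binary not allowed: ${binary}`);
  }
  // No shell involved: args are passed straight to the program,
  // so shell metacharacters in user input are treated as literal text.
  return execFileSync(binary, args, { encoding: "utf8" });
}
```

Contrast this with exec(`${binary} ${userInput}`), where a single ";" in the input becomes a second command.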
Least Privilege Access
MCP servers may have been granted excessive permissions to the service/resource they are accessing. For example, an MCP server that is part of an AI sales application connecting to an enterprise data store should have access scoped to the sales data and not allowed to access all the files in the store. No resource should have permissions in excess of what is required for it to execute the tasks it was intended for.
This applies to database credentials, API keys, file system access—everything.
// ❌ Bad: Server has admin access to everything
const db = await connect({
  user: "admin",
  password: process.env.DB_ADMIN_PASSWORD,
  database: "production",
});

// ✅ Good: Server has scoped access
const db = await connect({
  user: "mcp_sales_readonly",
  password: process.env.DB_MCP_PASSWORD,
  database: "production",
});
// Only sales data is accessible
// Only SELECT queries are allowed
// No DROP, DELETE, or ALTER permissions
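The database role's grants are the real enforcement layer, but a cheap in-process check adds defense in depth: reject anything that isn't a single read-only statement before it ever reaches the database. A rough sketch (the keyword list and checks are illustrative, not a complete SQL parser):

```typescript
// Defense-in-depth companion to the scoped database role above. This is a
// coarse filter, not a substitute for database-level permissions.
const FORBIDDEN = /\b(insert|update|delete|drop|alter|truncate|grant)\b/i;

export function assertReadOnly(sql: string): void {
  const trimmed = sql.trim();
  // Must start with SELECT
  if (!/^select\b/i.test(trimmed)) {
    throw new Error("Only SELECT statements are allowed");
  }
  // Reject statement chaining ("SELECT 1; DROP TABLE ...")
  const semi = trimmed.indexOf(";");
  if (semi !== -1 && semi !== trimmed.length - 1) {
    throw new Error("Multiple statements are not allowed");
  }
  // Belt-and-suspenders keyword check
  if (FORBIDDEN.test(trimmed)) {
    throw new Error("Statement contains a forbidden keyword");
  }
}
```

If this check ever fires, it means the database role was the only thing standing between you and a destructive query—worth an alert, not just a rejection.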
Deployment Patterns
I've deployed MCP servers to Kubernetes, Vercel, and Cloudflare. Each has tradeoffs.
Kubernetes Deployment
For teams that need scale and control:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-database-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-database-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: mcp-database-server
    spec:
      containers:
        - name: mcp-server
          image: my-mcp-server:v1.2.3
          ports:
            - containerPort: 8080
          # Resource limits prevent runaway processes
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          # Health checks catch failures early
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          # Secrets from secure storage
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: database-url
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: jwt-secret
Serverless Deployment
For lower-traffic servers or development:
// Vercel Edge Function
export default async function handler(request: Request) {
  if (request.method !== "POST") {
    return Response.json({ error: "Method not allowed" }, { status: 405 });
  }
  const token = request.headers.get("authorization")?.split(" ")[1];
  if (!token || !(await verifyToken(token))) {
    return Response.json({ error: "Unauthorized" }, { status: 401 });
  }
  const { toolName, args } = await request.json();
  try {
    const result = await executeTool(toolName, args);
    return Response.json({ result });
  } catch (error) {
    const message = error instanceof Error ? error.message : "Internal error";
    return Response.json({ error: message }, { status: 500 });
  }
}
Configuration and Environment Management
Production servers need configuration that changes between environments without code changes.
import { z } from "zod";

const ConfigSchema = z.object({
  // Server config
  port: z.number().default(8080),
  environment: z.enum(["development", "staging", "production"]),
  // Database config
  databaseUrl: z.string().url(),
  databasePoolSize: z.number().int().min(1).max(100).default(10),
  // Security config
  jwtSecret: z.string().min(32),
  tokenExpiry: z.number().int().positive().default(3600),
  rateLimitPerMinute: z.number().int().positive().default(1000),
  // Observability config
  logLevel: z.enum(["debug", "info", "warn", "error"]).default("info"),
  sentryDsn: z.string().url().optional(),
});

// Fail fast at startup: parse throws if anything is missing or malformed.
// Environment variables are strings, so numeric fields are parsed explicitly.
const config = ConfigSchema.parse({
  port: parseInt(process.env.PORT || "8080"),
  environment: process.env.NODE_ENV,
  databaseUrl: process.env.DATABASE_URL,
  databasePoolSize: parseInt(process.env.DATABASE_POOL_SIZE || "10"),
  jwtSecret: process.env.JWT_SECRET,
  tokenExpiry: parseInt(process.env.TOKEN_EXPIRY || "3600"),
  rateLimitPerMinute: parseInt(process.env.RATE_LIMIT || "1000"),
  logLevel: process.env.LOG_LEVEL || "info",
  sentryDsn: process.env.SENTRY_DSN,
});
Observability: The Silent Killer
You can't fix what you can't see.
Implement logging and monitoring across your AI application, including MCP clients and servers, and ship those logs to a central SIEM so anomalous activity can be detected.
Structured Logging
import winston from "winston";

const logger = winston.createLogger({
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: "error.log", level: "error" }),
    new winston.transports.File({ filename: "combined.log" }),
  ],
});

// Every tool execution gets logged
logger.info("tool_executed", {
  toolName: "query_database",
  userId: "user-123",
  duration: 245, // ms
  status: "success",
  inputHash: hash(args), // Don't log sensitive data
  timestamp: new Date().toISOString(),
});

// Errors get detailed context
logger.error("tool_failed", {
  toolName: "query_database",
  userId: "user-123",
  error: error.message,
  stack: error.stack,
  timestamp: new Date().toISOString(),
});
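The log entry above records hash(args) instead of the raw arguments. A minimal sketch of that helper, assuming a hypothetical hash() function over Node's crypto module—a stable digest lets you correlate repeated inputs across log lines without ever writing possibly sensitive arguments to disk:

```typescript
import { createHash } from "node:crypto";

// Digest arbitrary tool arguments for logging. Note: JSON.stringify is not
// canonical across key orders—fine for correlating logs, but use a canonical
// serializer if you need exact cross-process stability.
export function hash(value: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(value))
    .digest("hex")
    .slice(0, 16); // a short prefix is enough for log correlation
}
```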
Metrics and Monitoring
Track the metrics that matter:
- Tool execution time (p50, p95, p99)
- Error rate by tool
- Authentication failures
- Rate limit violations
- Resource usage (CPU, memory, connections)
import prom from "prom-client";

const toolExecutionDuration = new prom.Histogram({
  name: "mcp_tool_execution_duration_ms",
  help: "Tool execution duration in milliseconds",
  labelNames: ["tool_name", "status"],
  buckets: [10, 50, 100, 500, 1000, 5000],
});

const authFailures = new prom.Counter({
  name: "mcp_auth_failures_total",
  help: "Total authentication failures",
  labelNames: ["reason"],
});

// In your tool handler
const start = Date.now();
try {
  const result = await executeTool(toolName, args);
  toolExecutionDuration.labels(toolName, "success").observe(Date.now() - start);
  return result;
} catch (error) {
  toolExecutionDuration.labels(toolName, "error").observe(Date.now() - start);
  throw error;
}
Testing and Validation
Production MCP servers need comprehensive testing.
import { describe, it, expect, beforeAll, afterAll } from "vitest";

describe("MCP Server", () => {
  let server: MCPServer;
  let client: MCPClient;

  beforeAll(async () => {
    server = new MCPServer(testConfig);
    client = new MCPClient(testConfig);
    await server.start();
    await client.connect();
  });

  afterAll(async () => {
    await client.disconnect();
    await server.stop();
  });

  describe("Authentication", () => {
    it("should reject requests without token", async () => {
      const result = await client.callTool("query", {}, { token: null });
      expect(result.error).toBeDefined();
    });

    it("should reject invalid tokens", async () => {
      const result = await client.callTool("query", {}, { token: "invalid" });
      expect(result.error).toBeDefined();
    });
  });

  describe("Authorization", () => {
    it("should enforce tool permissions", async () => {
      const userToken = generateToken({ userId: "user-123", role: "viewer" });
      const result = await client.callTool("delete_data", {}, { token: userToken });
      expect(result.error).toContain("Forbidden");
    });
  });

  describe("Input Validation", () => {
    it("should reject invalid input", async () => {
      const token = generateToken({ userId: "admin", role: "admin" });
      const result = await client.callTool("query", { limit: -1 }, { token });
      expect(result.error).toBeDefined();
    });
  });
});
Scaling Considerations
As your MCP servers handle more traffic, you'll need to think about:
Connection Pooling
Developers routinely build agents with access to hundreds or thousands of tools across dozens of MCP servers. This means your server might handle thousands of concurrent connections.
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Maximum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Reuse connections instead of creating new ones
const client = await pool.connect();
try {
  const result = await client.query("SELECT * FROM users");
  return result.rows;
} finally {
  client.release();
}
Rate Limiting
import rateLimit from "express-rate-limit";
import RedisStore from "rate-limit-redis";
import { createClient } from "redis";

const redisClient = createClient();
await redisClient.connect();

const limiter = rateLimit({
  store: new RedisStore({
    // rate-limit-redis v3+ takes a sendCommand function instead of a client
    sendCommand: (...args: string[]) => redisClient.sendCommand(args),
    prefix: "mcp_rate_limit:",
  }),
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute
  keyGenerator: (req) => req.user.id, // Per-user limits
});

app.use(limiter);
Real-World Integration Example
Here's how these patterns come together in a real GitHub MCP server:
import express from "express";
// Official MCP TypeScript SDK. The tool registration below is simplified for
// illustration—check the SDK docs for exact signatures.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { Octokit } from "@octokit/rest";

const app = express();
const mcp = new McpServer({ name: "github-server", version: "1.0.0" });

// 1. Authentication middleware
app.use(verifyJWT);

// 2. Rate limiting
app.use(rateLimiter);

// 3. Tool: List repositories
mcp.tool(
  "list_repos",
  {
    description: "List user's repositories",
    inputSchema: {
      type: "object",
      properties: {
        limit: { type: "number", maximum: 100 },
      },
    },
  },
  async (input, { userId, permissions }) => {
    // 4. Authorization check
    if (!permissions.includes("read:repos")) {
      throw new Error("Insufficient permissions");
    }
    // 5. Input validation already done by schema
    const octokit = new Octokit({ auth: getToken(userId) });
    // 6. Instrumentation
    logger.info("list_repos", { userId, limit: input.limit });
    const { data } = await octokit.repos.listForAuthenticatedUser({
      per_page: input.limit,
    });
    return { repos: data };
  }
);

app.listen(8080);
Monitoring and Incident Response
Production isn't just about deployment—it's about what happens after.
Set up alerts for:
- Error rate exceeding 1%
- p99 latency exceeding 1 second
- Authentication failures exceeding 10 per minute
- Resource usage (CPU over 80%, memory over 85%)
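In production these thresholds belong in your alerting system (Prometheus rules, Alertmanager, etc.), but encoding them in code makes the logic unit-testable. A toy sketch of the thresholds above; the snapshot shape and alert names are hypothetical:

```typescript
// Hypothetical point-in-time health snapshot for one MCP server.
interface HealthSnapshot {
  errorRate: number;          // fraction of failed requests, 0..1
  p99LatencyMs: number;       // 99th-percentile latency
  authFailuresPerMin: number;
  cpuUtilization: number;     // 0..1
  memUtilization: number;     // 0..1
}

// Evaluate the alert thresholds listed above against a snapshot.
export function firingAlerts(s: HealthSnapshot): string[] {
  const alerts: string[] = [];
  if (s.errorRate > 0.01) alerts.push("error_rate_above_1_percent");
  if (s.p99LatencyMs > 1000) alerts.push("p99_latency_above_1s");
  if (s.authFailuresPerMin > 10) alerts.push("auth_failure_spike");
  if (s.cpuUtilization > 0.8) alerts.push("cpu_above_80_percent");
  if (s.memUtilization > 0.85) alerts.push("memory_above_85_percent");
  return alerts;
}
```

Keeping the thresholds in one testable function also gives you a single place to tune them as your traffic profile changes.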
When incidents happen, you need a playbook:
- Detect: Monitoring catches the issue
- Alert: Team gets notified immediately
- Investigate: Logs and metrics tell the story
- Mitigate: Kill the deployment, roll back, or scale
- Resolve: Fix the root cause
- Learn: Post-mortem and prevent recurrence
The Path Forward
Building production-ready MCP servers isn't complicated. It's just a series of deliberate choices:
- Choose focused services over monoliths
- Choose authentication and authorization over convenience
- Choose structured logging over hope
- Choose testing over surprises
If you're building multi-agent systems, you'll also want to understand how MCP fits into the larger orchestration picture. Check out Building Production-Ready AI Agent Swarms: From Architecture to Deployment to see how MCP servers integrate into agent workflows.
For a deeper dive into MCP's role in your AI architecture, Anthropic's MCP Protocol: The Game-Changer Making Claude AI Agents Actually Useful covers the protocol from first principles.
And if you're deciding between MCP and traditional API approaches, Building Production-Ready AI Agents with Claude: From Prototype to Enterprise Deployment walks through the integration decisions you'll face.
The gap between a demo and production isn't about the tools. It's about discipline. Apply these patterns consistently, and you'll build MCP servers that scale, stay secure, and actually work when it matters.
Ready to build? Get in touch if you're working on production MCP deployments and want to discuss architecture patterns that work at scale.
