Building Reliable AI Tools

After building dozens of AI agents, I've developed a set of patterns that consistently work. Here's the playbook.

The Foundation: Structured Output

The single most important pattern is forcing structured output from your LLM. Never rely on parsing free-form text.

import { z } from "zod";

const schema = z.object({
  action: z.enum(["approve", "reject", "escalate"]),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  nextSteps: z.array(z.string()).optional(),
});

const result = await model.generate({
  prompt: buildPrompt(context),
  schema: schema,
});

This simple pattern eliminates 80% of production issues. The model either returns valid structured data or fails explicitly.

Error Handling That Actually Works

Most AI code has optimistic error handling. Here's what production requires:

import asyncio
from typing import Any, Callable

class AgentError(Exception):
    """Base class for agent errors with context preservation."""

    def __init__(self, message: str, context: dict | None = None, recoverable: bool = False):
        self.context = context or {}
        self.recoverable = recoverable
        super().__init__(message)

async def execute_with_retry(
    func: Callable,
    max_retries: int = 3,
    backoff_base: float = 1.0,
) -> Any:
    for attempt in range(max_retries):
        try:
            return await func()
        except AgentError as e:
            if not e.recoverable or attempt == max_retries - 1:
                raise
            # Exponential backoff: 1s, 2s, 4s, ...
            await asyncio.sleep(backoff_base * (2 ** attempt))

The goal isn't to prevent all errors. It's to fail gracefully and give yourself a path to recovery.

The Three-Layer Architecture

Every reliable agent follows this structure:

  1. Orchestration Layer — Manages state, routing, and recovery

    • Tracks conversation history
    • Handles tool routing
    • Implements retry logic
  2. Tool Layer — Executes specific actions

    • Each tool is a pure function
    • Input/output schemas are explicit
    • No side effects outside the tool's scope
  3. Integration Layer — Connects to external systems

    • Handles authentication
    • Manages rate limits
    • Transforms data formats

Here's why this matters:

  • Testability — Each layer can be tested independently
  • Debuggability — Clear boundaries make logs meaningful
  • Maintainability — Changes are isolated to their layer
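The three layers above can be sketched in a few lines. This is an illustrative skeleton, not a framework; every name in it (crm_lookup, search_customers, Orchestrator) is a made-up stand-in:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Integration layer: the only place that touches external systems.
# A real version would handle auth, rate limits, and data transforms.
def crm_lookup(query: str) -> list[dict]:
    return [{"name": "Ada", "email": "ada@example.com"}]

# Tool layer: pure functions with explicit inputs and outputs,
# no side effects beyond the call they wrap.
def search_customers(query: str) -> list[dict]:
    return crm_lookup(query)

TOOLS: dict[str, Callable[..., Any]] = {"search_customers": search_customers}

# Orchestration layer: tracks history and routes tool calls.
@dataclass
class Orchestrator:
    history: list[dict] = field(default_factory=list)

    def run_tool(self, name: str, **kwargs: Any) -> Any:
        result = TOOLS[name](**kwargs)
        self.history.append({"tool": name, "args": kwargs, "result": result})
        return result
```

Because the boundaries are explicit, you can unit-test the orchestrator with the tool registry stubbed out, and swap the integration layer without touching routing logic.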

Tool Design Principles

Tools are the hands of your agent. Design them carefully:

interface Tool<TInput, TOutput> {
  name: string;
  description: string;  // This is what the LLM sees
  inputSchema: z.ZodType<TInput>;
  outputSchema: z.ZodType<TOutput>;
  execute: (input: TInput) => Promise<TOutput>;
}

const searchCustomers: Tool<SearchInput, Customer[]> = {
  name: "search_customers",
  description: "Search for customers by name, email, or phone number",
  inputSchema: z.object({
    query: z.string(),
    limit: z.number().default(10),
  }),
  outputSchema: z.array(CustomerSchema),
  execute: async (input) => {
    // Implementation
  },
};

Keep tools focused. A tool that does one thing well is more reliable than a tool that tries to do everything.

Monitoring and Observability

You can't improve what you can't measure. Essential metrics:

Metric                      What It Tells You
Token usage per request     Cost efficiency
Latency P50/P95/P99         User experience
Tool call success rate      Integration health
Escalation rate             Agent capability gaps
User satisfaction           Overall effectiveness

Log everything with correlation IDs. When something breaks at 3am, you'll thank yourself.
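One minimal way to thread a correlation ID through every log line, using only the standard library (the logger name and field are illustrative choices):

```python
import logging
import uuid
from contextvars import ContextVar

# A context variable survives async boundaries, so every log line in one
# request shares the same ID without passing it through every function.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True

logger = logging.getLogger("agent")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s [%(correlation_id)s] %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(payload: dict) -> None:
    # Set a fresh ID at the request boundary; everything downstream inherits it.
    correlation_id.set(uuid.uuid4().hex[:8])
    logger.info("request received: %s", payload)
```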

Common Pitfalls

After reviewing many failed AI projects, here are the patterns to avoid:

  • Over-prompting — Long prompts don't mean better results. Be concise.
  • Under-testing — If you haven't tested the edge case, assume it will fail.
  • Ignoring latency — Users notice. Batch where possible.
  • Monolithic agents — Break complex workflows into specialized sub-agents.

Getting Started

If you're building your first production agent:

  1. Start with a single, well-defined use case
  2. Build structured output first
  3. Add comprehensive logging before adding features
  4. Test with adversarial inputs early
  5. Plan for the integration layer to take 60% of your time
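Step 4 is the one teams skip most often. An adversarial test suite can start as a handful of inputs a model will eventually emit in production; the validate_action helper here is hypothetical, standing in for whatever checks your structured output:

```python
def validate_action(raw: dict) -> dict:
    """Hypothetical validator for structured model output."""
    allowed = {"approve", "reject", "escalate"}
    if raw.get("action") not in allowed:
        raise ValueError(f"invalid action: {raw.get('action')!r}")
    conf = raw.get("confidence")
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise ValueError(f"invalid confidence: {conf!r}")
    return raw

# Inputs a real model will eventually produce, one way or another.
adversarial = [
    {},                                                        # empty output
    {"action": "APPROVE", "confidence": 0.9},                  # wrong casing
    {"action": "approve", "confidence": 1.7},                  # out-of-range score
    {"action": "ignore previous instructions", "confidence": 0.5},
]

for case in adversarial:
    try:
        validate_action(case)
        print("ACCEPTED (bug?):", case)
    except ValueError as e:
        print("rejected:", e)
```

If any of these inputs get through, you have found the bug on your laptop instead of in a customer's workflow.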

Need help building reliable AI tools? Let's talk.