# Building Reliable AI Tools
After building dozens of AI agents, I've developed a set of patterns that consistently work. Here's the playbook.
## The Foundation: Structured Output
The single most important pattern is forcing structured output from your LLM. Never rely on parsing free-form text.
```typescript
import { z } from "zod";

const schema = z.object({
  action: z.enum(["approve", "reject", "escalate"]),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
  nextSteps: z.array(z.string()).optional(),
});

// `model` is your LLM client wrapper; `buildPrompt` assembles the prompt.
const result = await model.generate({
  prompt: buildPrompt(context),
  schema: schema,
});
```
This simple pattern eliminates 80% of production issues: the model either returns data that matches the schema or fails explicitly, so malformed output never leaks downstream.
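When your client doesn't enforce the schema for you, you can run the same check yourself at the boundary with zod's `safeParse`. A minimal sketch, where `rawResponse` stands in for whatever text your model call returned:

```typescript
// Reuses the `schema` defined above; `rawResponse` is a placeholder for
// the raw text your model call returned.
declare const rawResponse: string;

// JSON.parse throws on non-JSON output, and safeParse reports any schema
// mismatch: either way, the failure is explicit and immediate.
const parsed = schema.safeParse(JSON.parse(rawResponse));
if (!parsed.success) {
  throw new Error(`Model output failed validation: ${parsed.error.message}`);
}
const result = parsed.data; // fully typed from here on
```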
## Error Handling That Actually Works
Most AI code has optimistic error handling. Here's what production requires:
```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class AgentError(Exception):
    """Base class for agent errors with context preservation."""

    def __init__(self, message: str, context: Optional[dict] = None, recoverable: bool = False):
        self.context = context or {}
        self.recoverable = recoverable
        super().__init__(message)


async def execute_with_retry(
    func: Callable[[], Awaitable[Any]],
    max_retries: int = 3,
    backoff_base: float = 1.0,
) -> Any:
    for attempt in range(max_retries):
        try:
            return await func()
        except AgentError as e:
            # Give up immediately on unrecoverable errors or the final attempt.
            if not e.recoverable or attempt == max_retries - 1:
                raise
            # Exponential backoff: 1s, 2s, 4s, ...
            await asyncio.sleep(backoff_base * (2 ** attempt))
```
The goal isn't to prevent all errors. It's to fail gracefully and give yourself a path to recovery.
## The Three-Layer Architecture
Every reliable agent follows this structure:
1. **Orchestration Layer** — Manages state, routing, and recovery
   - Tracks conversation history
   - Handles tool routing
   - Implements retry logic
2. **Tool Layer** — Executes specific actions
   - Each tool is a pure function
   - Input/output schemas are explicit
   - No side effects outside the tool's scope
3. **Integration Layer** — Connects to external systems
   - Handles authentication
   - Manages rate limits
   - Transforms data formats
Here's why this matters:
- **Testability** — Each layer can be tested independently
- **Debuggability** — Clear boundaries make logs meaningful
- **Maintainability** — Changes are isolated to their layer
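To make those boundaries concrete, here's a minimal sketch of the three layers in TypeScript. Every name here (`CustomerApi`, `Orchestrator`, `route`) is illustrative, not from any particular framework:

```typescript
// Integration layer: wraps one external system; owns auth, rate limits,
// and the raw wire format.
interface CustomerApi {
  search(query: string, limit: number): Promise<unknown[]>;
}

// Tool layer: a pure, schema-checked action built on an integration.
type ToolFn = (input: unknown) => Promise<unknown>;

// Orchestration layer: state, routing, and recovery. It knows tools only
// by name, never the systems behind them.
class Orchestrator {
  private history: string[] = [];

  constructor(private tools: Map<string, ToolFn>) {}

  async route(toolName: string, input: unknown): Promise<unknown> {
    const tool = this.tools.get(toolName);
    if (!tool) throw new Error(`Unknown tool: ${toolName}`);
    this.history.push(toolName); // conversation/state tracking lives here
    return tool(input);          // retry logic would wrap this call
  }
}
```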
## Tool Design Principles
Tools are the hands of your agent. Design them carefully:
```typescript
import { z } from "zod";

interface Tool<TInput, TOutput> {
  name: string;
  description: string; // This is what the LLM sees
  inputSchema: z.ZodType<TInput>;
  outputSchema: z.ZodType<TOutput>;
  execute: (input: TInput) => Promise<TOutput>;
}

const searchCustomers: Tool<SearchInput, Customer[]> = {
  name: "search_customers",
  description: "Search for customers by name, email, or phone number",
  inputSchema: z.object({
    query: z.string(),
    limit: z.number().default(10),
  }),
  outputSchema: z.array(CustomerSchema),
  execute: async (input) => {
    // Implementation
  },
};
```
Keep tools focused. A tool that does one thing well is more reliable than a tool that tries to do everything.
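One payoff of carrying both schemas on every tool: you can enforce them mechanically at the call site. Here's a sketch of a generic runner (the `runTool` helper is my own naming, not a library API) that rejects malformed LLM-supplied arguments and catches drift in an integration's output:

```typescript
// Assumes the Tool interface above. Both parse calls throw a ZodError on
// mismatch, so bad data never crosses a layer boundary silently.
async function runTool<TInput, TOutput>(
  tool: Tool<TInput, TOutput>,
  rawInput: unknown
): Promise<TOutput> {
  const input = tool.inputSchema.parse(rawInput);   // reject malformed arguments
  const output = await tool.execute(input);
  return tool.outputSchema.parse(output);           // catch integration drift
}
```

Wire something like this into the orchestration layer and every tool call gets validated for free.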
## Monitoring and Observability
You can't improve what you can't measure. Essential metrics:
| Metric | What It Tells You |
|---|---|
| Token usage per request | Cost efficiency |
| Latency P50/P95/P99 | User experience |
| Tool call success rate | Integration health |
| Escalation rate | Agent capability gaps |
| User satisfaction | Overall effectiveness |
Log everything with correlation IDs. When something breaks at 3am, you'll thank yourself.
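In practice that can be as simple as minting one ID per request and threading the logger through every layer. A minimal sketch; the event names and log shape are illustrative:

```typescript
import { randomUUID } from "node:crypto";

// One logger per request; every line it emits carries the same ID, so a
// single grep reconstructs the full request across layers.
function createRequestLogger(correlationId: string = randomUUID()) {
  return (event: string, data: Record<string, unknown> = {}) =>
    console.log(JSON.stringify({ correlationId, ts: Date.now(), event, ...data }));
}

const log = createRequestLogger();
log("tool_call_started", { tool: "search_customers" });
log("tool_call_finished", { tool: "search_customers", durationMs: 412 });
```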
## Common Pitfalls
After reviewing many failed AI projects, I keep seeing the same mistakes. Here are the patterns to avoid:
- **Over-prompting** — Long prompts don't mean better results. Be concise.
- **Under-testing** — If you haven't tested the edge case, assume it will fail.
- **Ignoring latency** — Users notice. Batch where possible.
- **Monolithic agents** — Break complex workflows into specialized sub-agents (see the sketch below).
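On that last point, the split can be mechanical: a cheap triage step routes each request to a specialist with its own narrow prompt and toolset. A sketch, with stub sub-agents standing in for real ones:

```typescript
// Each sub-agent owns a narrow prompt and toolset; these bodies are stubs.
type SubAgent = (request: string) => Promise<string>;

const subAgents = {
  billing: (async (req) => `billing: ${req}`) as SubAgent,
  support: (async (req) => `support: ${req}`) as SubAgent,
};

// In practice this is a small, cheap classification call to the model.
async function triage(request: string): Promise<keyof typeof subAgents> {
  return request.toLowerCase().includes("invoice") ? "billing" : "support";
}

async function handle(request: string): Promise<string> {
  const route = await triage(request);
  return subAgents[route](request); // each route stays small and testable
}
```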
## Getting Started
If you're building your first production agent:
- Start with a single, well-defined use case
- Build structured output first
- Add comprehensive logging before adding features
- Test with adversarial inputs early
- Plan for the integration layer to take 60% of your time
Need help building reliable AI tools? Let's talk.