
Building Production-Ready AI Agents with Claude's Tool Use: A Complete Implementation Guide

Maisum Hashim · 9 min read
The gap between a demo agent and a production agent isn't capability—it's architecture. Master stop_reason handling and you're halfway there.

I've built dozens of AI agents. The difference between ones that work in production and ones that fail in week two comes down to one thing: understanding the agentic loop and how to handle it properly.

Most developers get the happy path right. Claude calls a tool, you execute it, append the result, loop again. But production breaks on the edges—when Claude hits max_tokens mid-tool-call, when a tool times out, when you need to track every decision for compliance. That's where most agents die.

Here's what I've learned about building AI agents that actually scale.

The Agentic Loop: What's Actually Happening

At its core, an agentic loop is a cycle: Perceive → Reason → Act → Observe → Repeat. But the implementation details matter more than the conceptual pattern.

When you build an agent with Claude, tools are integrated directly into the message structure itself: each user and assistant message contains an array of content blocks of type text, image, tool_use, or tool_result.

Here's what a single iteration looks like:

  1. Send a message with tools defined - You provide the user's request and a list of available tools
  2. Claude responds with a stop_reason - This tells you why Claude stopped generating
  3. Check the stop_reason - This is the critical part most developers skip
  4. Execute tools or return - Depending on the reason, either run the tool or return Claude's response
  5. Append results and loop - If you executed a tool, add the result and ask Claude to continue

The key insight: a stop_reason of "tool_use" means execute the tool, append the result, and loop again; a stop_reason of "end_turn" means the agent is done, so return the response.

Most bugs happen because developers check for text content instead of checking stop_reason. Claude can return text blocks alongside tool_use blocks in the same response, so a check like response.content[0].type === "text" will break silently on complex tasks, where a message like "I'll look that up now" is followed by a tool call.
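Here's that failure mode in miniature. The naive check reads only the first content block, so a mixed response that opens with a text preamble looks "done" even though a tool call follows (the response shapes below are simplified for illustration):

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface AgentResponse {
  stop_reason: string;
  content: ContentBlock[];
}

// Wrong: inspects only the first block, so a response shaped
// [text, tool_use] is misread as a final answer.
function isDoneNaive(response: AgentResponse): boolean {
  return response.content[0].type === "text";
}

// Right: branch on stop_reason, the actual control signal.
function isDone(response: AgentResponse): boolean {
  return response.stop_reason === "end_turn";
}

// A typical mixed response: a text preamble plus a tool call.
const mixed: AgentResponse = {
  stop_reason: "tool_use",
  content: [
    { type: "text", text: "I'll look that up now." },
    { type: "tool_use", id: "toolu_01", name: "search", input: { q: "x" } }
  ]
};

console.log(isDoneNaive(mixed)); // true: the preamble fools it
console.log(isDone(mixed));      // false: correctly keeps looping
```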

Implementing the Core Loop

Here's the pattern that works:

import Anthropic from "@anthropic-ai/sdk";
import type {
  MessageParam,
  Tool,
  ToolResultBlockParam
} from "@anthropic-ai/sdk/resources/messages";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function runAgent(userMessage: string): Promise<string> {
  const messages: MessageParam[] = [
    { role: "user", content: userMessage }
  ];

  const maxIterations = 20; // Safety cap
  let iteration = 0;

  while (iteration < maxIterations) {
    iteration++;

    const response = await client.messages.create({
      model: "claude-opus-4-6",
      max_tokens: 4096,
      tools: getToolDefinitions(),
      messages: messages
    });

    // Step 1: Always append Claude's response first
    messages.push({
      role: "assistant",
      content: response.content
    });

    // Step 2: Check stop_reason - this is the actual control signal
    if (response.stop_reason === "end_turn") {
      // Claude finished naturally
      const textContent = response.content.find(
        (block) => block.type === "text"
      );
      return textContent?.text || "No response generated";
    }

    if (response.stop_reason === "tool_use") {
      // Claude wants to use a tool - find and execute it
      const toolUseBlocks = response.content.filter(
        (block) => block.type === "tool_use"
      );

      const toolResults: ToolResultBlockParam[] = [];

      for (const toolUse of toolUseBlocks) {
        try {
          const result = await executeTool(
            toolUse.name,
            toolUse.input
          );

          toolResults.push({
            type: "tool_result",
            tool_use_id: toolUse.id,
            content: JSON.stringify(result)
          });
        } catch (error) {
          // Tool execution failed - report the error to Claude
          toolResults.push({
            type: "tool_result",
            tool_use_id: toolUse.id,
            content: `Error: ${error.message}`,
            is_error: true
          });
        }
      }

      // Append tool results and continue loop
      messages.push({
        role: "user",
        content: toolResults
      });
      continue;
    }

    if (response.stop_reason === "max_tokens") {
      // Claude ran out of tokens mid-response
      // This is a real problem - you need more context or a different approach
      throw new Error(
        "Agent hit max_tokens limit. Increase max_tokens or break task into smaller pieces."
      );
    }

    if (response.stop_reason === "pause_turn") {
      // A server tool paused a long-running turn. The assistant
      // content was already appended above, so just continue the
      // loop to let Claude resume.
      continue;
    }

    // Unknown stop reason
    throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
  }

  throw new Error(
    `Agent exceeded maximum iterations (${maxIterations})`
  );
}

This pattern handles the core cases. But production needs more.

Tool Definition Matters More Than You Think

Consolidate related operations into fewer, more capable tools rather than creating a separate tool for every action: a smaller tool surface reduces selection ambiguity and is easier for Claude to navigate.

Here's what separates good tool definitions from bad ones:

const tools: Tool[] = [
  {
    name: "database_query",
    description: "Execute read-only queries against the product database. Use this for lookups, searches, and data retrieval. Do NOT use for writes—use database_mutation instead.",
    input_schema: {
      type: "object" as const,
      properties: {
        query: {
          type: "string",
          description: "SQL SELECT query. Must be read-only. Examples: SELECT * FROM users WHERE id = ?; SELECT COUNT(*) FROM orders WHERE created_at > ?"
        },
        parameters: {
          type: "array",
          items: { type: "string" },
          description: "Query parameters for safe substitution. Always use parameters instead of string interpolation."
        }
      },
      required: ["query"]
    }
  }
];

The description is doing heavy lifting here. A good description clearly explains what the tool does, when to use it, what data it returns, and what each parameter means. A poor description is too brief and leaves Claude guessing about the tool's behavior and usage.
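For contrast, here's the kind of definition that leaves Claude guessing (a hypothetical counterexample, not something to ship):

```typescript
// Too brief: no hint about read vs. write access, query format,
// parameterization, or what the tool returns.
const badTool = {
  name: "db",
  description: "Queries the database",
  input_schema: {
    type: "object" as const,
    properties: {
      q: { type: "string" }
    },
    required: ["q"]
  }
};
```

With a definition like this, Claude has to guess whether "queries" includes writes, what dialect `q` should be in, and whether results come back as rows or text.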

Error Handling: Where Agents Actually Break

When a tool throws an exception, catch it and return the error to Claude as a tool result with is_error: true. By default, include only the exception message, not the full stack trace.

This is good, but you need to think about what information Claude actually needs to recover:

async function executeTool(
  name: string,
  input: Record<string, unknown>
): Promise<unknown> {
  try {
    switch (name) {
      case "search_documents":
        return await searchDocuments(input.query as string);

      case "fetch_url": {
        const url = input.url as string;
        // Validate before executing
        if (!isValidUrl(url)) {
          throw new Error(
            `Invalid URL: ${url}. Must be https:// and from an approved domain.`
          );
        }
        return await fetch(url).then((r) => r.text());
      }

      default:
        throw new Error(`Unknown tool: ${name}`);
    }
  } catch (error) {
    // Don't expose internal errors to Claude
    // Instead, give it actionable information
    const message =
      error instanceof Error ? error.message : "Unknown error";

    // Log for debugging
    console.error(`Tool ${name} failed:`, error);

    // Return structured error that Claude can understand
    throw new Error(
      `Tool execution failed: ${message}. ` +
      `Try a different approach or ask the user for clarification.`
    );
  }
}

The key: give Claude enough context to recover, not so much that it gets confused by stack traces.

Production Safeguards

Real agents need guardrails:

interface AgentConfig {
  maxIterations: number;
  maxTokensPerRequest: number;
  timeoutMs: number;
  costLimitCents: number; // Stop if cost exceeds this
  allowedTools: Set<string>;
}

async function runAgentWithGuards(
  userMessage: string,
  config: AgentConfig
): Promise<{ response: string; cost: number; iterations: number }> {
  // Cost and iteration counts must be updated inside the loop
  // itself (e.g. from the API's usage field after each request).
  let totalCost = 0;
  let iterations = 0;

  // Enforce the timeout with Promise.race so a hung tool call
  // can't stall the agent past the deadline.
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("AGENT_TIMEOUT")), config.timeoutMs)
  );

  try {
    const result = await Promise.race([runAgent(userMessage), timeout]);
    return {
      response: result,
      cost: totalCost,
      iterations: iterations
    };
  } catch (error) {
    if (error instanceof Error && error.message === "AGENT_TIMEOUT") {
      return {
        response: "Agent timed out. Please try a simpler request.",
        cost: totalCost,
        iterations: iterations
      };
    }

    if (totalCost > config.costLimitCents) {
      return {
        response:
          "Request exceeded cost limit. Please try a more focused query.",
        cost: totalCost,
        iterations: iterations
      };
    }

    throw error;
  }
}

To stay safe and efficient, agents need safeguards: rate limits, iteration caps, timeouts, and spend limits prevent runaway autonomy and uncontrolled resource consumption. And when the agent hits a scenario that requires human intervention or exceeds its safety boundaries, escalation mechanisms take over: human-in-the-loop approval, Slack or PagerDuty alerts, and fallback logic.
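A human-in-the-loop gate can be a thin wrapper around tool execution. Here's a minimal sketch; the tool names in HIGH_RISK_TOOLS and the approval callback are placeholders for whatever alerting and sign-off plumbing you actually use:

```typescript
type ToolExecutor = (
  name: string,
  input: Record<string, unknown>
) => Promise<unknown>;

// Tools whose side effects are risky enough to require sign-off.
const HIGH_RISK_TOOLS = new Set(["database_mutation", "send_email"]);

// Wraps any tool executor with a human-in-the-loop gate:
// high-risk tools only run after the approval callback says yes.
function withEscalation(
  execute: ToolExecutor,
  approve: (name: string, input: unknown) => Promise<boolean>
): ToolExecutor {
  return async (name, input) => {
    if (HIGH_RISK_TOOLS.has(name)) {
      const ok = await approve(name, input);
      if (!ok) {
        throw new Error(`Tool ${name} blocked: human approval denied.`);
      }
    }
    return execute(name, input);
  };
}
```

In production, `approve` would post to Slack and await a reply, or page an on-call reviewer; the wrapper itself doesn't care how the answer arrives.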

When to Use Tool Use vs. When Not To

Not every problem needs an agentic loop. The principle from both OpenAI and Anthropic's published guidance is consistent: start with the simplest architecture that solves the problem, and introduce the agent loop only when iterative reasoning and adaptive tool use are required.

Use the agentic loop when:

  • The task requires multiple steps with feedback between them
  • Claude needs to decide which tools to use based on intermediate results
  • The problem is exploratory (debugging, research, analysis)

Don't use it when:

  • You can solve it with a single API call
  • The sequence of steps is fixed and known upfront
  • You're just doing structured output extraction

Connecting to Your Broader Architecture

If you're building beyond a single agent, check out Building Production AI Agents: Lessons from the Trenches for patterns on scaling and orchestration. For more complex agent systems, Building Production-Ready AI Agent Swarms: From Architecture to Deployment covers multi-agent coordination.

And if you're using Claude Code for agent development, Claude Code Workflow Revealed: What Makes This AI Development Tool Revolutionary walks through how to leverage it effectively.

For standardized tool integration at scale, Anthropic's MCP Protocol: The Game-Changer Making Claude AI Agents Actually Useful covers the Model Context Protocol pattern that's becoming standard across the industry.

The Real Difference

Here's what separates production agents from demos:

  1. stop_reason handling - Check the actual control signal, not content type
  2. Tool result formatting - Always append results properly, with error context
  3. Error recovery - Give Claude useful information to recover, not stack traces
  4. Observability - Log every decision for debugging and compliance
  5. Safeguards - Cost limits, iteration caps, timeouts, human escalation
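The observability item above can start as simply as an append-only log with one entry per loop iteration; a minimal sketch:

```typescript
interface AgentEvent {
  timestamp: string;
  iteration: number;
  stopReason: string;
  toolCalls: { name: string; input: unknown }[];
}

// Append-only decision log: one entry per loop iteration, enough
// to replay what the agent did for debugging or a compliance audit.
class DecisionLog {
  private events: AgentEvent[] = [];

  record(
    iteration: number,
    stopReason: string,
    toolCalls: { name: string; input: unknown }[]
  ): void {
    this.events.push({
      timestamp: new Date().toISOString(),
      iteration,
      stopReason,
      toolCalls
    });
  }

  dump(): AgentEvent[] {
    return [...this.events];
  }
}
```

Call record once per iteration right after the API response comes back; dump feeds your audit trail or debugger.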

Most of the work in production agents isn't building the loop—it's handling what happens when the loop breaks.

Start with the pattern above. Test it with your tools. Add observability. Then add guards. That's the path from working demo to production agent.

For the latest Claude API documentation on tool use, check Anthropic's implementation guide.

Ready to build? The foundation is solid. The details are where it gets interesting.

Get in touch if you're scaling agents in production and hitting edge cases this guide doesn't cover.