Building AI Agents with Claude: Architecture and Patterns
TLDR
AI agents are the most significant shift in software architecture since microservices. Instead of writing deterministic code for every possible scenario, developers define tools and let an AI model orchestrate them to accomplish goals. This guide covers the architecture patterns, implementation details, and hard-won lessons from building production AI agents with Claude. We walk through the core agentic loop, tool design, memory systems, error recovery, and the patterns that separate toys from production systems. The AI agent market is projected to reach $47 billion by 2028 (Gartner), and developers who understand agent architecture now have a significant career advantage.
The Agentic Loop
Every AI agent follows the same fundamental pattern: observe the current state, decide what to do, execute an action, and evaluate the result. This loop repeats until the task is complete or a termination condition is met.
async function agentLoop(task: string, tools: Tool[], maxSteps = 20) {
const messages: Message[] = [{ role: "user", content: task }];
for (let step = 0; step < maxSteps; step++) {
const response = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
tools: tools.map((t) => t.schema),
messages,
});
messages.push({ role: "assistant", content: response.content });
// Check if agent wants to use a tool
const toolUse = response.content.find((c) => c.type === "tool_use");
if (!toolUse) break; // Agent is done -- final text response
// Execute the tool
const result = await executeTool(toolUse.name, toolUse.input);
messages.push({
role: "user",
content: [{ type: "tool_result", tool_use_id: toolUse.id, content: result }],
});
}
return messages;
}
The critical design decision is the termination condition. Without proper limits, agents can loop indefinitely, burning tokens and taking wrong turns. Production agents need:
- Step limits — hard cap on iterations (10-50 depending on task complexity)
- Cost limits — track token usage and stop before exceeding budget
- Timeout limits — wall-clock time cap for user-facing agents
- Confidence signals — let the agent explicitly signal when it is done or stuck
Tool Design Principles
Tools are the agent’s hands. Poorly designed tools produce poor agent behavior, regardless of model quality.
// Good: specific, well-described, validated
const searchTool = {
name: "search_codebase",
description: "Search for files matching a pattern. Returns file paths and line numbers. Use for finding implementations, imports, or usages.",
input_schema: {
type: "object",
properties: {
query: { type: "string", description: "Regex pattern to search for" },
file_glob: { type: "string", description: "File pattern, e.g. '*.ts'" },
max_results: { type: "number", description: "Max results (default 20)" },
},
required: ["query"],
},
};
// Bad: vague, no constraints, ambiguous
const badTool = {
name: "do_stuff",
description: "Performs actions on the codebase",
input_schema: {
type: "object",
properties: {
action: { type: "string" },
},
},
};
Rules for tool design:
- One tool, one responsibility. Do not combine search and edit into one tool
- Descriptions should explain WHEN to use the tool, not just what it does
- Include parameter descriptions with examples
- Return structured results the agent can reason about
- Include error information in tool results, not exceptions
Memory and Context Management
Agents that handle complex tasks need memory beyond the conversation window. Claude’s context window is 200K tokens, but filling it degrades performance and increases cost.
Short-term memory: The conversation itself. Keep it lean by summarizing previous tool results instead of including raw data.
// Instead of keeping 10,000 lines of search results in context:
const summary = `Found 23 files matching "auth". Key files:
- src/lib/auth.ts (main auth module, 245 lines)
- src/middleware/jwt.ts (JWT validation, 89 lines)
- src/routes/login.ts (login endpoint, 156 lines)`;
Long-term memory: External storage. For agents that run across sessions or handle recurring tasks:
interface AgentMemory {
facts: Map<string, string>; // Learned facts about the codebase
decisions: Array<{ context: string; decision: string; outcome: string }>;
failedApproaches: string[]; // Avoid repeating mistakes
}
Retrieval-augmented context: Use embeddings to pull relevant information into the agent’s context on demand. pgvector or a dedicated vector store can index documentation, previous conversations, or codebase knowledge.
Error Recovery Patterns
Production agents must handle failures gracefully. The three most common failure modes:
Tool execution failure: The tool throws an error or returns unexpected data.
async function executeTool(name: string, input: unknown): Promise<string> {
try {
const tool = tools.find((t) => t.name === name);
if (!tool) return `Error: Unknown tool "${name}"`;
return await tool.execute(input);
} catch (error) {
return `Tool "${name}" failed: ${error.message}. Try a different approach.`;
}
}
Agent stuck in a loop: The agent repeats the same action. Track recent actions and inject a nudge:
if (isRepeatingAction(recentActions, 3)) {
messages.push({
role: "user",
content: "You seem to be repeating the same action. Please try a different approach or explain what is blocking you.",
});
}
Context window overflow: Summarize older conversation turns to keep the context window manageable. A common pattern is to keep the last 10 turns in full detail and summarize everything before that.
Production Architecture
A production agent system has several components beyond the core loop:
- Router: Classifies incoming tasks and selects the appropriate agent configuration
- Orchestrator: Manages the agentic loop, enforces limits, handles interrupts
- Tool registry: Discovers and validates available tools
- Audit log: Records every agent action for debugging and compliance
- Cost tracker: Monitors token usage per task, alerts on anomalies
For multi-agent systems, add a coordination layer. Anthropic’s Claude Agent SDK provides primitives for agent-to-agent handoffs, shared context, and parallel execution. The SDK processed over 100 million agent runs in its first quarter, indicating strong adoption.
The most important lesson from building production agents: start simple. A single agent with 3-5 well-designed tools outperforms a complex multi-agent system with poorly designed tools. Add complexity only when you have evidence that the simple approach is insufficient.
Frequently Asked Questions
What is the difference between an AI chatbot and an AI agent?
A chatbot responds to messages. An agent takes actions. An AI agent uses tools, makes decisions, executes multi-step plans, and can operate autonomously -- querying databases, calling APIs, modifying files, and verifying its own work. Agents have a loop: observe, plan, act, evaluate.
How much does it cost to run an AI agent in production?
Costs vary widely based on task complexity. A simple customer support agent processing 1,000 conversations per day costs roughly $50-150/month using Claude 3.5 Haiku. Complex coding agents that process large codebases can cost $5-20 per task using Sonnet. Prompt caching and model routing reduce costs by 40-60%.