Claude API Tutorial: From Zero to Production
TLDR
The Claude API gives developers access to one of the most capable AI models available today. This tutorial walks through everything needed to go from a fresh API key to a production-ready integration. We cover authentication, basic completions, streaming responses, tool use (function calling), error handling, and cost optimization. By the end, you will have working code patterns that handle real-world edge cases, not just hello-world demos. The Claude API processed over 1 billion API calls in Q1 2026, making it one of the fastest-growing AI APIs in the developer ecosystem.
Getting Started: Authentication and Setup
Every Claude API integration starts with an API key from console.anthropic.com. After creating an account, generate a key and store it securely — never commit API keys to version control.
Install the official SDK:
npm install @anthropic-ai/sdk
Create your first completion:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [
{ role: "user", content: "Explain REST vs GraphQL in 3 sentences." },
],
});
console.log(message.content[0].text);
The SDK handles retries, rate limiting, and proper error types out of the box. For Python developers, pip install anthropic provides the same experience.
Streaming Responses
Production applications need streaming to deliver responsive UX. Users expect to see tokens appear in real time, not wait 5-10 seconds for a complete response.
const stream = client.messages.stream({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a deployment checklist." }],
});
for await (const event of stream) {
if (
event.type === "content_block_delta" &&
event.delta.type === "text_delta"
) {
process.stdout.write(event.delta.text);
}
}
Streaming reduces perceived latency by 60-80% in user-facing applications. The first token typically arrives within 200-400ms, compared to 3-8 seconds for a full non-streamed response.
Tool Use (Function Calling)
Tool use is where the Claude API becomes genuinely powerful. Instead of just generating text, Claude can call functions you define, enabling it to query databases, call external APIs, or perform calculations.
const response = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
tools: [
{
name: "get_weather",
description: "Get current weather for a city",
input_schema: {
type: "object",
properties: {
city: { type: "string", description: "City name" },
},
required: ["city"],
},
},
],
messages: [{ role: "user", content: "What's the weather in Berlin?" }],
});
Claude will respond with a tool_use content block containing the function name and parsed arguments. Your application executes the function, then sends the result back as a tool_result message. This loop enables complex multi-step workflows.
Error Handling and Rate Limits
The Anthropic API enforces rate limits based on your tier. Free tier allows 5 requests per minute; paid tiers scale to 1,000+ RPM. Proper error handling is non-negotiable for production:
import Anthropic from "@anthropic-ai/sdk";
async function callClaude(prompt: string, retries = 3): Promise<string> {
for (let i = 0; i < retries; i++) {
try {
const msg = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{ role: "user", content: prompt }],
});
return msg.content[0].text;
} catch (error) {
if (error instanceof Anthropic.RateLimitError) {
await new Promise((r) => setTimeout(r, Math.pow(2, i) * 1000));
continue;
}
throw error;
}
}
throw new Error("Max retries exceeded");
}
The SDK automatically retries on 429 and 500 errors with exponential backoff, but wrapping calls in your own retry logic gives finer control over timeout behavior and fallback strategies.
Cost Optimization in Production
Token costs add up fast at scale. Here are proven strategies to keep Claude API bills manageable:
Choose the right model. Claude 3.5 Haiku handles 80% of tasks at 12x lower cost than Sonnet. Route simple classification, extraction, and formatting tasks to Haiku; reserve Sonnet for complex reasoning.
Cache system prompts. Anthropic’s prompt caching reduces costs by up to 90% for repeated system prompts. A 2,000-token system prompt used across 10,000 requests saves roughly $57.
Minimize output tokens. Set max_tokens to the smallest reasonable value. Ask Claude to be concise. A 500-token response costs 50% less than a 1,000-token response.
// Route to cheaper model for simple tasks
const model = isComplexTask(input)
? "claude-sonnet-4-20250514"
: "claude-haiku-4-5-20251022";
Going to Production
Before deploying, ensure you have:
- API key rotation — store keys in environment variables or a secrets manager, never hardcode
- Request logging — log prompts and responses (with PII redaction) for debugging
- Fallback behavior — graceful degradation when the API is unavailable
- Usage monitoring — set up billing alerts in the Anthropic console
- Input validation — sanitize user input before sending to the API
The Claude API is production-ready and powers applications serving millions of users. With proper error handling, cost optimization, and monitoring, teams can build reliable AI features that scale. For developers building AI-powered products, the combination of Claude’s reasoning capabilities and well-designed SDK makes it one of the strongest options available in 2026.
Frequently Asked Questions
How much does the Claude API cost?
Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens. Claude 3.5 Haiku is cheaper at $0.25/$1.25 per million tokens. Most production apps spend $50-200/month depending on volume.
Can we use Claude API for commercial projects?
Yes. Anthropic's API is fully available for commercial use. You need an API key from console.anthropic.com, and usage is billed per token with no minimum commitment.