AGENT INCOME .IO

AI agents, agentic coding, and passive income.

Claude API Tutorial: Build Your First AI Agent in 30 Minutes (2026)


The Claude API docs are good. But they stop short of showing you what actually matters if you’re building an agent for production — not just a toy that answers questions, but something that takes actions, uses tools, handles errors, and runs without you watching it.

This is that tutorial. By the end you’ll have a working agent that calls tools, handles streaming, and costs you less than a dollar a day to run.


Prerequisites

  • An Anthropic Console account with a funded API key
  • Node.js 18+ (the examples use the official @anthropic-ai/sdk)
  • ~30 minutes
Install the SDK:

npm install @anthropic-ai/sdk

Set your key:

export ANTHROPIC_API_KEY="sk-ant-..."

The Core API: Messages

Everything goes through POST /v1/messages. Here’s the minimal working call:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "What is 2 + 2?" }
  ]
});

console.log(response.content[0].text); // e.g. "2 + 2 = 4" (exact wording varies)

That’s it. The API is clean. But this isn’t an agent — it’s a chatbot. Let’s fix that.


Tool Use: This Is Where Agents Start

Tool use (also called function calling) is what turns Claude from a text generator into something that can take action in the world. You define tools as JSON schemas, Claude decides when to call them, and your code executes them.

Here’s a real example — a tool that fetches the current price of a stock:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools = [
  {
    name: "get_stock_price",
    description: "Get the current price of a stock by ticker symbol",
    input_schema: {
      type: "object",
      properties: {
        ticker: {
          type: "string",
          description: "The stock ticker symbol, e.g. AAPL"
        }
      },
      required: ["ticker"]
    }
  }
];

// Simulate a real price lookup
function getStockPrice(ticker) {
  const prices = { AAPL: 189.42, MSFT: 415.10, NVDA: 875.33 };
  return prices[ticker] ?? null;
}

async function runAgent(userMessage) {
  const messages = [{ role: "user", content: userMessage }];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 1024,
      tools,
      messages
    });

    // If Claude wants to use a tool
    if (response.stop_reason === "tool_use") {
      const toolUse = response.content.find(b => b.type === "tool_use");
      const result = getStockPrice(toolUse.input.ticker);

      // Add Claude's response and tool result to the conversation
      messages.push({ role: "assistant", content: response.content });
      messages.push({
        role: "user",
        content: [{
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: JSON.stringify({ price: result, currency: "USD" })
        }]
      });

      // Loop — Claude will continue after seeing the tool result
      continue;
    }

    // Claude finished — return the final text
    return response.content.find(b => b.type === "text")?.text;
  }
}

const answer = await runAgent("What's the current price of Apple stock?");
console.log(answer);
// "Apple (AAPL) is currently trading at $189.42."

The key insight: tool use is a loop. Claude says “call this tool with these inputs,” you call it, send back the result, and Claude continues. The while (true) with a stop_reason check is the pattern you’ll use in every agent you build.
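One thing the loop above glosses over: real tools fail. A pattern worth adding (a sketch, not something the SDK provides) is to catch tool errors and send them back as a tool_result with is_error: true, so Claude can recover or rephrase instead of your process crashing:

```javascript
// Sketch: wrap tool execution so failures flow back to Claude as an
// error tool_result. `is_error: true` signals that the call failed.
// The `execute` callback is your own dispatcher (illustrative name).
async function safeToolResult(toolUse, execute) {
  try {
    const result = await execute(toolUse.name, toolUse.input);
    return {
      type: "tool_result",
      tool_use_id: toolUse.id,
      content: JSON.stringify(result),
    };
  } catch (err) {
    return {
      type: "tool_result",
      tool_use_id: toolUse.id,
      content: `Tool failed: ${err.message}`,
      is_error: true,
    };
  }
}
```

Drop this into the loop in place of the direct getStockPrice call and a thrown exception becomes something Claude can reason about rather than a crashed run.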


Streaming: Don’t Make Users Stare at a Spinner

For anything user-facing, streaming is non-negotiable. The first token appears in under a second instead of waiting 5-10s for the full response.

const stream = client.messages.stream({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain vector embeddings briefly." }]
});

stream.on("text", (text) => {
  process.stdout.write(text); // tokens appear as they arrive
});

const final = await stream.finalMessage();
console.log("\nTotal tokens used:", final.usage.input_tokens + final.usage.output_tokens);

For a Next.js or Express API endpoint, replace process.stdout.write with streaming your HTTP response. The SDK handles all the SSE parsing for you.


System Prompts: Give Your Agent a Brain

A system prompt is where you define your agent’s role, constraints, and behavior. This is the most important prompt you’ll write — it runs on every turn.

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 2048,
  system: `You are a financial research assistant that helps developers 
understand market data. You have access to real-time stock prices via tools. 
Always include the currency and note that prices may be delayed. 
Never give investment advice — only facts.`,
  messages: [{ role: "user", content: "How is Nvidia doing today?" }],
  tools
});

Keep system prompts under 1,000 tokens if possible. Every token in the system prompt is billed again on every call unless you cache it (see Prompt Caching below).


Model Selection: Which Claude to Use

As of March 2026, the production lineup looks like this:

Model               Best for                           Cost (per 1M tokens)
------------------  ---------------------------------  --------------------
claude-opus-4       Complex reasoning, long context    $15 in / $75 out
claude-sonnet-4-5   Balanced — most agents live here   $3 in / $15 out
claude-haiku-3-5    High-volume, fast, cheap           $0.80 in / $4 out

For most agents: start with Sonnet. It’s the sweet spot. Drop to Haiku for classification tasks or anything that runs hundreds of times per day. Upgrade to Opus only for tasks where quality difference is measurable.

Check current pricing on the Anthropic pricing page.
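To sanity-check a model choice against your actual traffic, a rough per-call estimate helps. Here's a sketch using the rates above (treat them as illustrative and verify against the pricing page before relying on them):

```javascript
// Rough per-call cost estimator. Rates are per 1M tokens and copied
// from the table above; they will drift, so verify before budgeting.
const RATES = {
  "claude-opus-4":     { input: 15.0, output: 75.0 },
  "claude-sonnet-4-5": { input: 3.0,  output: 15.0 },
  "claude-haiku-3-5":  { input: 0.8,  output: 4.0 },
};

function estimateCostUSD(model, inputTokens, outputTokens) {
  const r = RATES[model];
  if (!r) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * r.input + outputTokens * r.output) / 1_000_000;
}
```

Feed it the usage object the API returns (response.usage.input_tokens / output_tokens) and you have per-request cost tracking in three lines.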


Prompt Caching: Cut Costs by Up to 90%

If your system prompt is long (tool definitions, persona, context), prompt caching lets Anthropic cache it server-side: cache reads are billed at roughly 10% of the normal input rate, while the initial cache write costs slightly more than a standard call.

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: yourLongSystemPrompt,
      cache_control: { type: "ephemeral" } // cache this block
    }
  ],
  messages
});

In practice: if your agent makes 1,000 calls/day with a 2,000-token system prompt, caching drops that from ~$6/day to ~$0.60/day. At that volume, it's the difference between a profitable agent and an expensive one.
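That arithmetic is easy to get wrong, so here it is as code. This sketch assumes Sonnet's $3/1M input rate and ~10% cache-read pricing, and ignores the one-time cache-write premium:

```javascript
// Back-of-envelope: daily cost of re-sending a system prompt, with and
// without caching. Assumes Sonnet input at $3 per 1M tokens and cache
// reads billed at ~10% of that; ignores the initial cache-write premium.
function dailyPromptCost(promptTokens, callsPerDay, { cached = false } = {}) {
  const ratePerToken = 3 / 1_000_000; // Sonnet input rate, USD per token
  const multiplier = cached ? 0.1 : 1; // cache reads cost ~10%
  return promptTokens * callsPerDay * ratePerToken * multiplier;
}
```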


Error Handling and Rate Limits

Production agents need explicit retry behavior. The SDK retries a couple of times by default; set the limits yourself so you know exactly what happens under load.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  maxRetries: 3,        // retry on 429s and 5xx server errors
  timeout: 30_000       // fail requests that hang past 30s
});

The three errors you’ll actually hit:

  • 429 rate limit — you’re hitting token-per-minute limits. Solution: add a queue, spread load across time.
  • 529 overloaded — Anthropic servers under load. Retries with backoff handle this.
  • 400 bad request — usually a malformed tool result or invalid messages array. Log the full payload when this happens.
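The SDK's built-in retries cover the API call itself, but your own tool implementations need the same treatment. A generic exponential-backoff wrapper (a sketch; tune retries and delays to your rate limits):

```javascript
// Sketch: retry any async function with exponential backoff.
// Useful for tool implementations that hit external APIs the
// Anthropic SDK knows nothing about.
async function withBackoff(fn, { retries = 3, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts, surface the error
      const delay = baseMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Wrap the flaky call, not the whole agent loop, so a retry doesn't replay tool calls that already succeeded.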

The Minimal Agent Skeleton

Here’s the full pattern in one place — what most production agents look like before you add your domain logic:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 3 });

async function runAgent({ systemPrompt, userMessage, tools = [], executeTool }) {
  const messages = [{ role: "user", content: userMessage }];

  for (let turn = 0; turn < 10; turn++) { // max 10 tool calls per run
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 2048,
      system: systemPrompt,
      tools,
      messages
    });

    if (response.stop_reason === "end_turn") {
      return response.content.find(b => b.type === "text")?.text;
    }

    if (response.stop_reason === "tool_use") {
      messages.push({ role: "assistant", content: response.content });

      const toolResults = await Promise.all(
        response.content
          .filter(b => b.type === "tool_use")
          .map(async (toolUse) => ({
            type: "tool_result",
            tool_use_id: toolUse.id,
            content: JSON.stringify(await executeTool(toolUse.name, toolUse.input))
          }))
      );

      messages.push({ role: "user", content: toolResults });
      continue;
    }

    throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
  }

  throw new Error("Agent exceeded max turns");
}

Pass in executeTool as a function that dispatches to your real implementations. This skeleton works for content agents, data agents, customer support bots, and anything else you’re building.
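For executeTool, a plain dispatch table is usually enough. A sketch (the tool names and return shapes here are illustrative):

```javascript
// Sketch: executeTool as a dispatch table, suitable for passing to
// the runAgent skeleton above. Replace the stub bodies with real
// implementations.
const toolImplementations = {
  get_stock_price: async ({ ticker }) => {
    // stub: a real implementation would call a market-data API
    const prices = { AAPL: 189.42, MSFT: 415.10, NVDA: 875.33 };
    return { ticker, price: prices[ticker] ?? null, currency: "USD" };
  },
};

async function executeTool(name, input) {
  const impl = toolImplementations[name];
  if (!impl) throw new Error(`Unknown tool: ${name}`);
  return impl(input);
}
```

Adding a tool becomes two steps: add its JSON schema to the tools array, add its implementation to the table.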


What to Build Next

The Claude API is the easiest part. What you do with it determines whether it makes money.

The three agent business models that are generating real revenue right now — niche content automation, data processing pipelines, and white-label AI tools — are covered in detail in make money with AI agents. The infrastructure layer (keeping agents running at 3am, handling errors, billing customers) is in AI agents passive income.

For the agentic coding side specifically — using Claude as part of your development workflow rather than shipping it to customers — agentic coding in 2026 covers the full tool comparison.

The API key costs money. The agents it enables can make considerably more.