Everything developers need to integrate Claude into their apps, from your first API call to production-ready patterns. Includes Python and JavaScript examples throughout.
The API lets you send messages to Claude programmatically and receive responses, enabling you to build Claude into any product or workflow.
The Claude API is a standard REST API. You send HTTP POST requests with your message and receive JSON responses. It works with any language that can make HTTP requests.
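To make the shape of a request concrete, here is a minimal sketch using only the Python standard library. The endpoint and headers (`x-api-key`, `anthropic-version`) follow the public REST interface; the guard at the end means nothing is sent unless a key is actually configured:

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")

# The JSON body every request carries: model, token limit, and messages.
body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    method="POST",
)

# Only send when a key is configured, so this sketch is safe to run as-is.
if API_KEY:
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read())
        print(reply["content"][0]["text"])
```

In practice you would use the official SDK instead of raw `urllib`, but the underlying request looks exactly like this.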
Every request needs an API key for authentication. Keys are created in the Anthropic console and should be kept secret: never hardcode them in client-side code.
Requests use a "messages" array where each message has a role ("user" or "assistant") and content. This lets you send full conversation histories for multi-turn chats.
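One way to picture this: a chat app simply appends each turn to the list and resends the whole thing on every call. A minimal sketch (the `add_turn` helper is illustrative, not part of any SDK):

```python
# The conversation is just an accumulated list of {role, content} dicts.
messages = []

def add_turn(role, content):
    """Append one turn; roles should alternate between user and assistant."""
    messages.append({"role": role, "content": content})

add_turn("user", "What's the capital of France?")
add_turn("assistant", "The capital of France is Paris.")
add_turn("user", "And what's its population?")

# The full list is sent with every request, so Claude sees the whole history.
```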
Anthropic provides official SDKs for Python and TypeScript/JavaScript. These handle authentication, retries, and streaming for you, so prefer them over raw HTTP.
Instead of waiting for the full response, streaming sends tokens as they're generated. This makes your app feel faster and is essential for good UX in chat applications.
A system prompt sets Claude's persona, rules, and context before the conversation starts. It's separate from the messages array and shapes how Claude behaves throughout.
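For example, a request body might carry a hypothetical support-agent persona in `system` while `messages` holds only the conversation itself:

```python
# Sketch: the system prompt is a top-level field, not an entry in messages.
request_body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": (
        "You are a concise support agent for Acme Corp. "  # hypothetical persona
        "Answer only questions about Acme products."
    ),
    "messages": [
        {"role": "user", "content": "How do I reset my password?"},
    ],
}
```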
From zero to a working API call in under 5 minutes.
Sign up at console.anthropic.com → go to "API Keys" → create a new key. Copy it somewhere safe; you won't see it again.
Security tip: Never hardcode your API key in source code. Use environment variables (ANTHROPIC_API_KEY) or a secrets manager. Never commit keys to GitHub.
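A minimal pattern for loading the key from the environment (the warning branch is illustrative; in production you would fail fast or pull from a secrets manager):

```python
import os

# Read the key from the environment rather than hardcoding it in source.
api_key = os.environ.get("ANTHROPIC_API_KEY")

if not api_key:
    # Illustrative only: a real app should refuse to start without a key.
    print("Warning: ANTHROPIC_API_KEY is not set")
```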
What each field in the API request does and when to use it.
| Parameter | Required? | Type | What it does | Recommended value |
|---|---|---|---|---|
| model | Required | string | Which Claude model to use | claude-sonnet-4-5 for most tasks |
| messages | Required | array | The conversation history as an array of {role, content} objects | Always start with role: "user" |
| max_tokens | Required | integer | Maximum tokens in the response. Setting this too low will cut off responses mid-sentence. | 1024 for chat, 4096 for long content |
| system | Optional | string | A system prompt that sets Claude's persona, rules, and context for the entire conversation | Always use for production apps |
| temperature | Optional | float 0–1 | Controls randomness. 0 = deterministic, 1 = most creative. Default is 1. | 0 for factual, 0.7 for creative |
| stream | Optional | boolean | If true, streams tokens as they generate instead of waiting for the full response | true for chat UIs |
| stop_sequences | Optional | array | Claude stops generating when it produces any of these strings. Useful for structured output. | e.g. ["Human:"] |
| top_p | Optional | float 0–1 | Nucleus sampling threshold. Use temperature or top_p, not both. | Leave unset if using temperature |
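Putting the table together, a request body for a factual task might look like this (the values are illustrative, following the recommendations above):

```python
# A full request body for a deterministic, factual task.
request_body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,                 # enough for a chat-length reply
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Summarise this article: ..."}],
    "temperature": 0,                   # factual task, so deterministic
    "stream": False,
    "stop_sequences": ["Human:"],
    # top_p is left unset because temperature is set.
}
```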
Copy-paste patterns for the most common use cases.
You pay per token, both for input (your prompt) and for output (Claude's response). Here's how to estimate your costs.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|
| Claude Haiku 4.5 | Lowest | Lowest | High-volume, simple tasks |
| Claude Sonnet 4.5 | Mid | Mid | Most production use cases |
| Claude Opus 4 | Highest | Highest | Complex, high-value tasks |
Rule of thumb: 1 token ≈ ¾ of a word. A typical user message is 50–200 tokens; a full blog-post response is roughly 800–1,500 tokens. Check exact pricing at anthropic.com/pricing, as rates change.
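A back-of-envelope estimator built on this rule of thumb (the $3/$15 per-million rates below are placeholders, not real prices; always check the pricing page):

```python
def rough_tokens(text):
    """Rule of thumb from above: ~1 token per 3/4 of a word."""
    return round(len(text.split()) / 0.75)

def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars, given per-million-token prices (illustrative rates only)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. a 150-word prompt plus an 800-token reply at placeholder $3/$15 rates
prompt_tokens = rough_tokens("word " * 150)
cost = estimate_cost(prompt_tokens, 800, 3.00, 15.00)
```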
What separates a hobby project from a robust production integration.
The API can return 429 (rate limit) or 529 (overloaded) errors. Always implement exponential backoff. The official SDKs handle this automatically, one more reason to use them.
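A sketch of exponential backoff with jitter. `RuntimeError` stands in for the SDK's rate-limit exceptions, and `base_delay` is scaled down in the demo only so it runs quickly (use around 1 second in practice):

```python
import random
import time

def call_with_backoff(make_request, max_retries=5, base_delay=1.0):
    """Retry with exponential backoff plus jitter on 429/529-style errors."""
    for attempt in range(max_retries):
        try:
            return make_request()
        except RuntimeError:  # stand-in for the SDK's rate-limit errors
            if attempt == max_retries - 1:
                raise
            # 1x, 2x, 4x the base delay... plus random jitter.
            delay = base_delay * (2 ** attempt) + random.random() * base_delay
            time.sleep(delay)

# Simulated flaky endpoint: fails twice with a 429, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 rate limited")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
```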
For long conversations, old messages eat up your context window and cost money. Summarise old turns, prune irrelevant messages, or use a sliding window to keep only recent history.
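A minimal sliding-window pruner (a real app might summarise the dropped turns instead of discarding them outright):

```python
def sliding_window(messages, max_turns=10):
    """Keep only the most recent turns of the conversation."""
    if len(messages) <= max_turns:
        return messages
    recent = messages[-max_turns:]
    # Make sure the window still starts with a user message, as the API expects.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent

# Demo: a 25-turn alternating conversation pruned to its recent tail.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(25)
]
pruned = sliding_window(history, max_turns=10)
```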
Store prompts, responses, token counts, and latency in a database. This is invaluable for debugging, cost tracking, and improving your prompts over time.
If Claude returns structured data (JSON, code), always validate it before using it. Don't trust the shape of the output blindly: add error handling and fallbacks.
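For instance, if you asked Claude for a JSON object like `{"sentiment": ..., "confidence": ...}` (a hypothetical schema), you might validate it like this before trusting it:

```python
import json

def parse_claude_json(raw_text, fallback=None):
    """Validate structured output before use; return fallback on a bad shape."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return fallback
    # Hypothetical expected shape: {"sentiment": str, "confidence": number}.
    if not isinstance(data, dict):
        return fallback
    if not isinstance(data.get("sentiment"), str):
        return fallback
    if not isinstance(data.get("confidence"), (int, float)):
        return fallback
    return data
```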
Use temperature=0 for factual Q&A, data extraction, and classification. Use 0.5โ0.8 for creative writing and brainstorming. Never use temperature=1 for tasks that need precision.
If many users ask similar questions (e.g. about your docs), cache the responses. Claude's prompt caching feature can significantly reduce costs for repeated context.
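A sketch of a simple application-level cache keyed on the normalised question. Note this is separate from the API's prompt caching, which reduces the cost of reused prompt context on the server side:

```python
import hashlib

class ResponseCache:
    """App-level cache for repeated questions (sketch; use Redis or similar in production)."""

    def __init__(self):
        self._store = {}

    def _key(self, question):
        # Normalise case and whitespace so near-identical questions share a key.
        normalised = " ".join(question.lower().split())
        return hashlib.sha256(normalised.encode()).hexdigest()

    def get_or_call(self, question, call_api):
        key = self._key(question)
        if key not in self._store:
            self._store[key] = call_api(question)
        return self._store[key]

# Demo with a counted stand-in for the real API call.
calls = {"n": 0}

def fake_api(question):
    calls["n"] += 1
    return f"answer to: {question}"

cache = ResponseCache()
a = cache.get_or_call("What is your refund policy?", fake_api)
b = cache.get_or_call("what is your  refund policy?", fake_api)  # cache hit
```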