Everything developers need to integrate Claude into their apps, from your first API call to production-ready patterns. Includes Python and JavaScript examples throughout.
The API lets you send messages to Claude programmatically and receive responses, enabling you to build Claude into any product or workflow.
The Claude API is a standard REST API. You send HTTP POST requests with your message and receive JSON responses. It works with any language that can make HTTP requests.
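To make the shape of a request concrete, here is a minimal sketch using only the Python standard library. The endpoint and headers (`x-api-key`, `anthropic-version`) follow the public REST interface; the guard at the end means nothing is sent unless a key is actually configured:

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")

# The JSON body every request carries: model, token limit, and messages.
body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    method="POST",
)

# Only send when a key is configured, so this sketch is safe to run as-is.
if API_KEY:
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read())
        print(reply["content"][0]["text"])
```

In practice you would use the official SDK instead of raw `urllib`, but the underlying request looks exactly like this.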
Every request needs an API key for authentication. Keys are created in the Anthropic console and should be kept secret: never hardcode them in client-side code.
Requests use a "messages" array where each message has a role ("user" or "assistant") and content. This lets you send full conversation histories for multi-turn chats.
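One way to picture this: a chat app simply appends each turn to the list and resends the whole thing on every call. A minimal sketch (the `add_turn` helper is illustrative, not part of any SDK):

```python
# The conversation is just an accumulated list of {role, content} dicts.
messages = []

def add_turn(role, content):
    """Append one turn; roles should alternate between user and assistant."""
    messages.append({"role": role, "content": content})

add_turn("user", "What's the capital of France?")
add_turn("assistant", "The capital of France is Paris.")
add_turn("user", "And what's its population?")

# The full list is sent with every request, so Claude sees the whole history.
```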
Anthropic provides official SDKs for Python and TypeScript/JavaScript. These handle authentication, retries, and streaming for you, so prefer them over raw HTTP.
Instead of waiting for the full response, streaming sends tokens as they're generated. This makes your app feel faster and is essential for good UX in chat applications.
A system prompt sets Claude's persona, rules, and context before the conversation starts. It's separate from the messages array and shapes how Claude behaves throughout.
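For example, a request body might carry a hypothetical support-agent persona in `system` while `messages` holds only the conversation itself:

```python
# Sketch: the system prompt is a top-level field, not an entry in messages.
request_body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": (
        "You are a concise support agent for Acme Corp. "  # hypothetical persona
        "Answer only questions about Acme products."
    ),
    "messages": [
        {"role": "user", "content": "How do I reset my password?"},
    ],
}
```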
From zero to a working API call in under 5 minutes.
Sign up at console.anthropic.com → go to "API Keys" → create a new key. Copy it somewhere safe; you won't see it again.
Security tip: Never hardcode your API key in source code. Use environment variables (ANTHROPIC_API_KEY) or a secrets manager. Never commit keys to GitHub.
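A minimal pattern for loading the key from the environment (the warning branch is illustrative; in production you would fail fast or pull from a secrets manager):

```python
import os

# Read the key from the environment rather than hardcoding it in source.
api_key = os.environ.get("ANTHROPIC_API_KEY")

if not api_key:
    # Illustrative only: a real app should refuse to start without a key.
    print("Warning: ANTHROPIC_API_KEY is not set")
```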
What each field in the API request does and when to use it.
| Parameter | Required? | Type | What it does | Recommended value |
|---|---|---|---|---|
| model | Required | string | Which Claude model to use | claude-sonnet-4-5 for most tasks |
| messages | Required | array | The conversation history as an array of {role, content} objects | Always start with role: "user" |
| max_tokens | Required | integer | Maximum tokens in the response. Setting this too low will cut off responses mid-sentence. | 1024 for chat, 4096 for long content |
| system | Optional | string | A system prompt that sets Claude's persona, rules, and context for the entire conversation | Always use for production apps |
| temperature | Optional | float 0–1 | Controls randomness. 0 = deterministic, 1 = most creative. Default is 1. | 0 for factual, 0.7 for creative |
| stream | Optional | boolean | If true, streams tokens as they generate instead of waiting for the full response | true for chat UIs |
| stop_sequences | Optional | array | Claude stops generating when it produces any of these strings. Useful for structured output. | e.g. ["Human:"] |
| top_p | Optional | float 0–1 | Nucleus sampling threshold. Use temperature or top_p, not both. | Leave unset if using temperature |
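Putting the table together, a request body for a factual task might look like this (the values are illustrative, following the recommendations above):

```python
# A full request body for a deterministic, factual task.
request_body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,                 # enough for a chat-length reply
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Summarise this article: ..."}],
    "temperature": 0,                   # factual task, so deterministic
    "stream": False,
    "stop_sequences": ["Human:"],
    # top_p is left unset because temperature is set.
}
```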
Copy-paste patterns for the most common use cases.
You pay per token, both for input (your prompt) and for output (Claude's response). Here's how to estimate your costs.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|
| Claude Haiku 4.5 | Lowest | Lowest | High-volume, simple tasks |
| Claude Sonnet 4.5 | Mid | Mid | Most production use cases |
| Claude Opus 4 | Highest | Highest | Complex, high-value tasks |
Rule of thumb: 1 token ≈ ¾ of a word. A typical user message is 50–200 tokens; a full blog-post response is roughly 800–1,500 tokens. Check exact pricing at anthropic.com/pricing, as rates change.
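A back-of-envelope estimator built on this rule of thumb (the $3/$15 per-million rates below are placeholders, not real prices; always check the pricing page):

```python
def rough_tokens(text):
    """Rule of thumb from above: ~1 token per 3/4 of a word."""
    return round(len(text.split()) / 0.75)

def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars, given per-million-token prices (illustrative rates only)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. a 150-word prompt plus an 800-token reply at placeholder $3/$15 rates
prompt_tokens = rough_tokens("word " * 150)
cost = estimate_cost(prompt_tokens, 800, 3.00, 15.00)
```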
What separates a hobby project from a robust production integration.
The API can return 429 (rate limit) or 529 (overloaded) errors. Always implement exponential backoff. The official SDKs handle this automatically, one more reason to use them.
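A sketch of exponential backoff with jitter. `RuntimeError` stands in for the SDK's rate-limit exceptions, and `base_delay` is scaled down in the demo only so it runs quickly (use around 1 second in practice):

```python
import random
import time

def call_with_backoff(make_request, max_retries=5, base_delay=1.0):
    """Retry with exponential backoff plus jitter on 429/529-style errors."""
    for attempt in range(max_retries):
        try:
            return make_request()
        except RuntimeError:  # stand-in for the SDK's rate-limit errors
            if attempt == max_retries - 1:
                raise
            # 1x, 2x, 4x the base delay... plus random jitter.
            delay = base_delay * (2 ** attempt) + random.random() * base_delay
            time.sleep(delay)

# Simulated flaky endpoint: fails twice with a 429, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 rate limited")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
```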
For long conversations, old messages eat up your context window and cost money. Summarise old turns, prune irrelevant messages, or use a sliding window to keep only recent history.
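A minimal sliding-window pruner (a real app might summarise the dropped turns instead of discarding them outright):

```python
def sliding_window(messages, max_turns=10):
    """Keep only the most recent turns of the conversation."""
    if len(messages) <= max_turns:
        return messages
    recent = messages[-max_turns:]
    # Make sure the window still starts with a user message, as the API expects.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent

# Demo: a 25-turn alternating conversation pruned to its recent tail.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(25)
]
pruned = sliding_window(history, max_turns=10)
```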
Store prompts, responses, token counts, and latency in a database. This is invaluable for debugging, cost tracking, and improving your prompts over time.
If Claude returns structured data (JSON, code), always validate it before using it. Don't trust the shape of the output blindly: add error handling and fallbacks.
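For instance, if you asked Claude for a JSON object like `{"sentiment": ..., "confidence": ...}` (a hypothetical schema), you might validate it like this before trusting it:

```python
import json

def parse_claude_json(raw_text, fallback=None):
    """Validate structured output before use; return fallback on a bad shape."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return fallback
    # Hypothetical expected shape: {"sentiment": str, "confidence": number}.
    if not isinstance(data, dict):
        return fallback
    if not isinstance(data.get("sentiment"), str):
        return fallback
    if not isinstance(data.get("confidence"), (int, float)):
        return fallback
    return data
```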
Use temperature=0 for factual Q&A, data extraction, and classification. Use 0.5โ0.8 for creative writing and brainstorming. Never use temperature=1 for tasks that need precision.
If many users ask similar questions (e.g. about your docs), cache the responses. Claude's prompt caching feature can significantly reduce costs for repeated context.
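A sketch of a simple application-level cache keyed on the normalised question. Note this is separate from the API's prompt caching, which reduces the cost of reused prompt context on the server side:

```python
import hashlib

class ResponseCache:
    """App-level cache for repeated questions (sketch; use Redis or similar in production)."""

    def __init__(self):
        self._store = {}

    def _key(self, question):
        # Normalise case and whitespace so near-identical questions share a key.
        normalised = " ".join(question.lower().split())
        return hashlib.sha256(normalised.encode()).hexdigest()

    def get_or_call(self, question, call_api):
        key = self._key(question)
        if key not in self._store:
            self._store[key] = call_api(question)
        return self._store[key]

# Demo with a counted stand-in for the real API call.
calls = {"n": 0}

def fake_api(question):
    calls["n"] += 1
    return f"answer to: {question}"

cache = ResponseCache()
a = cache.get_or_call("What is your refund policy?", fake_api)
b = cache.get_or_call("what is your  refund policy?", fake_api)  # cache hit
```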