Claude API Streaming Output Tutorial (with Complete Code)

Category: Technical Discussion | Published: April 13, 2026 | Suggested reading time: 32 minutes

Author: sodope llm

1. What Is Streaming Output?

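By default, the Claude API waits until the model has generated the entire reply and then returns it in a single response. Users therefore wait several seconds, sometimes longer, before they see the first character.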

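**Streaming** changes this behavior: as soon as the model generates a small piece of text, it is pushed to the client, so users watch the text appear piece by piece in real time, just as in the Claude web app.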

Why use streaming output?

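Better user experience: the delay before the first token appears drops sharply (from waiting for the whole reply to starting almost immediately)
Faster perceived speed: users can see that output is in progress and do not assume the system has frozen
Interruptible: users can stop the generation partway through
Well suited to long text: when generating reports, code, or other long content, users do not have to stare at a blank screen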
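
Two of the benefits above, a shorter wait for the first output and the ability to interrupt, can be sketched without calling the API. In the sketch below, `fake_stream` is a hypothetical stand-in for a model stream, and `measure_stream` and `stream_until` are illustrative helper names, not part of any SDK.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[str, Optional[float], float]:
    """Consume a stream; return (full_text, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        parts.append(chunk)
    return "".join(parts), first_chunk_at, time.perf_counter() - start


def stream_until(chunks: Iterable[str], should_stop: Callable[[], bool]) -> Iterator[str]:
    """Yield chunks until should_stop() returns True, then abandon the rest."""
    for chunk in chunks:
        if should_stop():
            break
        yield chunk


def fake_stream(words, delay=0.01):
    """Hypothetical stand-in for a model stream: emits one piece at a time."""
    for word in words:
        time.sleep(delay)  # simulates generation time per chunk
        yield word


if __name__ == "__main__":
    text, first, total = measure_stream(fake_stream(["Hello", ", ", "world"]))
    print(f"first chunk after {first:.3f}s, full text after {total:.3f}s")

    # Simulate a user pressing "stop" once two chunks have been shown.
    shown = []
    for chunk in stream_until(fake_stream(["a", "b", "c", "d"]), lambda: len(shown) >= 2):
        shown.append(chunk)
    print(shown)
```

With the SDK code in section 3.1, `stream.text_stream` could be passed in place of `fake_stream(...)`; leaving the `with` block early is what closes the underlying connection.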

2. Streaming vs. Non-Streaming: How to Choose

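Scenario	Recommended mode	Reason
Real-time chat interface	Streaming	Lowers perceived latency and feels more natural
Background batch processing	Non-streaming	Simpler code; nothing needs to be displayed immediately
Long-form content generation	Streaming	Spares users a long wait
Structured data extraction	Non-streaming	Needs the complete JSON; partial output is awkward to parse
API billing and usage monitoring	Non-streaming	usage is available directly on the response; with streaming it has to be read from the final message (see 3.2)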
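
To illustrate the structured-data row: a JSON object that arrives in pieces cannot be parsed until the last piece is in. The sketch below uses made-up chunks and the illustrative helper names `try_parse` and `parse_streamed_json`; it only shows the buffering idea, not a recommended parser.

```python
import json
from typing import Any, Iterable, Optional


def try_parse(buffer: str) -> Optional[Any]:
    """Return the parsed value, or None while the JSON text is still incomplete."""
    try:
        return json.loads(buffer)
    except json.JSONDecodeError:
        return None


def parse_streamed_json(chunks: Iterable[str]) -> Any:
    """Accumulate every chunk, then parse once at the end."""
    return json.loads("".join(chunks))


if __name__ == "__main__":
    # Made-up chunks: the boundaries fall in the middle of string values.
    chunks = ['{"name": "Ada', '", "langs": ["Py', 'thon"]}']
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        print(repr(buffer), "->", try_parse(buffer))
    print(parse_streamed_json(chunks))
```

Only the final buffer parses successfully, which is why a non-streaming call is usually the simpler choice when the consumer is a JSON parser rather than a person.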

3. Basic Streaming Implementation

3.1 Using the Anthropic SDK

			
import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://api.jiekou.ai"  # relay endpoint reachable from mainland China without a VPN
)

# Option 1: use the stream context manager (recommended)
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a bubble sort in Python and explain the logic of each step"}
    ]
) as stream:
    # text_stream yields each piece of text as soon as it arrives
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()  # final newline

		

3.2 Handling the Full Event Stream

If you need more metadata (such as usage information), you can iterate over the full event stream:

			
import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://api.jiekou.ai"
)

with client.messages.stream(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum entanglement"}]
) as stream:
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == 'content_block_delta':
                # Only text deltas carry a .text attribute
                if hasattr(event.delta, 'text'):
                    print(event.delta.text, end="", flush=True)
            elif event.type == 'message_stop':
                print("\n\n[Generation complete]")

    # Get the final message (includes usage)
    final_message = stream.get_final_message()
    print(f"Input tokens: {final_message.usage.input_tokens}")
    print(f"Output tokens: {final_message.usage.output_tokens}")
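
The event-dispatch logic above can be tried offline. The sketch below is an assumption-laden stand-in: the `SimpleNamespace` objects only imitate the shape of SDK events (a `type` plus, for deltas, a `delta.text`), the `message_start` and `content_block_stop` names are added for illustration, and `collect_text` is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def collect_text(events: Iterable[object]) -> Tuple[str, List[str]]:
    """Return (concatenated text, event types seen) for a stream of events."""
    parts: List[str] = []
    seen: List[str] = []
    for event in events:
        event_type = getattr(event, "type", None)
        seen.append(event_type)
        if event_type == "content_block_delta":
            # Only some deltas carry text; skip the ones that do not.
            text = getattr(event.delta, "text", None)
            if text is not None:
                parts.append(text)
        elif event_type == "message_stop":
            break  # nothing meaningful follows the stop event
    return "".join(parts), seen


# Stand-in events that imitate the shape used in the loop above.
FAKE_EVENTS = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(text="Hel")),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(text="lo")),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(partial_json="{}")),
    SimpleNamespace(type="content_block_stop"),
    SimpleNamespace(type="message_stop"),
]

if __name__ == "__main__":
    text, seen = collect_text(FAKE_EVENTS)
    print(text)
    print(seen)
```

Keeping the dispatch in a function that accepts any iterable makes it possible to unit-test the handling of each event type without a network call.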

		

3.3 Using the OpenAI SDK in Compatibility Mode

			
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.jiekou.ai/v1"
)

stream = client.chat.completions.create(
    model="claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Tell a joke about programmers"}],
    stream=True
)
for chunk in stream:
    # Some chunks may arrive with an empty choices list, so check before indexing
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

		

4. Streaming in Web Applications

4.1 FastAPI + SSE（Server-Sent Events）

			
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://api.jiekou.ai"
)


@app.get("/chat/stream")
async def chat_stream(message: str):

    def generate():
        with client.messages.stream(
            model="claude-3-7-sonnet-20250219",
            max_tokens=1024,
            messages=[{"role": "user", "content": message}]
        ) as stream:
            for text in stream.text_stream:
                # SSE format: every payload line gets its own "data: " prefix and
                # a blank line ends the event. Prefixing each line keeps a chunk
                # that contains "\n" from cutting the event short.
                lines = "".join(f"data: {line}\n" for line in text.split("\n"))
                yield lines + "\n"

        # Send the end-of-stream signal
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no"  # disable Nginx buffering
        }
    )

		

Receiving on the front end:

			
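const eventSource = new EventSource('/chat/stream?message=' + encodeURIComponent('Hello'));
eventSource.onmessage = function(event) {
    if (event.data === '[DONE]') {
        eventSource.close();
        return;
    }
    // Append the text to the DOM
    document.getElementById('output').textContent += event.data;
};
// EventSource reconnects automatically after an error, which would resend the request; close instead
eventSource.onerror = function() {
    eventSource.close();
};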
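
For clients that are not browsers, the same stream can be decoded by hand. The following is a minimal sketch of the parsing rules that EventSource applies to `data:` lines, written in Python; `iter_sse_data` and `read_until_done` are hypothetical helper names, and fields other than `data:` are deliberately ignored.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from an iterable of SSE lines."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if there is one.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Exactly one leading space after the colon is stripped.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


def read_until_done(lines: Iterable[str], sentinel: str = "[DONE]") -> str:
    """Concatenate event payloads until the end-of-stream sentinel arrives."""
    parts: List[str] = []
    for data in iter_sse_data(lines):
        if data == sentinel:
            break
        parts.append(data)
    return "".join(parts)


if __name__ == "__main__":
    sample = ["data: Hel\n", "\n", "data: lo\n", "data: \n", "\n", "data: [DONE]\n", "\n"]
    print(repr(read_until_done(sample)))
```

One way to feed it real data would be the line iterator of an HTTP client, for example `response.iter_lines()` on an httpx streaming response; that pairing is a suggestion rather than something the endpoint above requires.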

		

4.2 A Streaming Endpoint with Flask

			
from flask import Flask, Response, request
import anthropic
app = Flask(__name__)
client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://api.jiekou.ai"
)
@app.route('/stream')
def stream():
    user_message = request.args.get('q', 'Hello')
    
    def generate():
        with client.messages.stream(
            model="claude-3-7-sonnet-20250219",
            max_tokens=1024,
            messages=[{"role": "user", "content": user_message}]
        ) as stream:
            for text in stream.text_stream:
                yield text
    
    return Response(generate(), mimetype='text/plain')

		

5. Asynchronous Streaming

For asynchronous code (such as asyncio), the Anthropic SDK provides an async client:

			
import asyncio
import anthropic
async def main():
    client = anthropic.AsyncAnthropic(
        api_key="your-api-key",
        base_url="https://api.jiekou.ai"
    )
    
    async with client.messages.stream(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Async streaming example"}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)
    
    print()
asyncio.run(main())
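
A common reason to reach for the async version is to serve several streams at once on one event loop. The sketch below shows that pattern with `asyncio.gather`; `fake_stream` is a hypothetical stand-in for an async text stream, and `collect` is an illustrative helper name.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_stream(label: str, words: List[str], delay: float = 0.01) -> AsyncIterator[str]:
    """Hypothetical stand-in for an async text stream."""
    for word in words:
        await asyncio.sleep(delay)  # simulates waiting on the network
        yield f"{label}:{word}"


async def collect(stream: AsyncIterator[str], log: List[str]) -> str:
    """Consume one stream, recording arrival order in a shared log."""
    parts: List[str] = []
    async for chunk in stream:
        log.append(chunk)
        parts.append(chunk.split(":", 1)[1])
    return "".join(parts)


async def main() -> None:
    log: List[str] = []
    # Both streams make progress while the other one is waiting.
    results = await asyncio.gather(
        collect(fake_stream("A", ["one ", "two"]), log),
        collect(fake_stream("B", ["uno ", "dos"]), log),
    )
    print(results)  # one result per stream, in the order passed to gather
    print(log)      # arrival order across both streams


if __name__ == "__main__":
    asyncio.run(main())
```

In a real service, each `fake_stream(...)` would be replaced by iterating `stream.text_stream` inside its own `async with client.messages.stream(...)` block.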

		

6. Things to Watch Out For

6.1 Error Handling

Errors can also occur partway through a stream, so they need to be caught:

			
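# Assumes `import anthropic` and a `client` created as in section 3.1
try:
    # Replace ... with the model, max_tokens and messages arguments
    with client.messages.stream(...) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
except anthropic.APIConnectionError:
    print("Network connection interrupted, please retry")
except anthropic.RateLimitError:
    # Must come before APIStatusError, which it subclasses
    print("Rate limit exceeded, please retry later")
except anthropic.APIStatusError as e:
    print(f"API error: {e.status_code} - {e.message}")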
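
The "please retry" branches above can be automated. The following is a minimal sketch of retry with exponential backoff, written against plain Python exceptions so it runs without the SDK; `call_with_retry` and its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    retriable: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(); on a retriable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        """Fails twice with a connection error, then succeeds."""
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("dropped")
        return "ok"

    print(call_with_retry(flaky, (ConnectionError,), base_delay=0.01))
```

With the SDK, `retriable` could be `(anthropic.APIConnectionError, anthropic.RateLimitError)` and `operation` a function that runs the whole stream. A retried stream starts from the beginning, so any partial output already shown to the user needs to be cleared first.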

		

6.2 Timeout Settings

			
import anthropic
import httpx
client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://api.jiekou.ai",
    timeout=httpx.Timeout(
        connect=10.0,     # connection timeout
        read=120.0,       # read timeout (streaming needs a longer one)
        write=10.0,
        pool=10.0
    )
)

		

6.3 Billing for Streaming

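Streaming and non-streaming output are billed in exactly the same way: both are charged by the number of tokens actually used. Streaming adds no extra cost, although the faster feedback may make users more inclined to let the model generate longer content.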
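
Since billing depends only on token counts, the cost of a call can be estimated from the usage numbers regardless of mode. The sketch below assumes per-million-token pricing; the prices in the example are hypothetical placeholders, not actual rates, and `estimate_cost` is an illustrative helper name.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one call, given token counts and prices per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: 3.00 per million input tokens, 15.00 per million output tokens.
    cost = estimate_cost(1200, 800, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
    print(f"{cost:.4f}")
```

The token counts would come from `final_message.usage` in section 3.2, which reports the same fields whether or not the response was streamed.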

7. A Complete Production-Grade Example

			
import anthropic
import httpx
import time
class ClaudeStreamingClient:
    """Production-grade client for streaming Claude calls"""
    
    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(
            api_key=api_key,
            base_url="https://api.jiekou.ai",
            timeout=httpx.Timeout(connect=10.0, read=120.0, write=10.0, pool=10.0)
        )
    
    def stream_response(self, prompt: str, system: str = None, 
                        model: str = "claude-3-7-sonnet-20250219",
                        max_tokens: int = 2048,
                        on_token=None,
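                        on_complete=None):
        """
        Call Claude with streaming.

        Args:
            prompt: user input
            system: system prompt
            model: model ID
            max_tokens: maximum number of output tokens
            on_token: callback invoked for each streamed text chunk, fn(text: str)
            on_complete: callback invoked on completion, fn(full_text: str, usage: dict)
        """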
        kwargs = {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}]
        }
        if system:
            kwargs["system"] = system
        
        full_text = []
        start_time = time.time()
        
        try:
            with self.client.messages.stream(**kwargs) as stream:
                for text in stream.text_stream:
                    full_text.append(text)
                    if on_token:
                        on_token(text)
                
                final_message = stream.get_final_message()
                elapsed = time.time() - start_time
                
                if on_complete:
                    on_complete(
                        "".join(full_text),
                        {
                            "input_tokens": final_message.usage.input_tokens,
                            "output_tokens": final_message.usage.output_tokens,
                            "elapsed_seconds": round(elapsed, 2)
                        }
                    )
                
                return "".join(full_text)
                
        except Exception as e:
            raise RuntimeError(f"Streaming call failed: {e}") from e


# Usage example
client = ClaudeStreamingClient("your-api-key")
def on_token(text):
    print(text, end="", flush=True)
def on_complete(full_text, stats):
    print("\n\n--- Statistics ---")
    print(f"Total characters: {len(full_text)}")
    print(f"Input tokens: {stats['input_tokens']}")
    print(f"Output tokens: {stats['output_tokens']}")
    print(f"Elapsed: {stats['elapsed_seconds']}s")
client.stream_response(
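    prompt="Write a technical article about asynchronous programming in Python, with code examples",
    system="You are a technical writing expert who explains complex topics in an accessible way",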
    on_token=on_token,
    on_complete=on_complete
)

		

Conclusion

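Integrating Claude API streaming is not complicated: at its core, you replace create() with stream() and iterate over the text stream. What matters is choosing the implementation that fits your application: the basic version is enough for terminal applications, web applications pair it with SSE or WebSocket, and background tasks use the async version.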

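Developers in mainland China can connect through the jiekou.ai relay platform without a VPN, with lower latency and more stable streaming. Billing is pay-as-you-go and streaming adds no extra cost, which makes it an ideal choice for building AI chat applications.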