OpenRouter Reasoning Tokens: A Unified Interface for Controlling AI Reasoning Depth and Cost

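OpenRouter exposes a single reasoning parameter that hides the differences between provider APIs such as OpenAI (effort levels), Anthropic (max_tokens budget), and Google Gemini 3 (thinkingLevel), so the same code can control any model that supports reasoning. Reasoning tokens are billed as output tokens. Four controls are available: effort (xhigh/high/medium/low/minimal/none), max_tokens (an exact budget), exclude (reason internally but do not return the reasoning), and enabled (turn reasoning on at the default medium level). In multi-turn conversations, passing the previous turn's reasoning back through reasoning_details preserves reasoning continuity, which matters most for tool calling.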

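Reasoning tokens (also called thinking tokens) let a model work through an explicit chain of thought before it produces its final answer, which usually improves answer quality on complex problems. OpenRouter smooths over the implementation differences between providers behind one interface.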

Controlling Reasoning Tokens

Use the reasoning parameter in the request body:

{
  "model": "your-model",
  "messages": [],
  "reasoning": {
    "effort": "high",      // "xhigh"/"high"/"medium"/"low"/"minimal"/"none" (OpenAI style)
    "max_tokens": 2000,    // exact token budget (Anthropic style); use either this or effort, not both
    "exclude": false,      // true = the model reasons internally but the reasoning is not returned
    "enabled": true        // just turn reasoning on, at the default medium level
  }
}
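
Because the comments above describe effort and max_tokens as alternatives, it can help to validate the object before sending a request. The following is a minimal sketch: `build_reasoning` is a hypothetical helper name, not part of any OpenRouter SDK, and its rules come only from the field descriptions above.

```python
from typing import Optional

VALID_EFFORTS = {"xhigh", "high", "medium", "low", "minimal", "none"}


def build_reasoning(effort: Optional[str] = None,
                    max_tokens: Optional[int] = None,
                    exclude: bool = False) -> dict:
    """Build a `reasoning` object (hypothetical helper, for illustration only)."""
    if effort is not None and max_tokens is not None:
        raise ValueError("use either effort or max_tokens, not both")
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {effort!r}")

    reasoning: dict = {}
    if effort is not None:
        reasoning["effort"] = effort
    elif max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    else:
        # Neither control given: just switch reasoning on at the default level.
        reasoning["enabled"] = True
    if exclude:
        reasoning["exclude"] = True
    return reasoning
```

With the OpenAI Python client, the result could then be passed as extra_body={"reasoning": build_reasoning(effort="high")}.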

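effort: reasoning effort level

Applies to OpenAI reasoning models (o1, o3, GPT-5 series) and Grok models:

effort value	Share of tokens for reasoning	Description
`xhigh`	~95% of max_tokens	Deepest reasoning
`high`	~80% of max_tokens	High-effort reasoning
`medium`	~50% of max_tokens	Balanced (default)
`low`	~20% of max_tokens	Light reasoning
`minimal`	~10% of max_tokens	Minimal reasoning
`none`	0	Reasoning fully disabled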
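
To get a feel for what these ratios mean in tokens, the table can be turned into a small lookup. This is only a rough sketch: the percentages are the approximate values listed above, the function name is made up for illustration, and the number of reasoning tokens a provider actually uses may differ.

```python
# Approximate share of max_tokens spent on reasoning, in percent (from the table above).
EFFORT_PERCENT = {
    "xhigh": 95,
    "high": 80,
    "medium": 50,
    "low": 20,
    "minimal": 10,
    "none": 0,
}


def approx_reasoning_tokens(max_tokens: int, effort: str = "medium") -> int:
    """Estimate the reasoning token allowance for a given effort level."""
    return max_tokens * EFFORT_PERCENT[effort] // 100


if __name__ == "__main__":
    for level in EFFORT_PERCENT:
        print(level, approx_reasoning_tokens(4000, level))
```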

max_tokens: exact token budget

Applies to Gemini thinking models, Anthropic reasoning models, and some Alibaba Qwen thinking models (where it maps to thinking_budget):

{
  "reasoning": {
    "max_tokens": 2000
  }
}

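For models that only support effort, max_tokens is used to derive the corresponding effort level, and vice versa.

exclude: hiding the reasoning

Lets the model reason fully internally while leaving the reasoning out of the response, which reduces the size of the response payload: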

{
  "reasoning": {
    "effort": "high",
    "exclude": true
  }
}

The reasoning tokens are still generated and still billed; they simply do not appear in the response.

Code Examples

Basic usage (TypeScript)

import { OpenRouter } from '@openrouter/sdk';

const openRouter = new OpenRouter({
  apiKey: '<OPENROUTER_API_KEY>',
});

const response = await openRouter.chat.send({
  model: 'openai/o3-mini',
  messages: [
    {
      role: 'user',
      content: "How would you build the world's tallest skyscraper?",
    },
  ],
  reasoning: {
    effort: 'high',
  },
  stream: false,
});

console.log('REASONING:', response.choices[0].message.reasoning);
console.log('CONTENT:', response.choices[0].message.content);

Exact token budget (Python, Anthropic model)

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[
        {"role": "user", "content": "What's the most efficient algorithm for sorting a large dataset?"}
    ],
    extra_body={
        "reasoning": {
            "max_tokens": 2000
        }
    },
)

msg = response.choices[0].message
print(getattr(msg, "reasoning", None))
print(getattr(msg, "content", None))

Excluding reasoning output (Python)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    extra_body={
        "reasoning": {
            "effort": "high",
            "exclude": True
        }
    },
)

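Preserving reasoning context (multi-turn conversations)

In a multi-turn conversation, passing the previous turn's reasoning back to the API lets the model continue from where its reasoning stopped instead of reasoning from scratch:

# `client` is the OpenAI client created in the earlier Python example.
# Illustrative tool definition (an assumption; the original example does not show `tools`).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

# First request (with tools)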
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston? Then recommend what to wear."}
    ],
    tools=tools,
    extra_body={"reasoning": {"max_tokens": 2000}}
)

message = response.choices[0].message

# Keep the full reasoning_details when passing the turn back
messages = [
    {"role": "user", "content": "What's the weather like in Boston? Then recommend what to wear."},
    {
        "role": "assistant",
        "content": message.content,
        "tool_calls": message.tool_calls,
        "reasoning_details": message.reasoning_details  # pass back unmodified
    },
    {
        "role": "tool",
        "tool_call_id": message.tool_calls[0].id,
        "content": '{"temperature": 45, "condition": "rainy", "humidity": 85}'
    }
]

# Second request: the model continues from the previous turn's reasoning
response2 = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=messages,
    tools=tools
)

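There are two ways to pass reasoning back:

message.reasoning (string): passes plain-text reasoning only; suitable for models that return nothing but raw reasoning text
message.reasoning_details (array): passes the full structure; suitable for models that return encrypted or summarized reasoning

Important: consecutive reasoning blocks in the reasoning_details array must stay in exactly the order the model generated them. Do not reorder or modify them.

Preserving reasoning matters most with tool calling. When the model pauses generation to wait for a tool result, its reasoning chain has to be kept intact so that it can pick up where it left off once the result arrives, rather than starting over.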
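
The "pass it back unmodified" rule can be sketched with plain dictionaries, for example when working with raw JSON responses instead of SDK objects. The helper name and the sample message below are hypothetical and exist only for illustration; the point is that reasoning_details is copied as a whole, without filtering or reordering.

```python
import copy


def assistant_turn_with_reasoning(message: dict) -> dict:
    """Rebuild an assistant turn from a response message, keeping reasoning intact."""
    turn = {"role": "assistant", "content": message.get("content")}
    if message.get("tool_calls"):
        turn["tool_calls"] = message["tool_calls"]
    if message.get("reasoning_details"):
        # Deep copy so later edits to the history cannot alter the original blocks;
        # the order and contents of the array are left untouched.
        turn["reasoning_details"] = copy.deepcopy(message["reasoning_details"])
    return turn


# Made-up response message used only to exercise the helper.
sample_message = {
    "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_weather", "arguments": "{}"}}],
    "reasoning_details": [
        {"type": "reasoning.text", "text": "Need the weather first.", "index": 0},
        {"type": "reasoning.encrypted", "data": "opaque", "index": 1},
    ],
}

if __name__ == "__main__":
    print(assistant_turn_with_reasoning(sample_message))
```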

reasoning_details structure

Non-streaming responses: choices[].message.reasoning_details
Streaming responses: choices[].delta.reasoning_details (delivered in fragments, one chunk at a time)
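
For streaming, the fragments have to be gathered from each chunk's delta. The sketch below only collects them in arrival order from already-parsed chunk dictionaries. How fragments should be merged into complete blocks is not specified here, so that step is deliberately left out, and both the function name and the sample chunks are made up for illustration.

```python
def collect_reasoning_fragments(chunks) -> list:
    """Collect reasoning_details fragments from parsed streaming chunks, in order."""
    fragments = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            fragments.extend(delta.get("reasoning_details") or [])
    return fragments


# Made-up chunks that follow the choices[].delta.reasoning_details path.
sample_chunks = [
    {"choices": [{"delta": {"reasoning_details": [
        {"type": "reasoning.text", "text": "Let me ", "index": 0}]}}]},
    {"choices": [{"delta": {"content": "Hi"}}]},
    {"choices": [{"delta": {"reasoning_details": [
        {"type": "reasoning.text", "text": "think.", "index": 0}]}}]},
]

if __name__ == "__main__":
    parts = collect_reasoning_fragments(sample_chunks)
    print("".join(p.get("text", "") for p in parts))
```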

There are three types of reasoning detail:

reasoning.text (raw text reasoning):

{
  "type": "reasoning.text",
  "text": "Let me think through this step by step:\n1. First...",
  "signature": "sha256:abc123...",
  "id": "reasoning-text-1",
  "format": "anthropic-claude-v1",
  "index": 0
}

reasoning.summary (a summary of the reasoning process):

{
  "type": "reasoning.summary",
  "summary": "Analyzed the problem by identifying key constraints...",
  "id": "reasoning-summary-1",
  "format": "anthropic-claude-v1",
  "index": 0
}

reasoning.encrypted (encrypted reasoning, used by some providers):

{
  "type": "reasoning.encrypted",
  "data": "eyJlbmNyeXB0ZWQiOiJ0cnVlIn0=",
  "id": "reasoning-encrypted-1",
  "format": "anthropic-claude-v1",
  "index": 1
}
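
When displaying or logging reasoning, the three types need different handling, since only two of them carry readable text. A minimal sketch, assuming each entry is a dictionary shaped like the examples above (the function name is hypothetical):

```python
def describe_reasoning_details(details) -> list:
    """Turn reasoning_details entries into human-readable lines for logging."""
    lines = []
    for detail in details:
        kind = detail.get("type")
        if kind == "reasoning.text":
            lines.append(detail.get("text", ""))
        elif kind == "reasoning.summary":
            lines.append("[summary] " + detail.get("summary", ""))
        elif kind == "reasoning.encrypted":
            # The payload is opaque to the client; never try to decode or edit it.
            lines.append("[encrypted reasoning]")
        else:
            lines.append(f"[unknown type: {kind}]")
    return lines


if __name__ == "__main__":
    print(describe_reasoning_details([
        {"type": "reasoning.summary", "summary": "Analyzed the constraints."},
        {"type": "reasoning.encrypted", "data": "eyJlbmNyeXB0ZWQiOiJ0cnVlIn0="},
    ]))
```

This is for display only. The array sent back to the API should still be the original, unmodified reasoning_details.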

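Provider implementation differences

Anthropic models

Either reasoning.max_tokens or reasoning.effort can be used
The :thinking variant suffix is no longer supported (deprecated); use the reasoning parameter instead
max_tokens (the overall token limit) must be strictly greater than the reasoning budget, so that enough tokens remain for the final answer
Reasoning budget ceiling: 128,000 tokens; floor: 1,024 tokens
Formula: budget_tokens = max(min(max_tokens * effort_ratio, 128000), 1024)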
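
The formula and the "strictly greater" rule above can be checked with a few lines of Python. This is a sketch of the stated formula only; the function names are invented here, and truncating the product to an integer is an assumption, since the rounding behavior is not stated.

```python
ANTHROPIC_MAX_BUDGET = 128_000  # ceiling from the list above
ANTHROPIC_MIN_BUDGET = 1_024    # floor from the list above


def anthropic_budget_tokens(max_tokens: int, effort_ratio: float) -> int:
    """budget_tokens = max(min(max_tokens * effort_ratio, 128000), 1024)."""
    raw = int(max_tokens * effort_ratio)  # truncation is an assumption
    return max(min(raw, ANTHROPIC_MAX_BUDGET), ANTHROPIC_MIN_BUDGET)


def leaves_room_for_answer(max_tokens: int, effort_ratio: float) -> bool:
    """True if max_tokens is strictly greater than the computed reasoning budget."""
    return max_tokens > anthropic_budget_tokens(max_tokens, effort_ratio)


if __name__ == "__main__":
    print(anthropic_budget_tokens(4000, 0.5))
    print(leaves_room_for_answer(1000, 0.5))
```

Note the effect of the floor: with max_tokens of 1000, the budget is raised to 1024, which exceeds the overall limit, so such a request would leave no room for the final answer.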

Google Gemini 3 models

Gemini 3 models (for example google/gemini-3.1-pro-preview) use Google's thinkingLevel API, and OpenRouter maps reasoning.effort to it automatically:

OpenRouter `reasoning.effort`	Google `thinkingLevel`
`minimal`	`minimal`
`low`	`low`
`medium`	`medium`
`high`	`high`
`xhigh`	`high` (mapped down)
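
Expressed as a lookup, the mapping in the table looks like this. The sketch mirrors the table only: `none` is not listed there, so the hypothetical helper returns None for it and for any other unlisted value rather than guessing.

```python
from typing import Optional

# reasoning.effort -> Google thinkingLevel, copied from the table above.
EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "high",  # mapped down: the table lists no level above "high"
}


def to_thinking_level(effort: str) -> Optional[str]:
    """Return the thinkingLevel for an effort value, or None if the table has no entry."""
    return EFFORT_TO_THINKING_LEVEL.get(effort)


if __name__ == "__main__":
    for effort in ("minimal", "medium", "xhigh", "none"):
        print(effort, "->", to_thinking_level(effort))
```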

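The number of reasoning tokens actually consumed is decided internally by Google and is not computed precisely from a percentage. If you use reasoning.max_tokens, OpenRouter passes it along as thinkingBudget, but Gemini 3 still maps it to a thinkingLevel internally, so you do not get exact token control.

Backward-compatible legacy parameters

Legacy parameter	Equivalent new form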
`include_reasoning: true`	`reasoning: {}`
`include_reasoning: false`	`reasoning: { exclude: true }`

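Migrating to the new reasoning parameter is recommended. The legacy parameters are kept only for backward compatibility, with no guarantee of long-term support.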
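
A migration along the lines of the table could be sketched as a pure function over a request body. The function name is hypothetical, and leaving an existing reasoning object untouched when both forms are present is a design choice made here, not documented behavior.

```python
def migrate_legacy_params(body: dict) -> dict:
    """Return a copy of a request body with include_reasoning rewritten to reasoning."""
    migrated = {k: v for k, v in body.items() if k != "include_reasoning"}
    if "include_reasoning" in body and "reasoning" not in migrated:
        # include_reasoning: true  -> reasoning: {}
        # include_reasoning: false -> reasoning: { exclude: true }
        migrated["reasoning"] = {} if body["include_reasoning"] else {"exclude": True}
    return migrated


if __name__ == "__main__":
    print(migrate_legacy_params({"model": "your-model", "include_reasoning": False}))
```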

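FAQ

Q: Are reasoning tokens billed separately?

A: Yes. Reasoning tokens are billed as output tokens, on top of the tokens in the final answer. The number of reasoning tokens used is reported in the response's usage.completion_tokens_details.reasoning_tokens field. Setting exclude: true does not waive the charge for reasoning tokens; it only keeps the reasoning out of the response.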
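
To see how much of a bill comes from reasoning, the usage block can be split as in the sketch below. It rests on two assumptions that should be verified against real responses: that completion_tokens already includes the reasoning tokens, and that a single per-million-token output price applies. The price in the example is a placeholder, not a real rate.

```python
def split_completion_cost(usage: dict, price_per_million_output: float) -> dict:
    """Split output cost into reasoning and answer parts (assumptions noted above)."""
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    per_token = price_per_million_output / 1_000_000
    return {
        "reasoning_tokens": reasoning,
        "answer_tokens": completion - reasoning,  # assumes completion includes reasoning
        "reasoning_cost": reasoning * per_token,
        "total_output_cost": completion * per_token,
    }


if __name__ == "__main__":
    sample_usage = {
        "completion_tokens": 1500,
        "completion_tokens_details": {"reasoning_tokens": 1000},
    }
    # 10.0 per million output tokens is a placeholder price.
    print(split_completion_cost(sample_usage, price_per_million_output=10.0))
```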

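Q: Which models can return reasoning tokens?

A: Most reasoning models can, but the OpenAI o-series (o1/o3) does not return its reasoning; it is used internally only. Models that do return reasoning include Anthropic Claude 3.7+, DeepSeek R1, the Gemini thinking series, and xAI Grok. Check each model's page on OpenRouter to see whether it is marked as reasoning-capable.

Q: Do I have to pass reasoning_details back on every turn of a multi-turn conversation?

A: Not on every turn, but it is strongly recommended in conversations that involve tool calling. In text-only multi-turn conversations, passing reasoning back can make the model's line of thought more coherent. In tool-use conversations, leaving it out can cause the model to "forget" its intermediate reasoning and produce errors or repeat work.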
