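Use the diagnostics-otel plugin to export OpenClaw gateway diagnostics (model token usage, message flow, session state, tool execution, and more) to any collector or backend that accepts OTLP/HTTP (Grafana, Datadog, Honeycomb, New Relic, Tempo). Raw content is not exported by default. Traces, metrics, and logs can be enabled independently, and the plugin supports a sample rate, custom endpoints, signal-specific endpoints, and environment variable overrides. After installing, enable it with openclaw plugins enable diagnostics-otel; configuration lives under diagnostics.otel.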

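How to configure OpenTelemetry export in OpenClaw (diagnostic metrics, traces, and logs)

OpenClaw exports diagnostics through the official diagnostics-otel plugin using OTLP/HTTP (protobuf). Any collector or backend that accepts OTLP/HTTP can be connected without code changes. For reading local log files, see Logging.

Architecture overview

Diagnostic events are structured, in-process records emitted by the Gateway and bundled plugins. They cover model runs, message flow, sessions, queues, and exec.
The diagnostics-otel plugin subscribes to these events and exports them as OpenTelemetry metrics, traces, and logs over OTLP/HTTP.
When the provider transport accepts custom headers, OpenClaw's model-call span context attaches a W3C traceparent header to provider calls. Trace context emitted by plugins is not propagated externally.
The exporter attaches only when both diagnostics and the plugin are enabled, so in-process overhead is near zero by default.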

Quick start

Install the plugin (for packaged installs):

bash

openclaw plugins install clawhub:@openclaw/diagnostics-otel

Example configuration (JSON5):

json5

{
  plugins: {
    allow: ["diagnostics-otel"],
    entries: {
      "diagnostics-otel": { enabled: true },
    },
  },
  diagnostics: {
    enabled: true,
    otel: {
      enabled: true,
      endpoint: "http://otel-collector:4318",
      protocol: "http/protobuf",
      serviceName: "openclaw-gateway",
      traces: true,
      metrics: true,
      logs: true,
      sampleRate: 0.2,
      flushIntervalMs: 60000,
    },
  },
}

You can also enable the plugin from the CLI:

bash

openclaw plugins enable diagnostics-otel

TIP

Currently protocol supports only http/protobuf; grpc is ignored.

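Exported signals

Signal	What it contains
Metrics	Counters and histograms: token usage, cost, run duration, message flow, Talk events, queue lanes, session state/recovery, exec, and memory pressure.
Traces	Spans: model usage, model calls, harness lifecycle, tool execution, exec, webhook/message processing, context assembly, and tool loops.
Logs	When `diagnostics.otel.logs` is enabled, structured `logging.file` records are exported over OTLP.

Traces, metrics, and logs can be toggled independently. When diagnostics.otel.enabled is true, all three are on by default.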

Configuration reference

json5

{
  diagnostics: {
    enabled: true,
    otel: {
      enabled: true,
      endpoint: "http://otel-collector:4318",
      tracesEndpoint: "http://otel-collector:4318/v1/traces",
      metricsEndpoint: "http://otel-collector:4318/v1/metrics",
      logsEndpoint: "http://otel-collector:4318/v1/logs",
      protocol: "http/protobuf", // grpc is ignored
      serviceName: "openclaw-gateway",
      headers: { "x-collector-token": "..." },
      traces: true,
      metrics: true,
      logs: true,
      sampleRate: 0.2, // root-span sample rate: 0.0 drops everything, 1.0 keeps everything
      flushIntervalMs: 60000, // metric export interval, minimum 1000 ms
      captureContent: {
        enabled: false,
        inputMessages: false,
        outputMessages: false,
        toolInputs: false,
        toolOutputs: false,
        systemPrompt: false,
      },
    },
  },
}
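
The sampleRate and flushIntervalMs settings in the reference above can be read as the following sketch. It is not the plugin's implementation: the function names are hypothetical, clamping out-of-range values (rather than rejecting them) is an assumption, and the trace-ID ratio comparison is borrowed from the common OpenTelemetry ratio-sampling approach rather than confirmed by this page.

```python
MIN_FLUSH_INTERVAL_MS = 1000  # documented minimum for flushIntervalMs


def normalize_sample_rate(rate: float) -> float:
    """Clamp to [0.0, 1.0]. Clamping instead of rejecting is an assumption."""
    return min(1.0, max(0.0, rate))


def normalize_flush_interval(ms: int) -> int:
    """Apply the 1000 ms floor. Clamping instead of rejecting is an assumption."""
    return max(MIN_FLUSH_INTERVAL_MS, ms)


def keep_root_span(trace_id_hex: str, sample_rate: float) -> bool:
    """Deterministic ratio sampling on the low 64 bits of the trace ID.

    Only root spans are decided here; child spans would follow their parent.
    """
    rate = normalize_sample_rate(sample_rate)
    bound = round(rate * (1 << 64))
    low_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_bits < bound
```

Because the decision depends only on the trace ID, every process that sees the same trace would reach the same keep-or-drop answer under this scheme.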

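Environment variables

Variable	Effect
`OTEL_EXPORTER_OTLP_ENDPOINT`	Overrides `diagnostics.otel.endpoint`. If the value already contains `/v1/traces`, `/v1/metrics`, or `/v1/logs`, it is used as is.
`OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` / `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` / `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT`	Signal-specific endpoint overrides. Each takes effect only when the matching `diagnostics.otel.*Endpoint` config key is unset. Precedence: signal-level config > signal-level environment variable > shared endpoint.
`OTEL_SERVICE_NAME`	Overrides `diagnostics.otel.serviceName`.
`OTEL_EXPORTER_OTLP_PROTOCOL`	Overrides the transport protocol (currently only `http/protobuf` takes effect).
`OTEL_SEMCONV_STABILITY_OPT_IN`	Set to `gen_ai_latest_experimental` to emit the latest experimental GenAI span attribute (`gen_ai.provider.name`) instead of the legacy `gen_ai.system`. GenAI metrics always use bounded, low-cardinality semantic attributes.
`OPENCLAW_OTEL_PRELOADED`	Set to `1` when another preload or host process has already registered a global OpenTelemetry SDK. The plugin then skips its own NodeSDK lifecycle but still wires up the diagnostic listeners and honors the `traces`/`metrics`/`logs` toggles.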
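
The endpoint precedence in the table can be sketched as a small resolver. This is an illustration only: resolve_endpoint is a hypothetical name, the collector-a and collector-b hosts in the usage note are made up, and appending the signal path to a bare shared endpoint is an assumption based on common OTLP/HTTP behavior. The page also does not say how a shared endpoint that already contains one signal path is treated for the other signals; the sketch simply uses it as is.

```python
from typing import Mapping, Optional

SIGNAL_PATHS = {"traces": "/v1/traces", "metrics": "/v1/metrics", "logs": "/v1/logs"}


def resolve_endpoint(
    signal: str, otel_config: Mapping[str, str], env: Mapping[str, str]
) -> Optional[str]:
    """Return the URL one signal would export to, or None if nothing is set."""
    # 1. Signal-level config (tracesEndpoint, metricsEndpoint, logsEndpoint).
    specific = otel_config.get(f"{signal}Endpoint")
    if specific:
        return specific
    # 2. Signal-level environment variable.
    env_specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if env_specific:
        return env_specific
    # 3. Shared endpoint: the environment variable overrides the config value.
    shared = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or otel_config.get("endpoint")
    if not shared:
        return None
    if any(path in shared for path in SIGNAL_PATHS.values()):
        return shared  # already carries a signal path: used as is
    # Assumption: a bare base URL gets the per-signal path appended.
    return shared.rstrip("/") + SIGNAL_PATHS[signal]
```

With tracesEndpoint set in config and only OTEL_EXPORTER_OTLP_METRICS_ENDPOINT set in the environment, traces would go to the config URL, metrics to the environment URL, and logs to the shared endpoint plus /v1/logs.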

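Privacy and content capture

Raw model and tool content is not exported by default. Spans carry only bounded identifiers (channel, provider, model, error category, hashed request ID) and never include prompt text, response text, tool inputs, tool outputs, or session keys. Talk metrics export only bounded event metadata (mode, transport, provider, event type) and never include transcripts, audio payloads, session IDs, turn IDs, call IDs, room IDs, or handoff tokens.

Outbound model requests may include a W3C traceparent header. The header is generated only from OpenClaw's own diagnostic trace context for the active model call. It replaces any traceparent header the caller already set, so plugins and custom provider options cannot forge cross-service trace ancestry.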
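
To make the replacement rule concrete, here is a minimal sketch of building a W3C traceparent value and overwriting any caller-supplied one. The header layout (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags) comes from the W3C Trace Context format; the function names and the plain-dict header representation are assumptions, not OpenClaw's internals.

```python
import re

# version "00" - 32 hex trace ID - 16 hex parent span ID - 2 hex trace flags
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def build_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    """Format a version-00 traceparent value from the active span context."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def apply_traceparent(headers: dict, trace_id: str, span_id: str, sampled: bool) -> dict:
    """Return a copy of headers whose only traceparent is the one built here.

    Any caller-supplied traceparent is dropped first (header names are
    compared case-insensitively), so it cannot override the trace ancestry.
    """
    cleaned = {k: v for k, v in headers.items() if k.lower() != "traceparent"}
    cleaned["traceparent"] = build_traceparent(trace_id, span_id, sampled)
    return cleaned
```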

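Set diagnostics.otel.captureContent.* to true only after you have confirmed that your collector and retention policy are approved for prompt, response, tool, or system prompt text. Each sub-key is enabled independently:

inputMessages – user prompt content.
outputMessages – model response content.
toolInputs – tool argument payloads.
toolOutputs – tool result payloads.
systemPrompt – the assembled system/developer prompt.

Once any sub-key is enabled, model and tool spans add bounded, redacted openclaw.content.* attributes for that class only.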
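
The per-class opt-in can be pictured with the sketch below. Everything beyond the five sub-key names is an assumption made for illustration: the attribute key suffixes, the 2048-character bound, and treating captureContent.enabled as a master switch are all hypothetical, and the redaction step is omitted entirely.

```python
CONTENT_CLASSES = (
    "inputMessages",
    "outputMessages",
    "toolInputs",
    "toolOutputs",
    "systemPrompt",
)
MAX_CHARS = 2048  # hypothetical bound; the real limit is not documented here


def content_attributes(capture_config: dict, content: dict) -> dict:
    """Return span attributes only for the content classes that are opted in."""
    # Assumption: captureContent.enabled gates every sub-key.
    if not capture_config.get("enabled", False):
        return {}
    attrs = {}
    for cls in CONTENT_CLASSES:
        if capture_config.get(cls, False) and cls in content:
            # Hypothetical key naming; values are truncated to stay bounded.
            attrs[f"openclaw.content.{cls}"] = content[cls][:MAX_CHARS]
    return attrs
```

In this sketch, enabling inputMessages alone leaves tool output off the span even when the tool produced some.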

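Sampling and flushing

Traces: diagnostics.otel.sampleRate (applies to root spans only; 0.0 drops everything, 1.0 keeps everything).
Metrics: diagnostics.otel.flushIntervalMs (minimum 1000).
Logs: OTLP logs follow logging.level (the file log level). They use the diagnostic log-record redaction path, not console formatting. High-volume installs should prefer sampling and filtering at the OTLP collector over local sampling.
File-log correlation: when a log call carries a valid diagnostic trace context, JSONL file logs include top-level traceId, spanId, parentSpanId, and traceFlags, so log processors can join local log lines with exported spans.
Request correlation: Gateway HTTP requests and WebSocket frames create an internal request trace scope. Logs and diagnostic events inside that scope inherit the request trace by default, while agent run and model-call spans are created as children, so the provider traceparent header stays on the same trace.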
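
As an illustration of the file-log correlation described above, the sketch below groups JSONL log lines by their top-level traceId so they can be matched against exported spans. The traceId and spanId field names come from this page; the function name and the msg field in the usage note are assumptions.

```python
import json
from collections import defaultdict


def group_logs_by_trace(jsonl_text: str) -> dict:
    """Map each traceId to the log records that carry it."""
    by_trace = defaultdict(list)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        trace_id = record.get("traceId")
        if trace_id:  # records without trace context are left out
            by_trace[trace_id].append(record)
    return dict(by_trace)
```

Feeding it a log file where two lines share a traceId and a third has none would yield one group of two records.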

Exported metrics

Model usage

openclaw.tokens (counter, attrs: openclaw.token, openclaw.channel, openclaw.provider, openclaw.model, openclaw.agent)
openclaw.cost.usd (counter, attrs: openclaw.channel, openclaw.provider, openclaw.model)
openclaw.run.duration_ms (histogram, attrs: openclaw.channel, openclaw.provider, openclaw.model)
openclaw.context.tokens (histogram, attrs: openclaw.context, openclaw.channel, openclaw.provider, openclaw.model)
gen_ai.client.token.usage (histogram, GenAI semantic-convention metric, attrs: gen_ai.token.type = input/output, gen_ai.provider.name, gen_ai.operation.name, gen_ai.request.model)
gen_ai.client.operation.duration (histogram, seconds, GenAI semantic-convention metric, attrs: gen_ai.provider.name, gen_ai.operation.name, gen_ai.request.model, optional error.type)
openclaw.model_call.duration_ms (histogram, attrs: openclaw.provider, openclaw.model, openclaw.api, openclaw.transport, plus openclaw.errorCategory and openclaw.failureKind on classified errors)
openclaw.model_call.request_bytes (histogram, UTF-8 byte size of the final model request payload; no raw payload content)
openclaw.model_call.response_bytes (histogram, UTF-8 byte size of streamed model response events; no raw response content)
openclaw.model_call.time_to_first_byte_ms (histogram, elapsed time until the first streamed response event)
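
The request_bytes and response_bytes histograms record UTF-8 byte sizes, which differ from character counts for non-ASCII text. A minimal sketch of that measurement follows; the function names and the use of json.dumps for serialization are assumptions, and the gateway's actual wire format may produce different sizes.

```python
import json


def utf8_byte_size(text: str) -> int:
    """Size of a string once encoded as UTF-8, in bytes."""
    return len(text.encode("utf-8"))


def request_payload_bytes(payload: dict) -> int:
    """Byte size of a JSON-serialized payload; only the size is kept.

    ensure_ascii=False keeps non-ASCII characters unescaped, so they are
    counted at their real UTF-8 width.
    """
    body = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return utf8_byte_size(body)
```

Only the integer size would reach the histogram, which is how the metric stays free of raw payload content.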

Message flow

openclaw.webhook.received (counter, attrs: openclaw.channel, openclaw.webhook)
openclaw.webhook.error (counter, attrs: openclaw.channel, openclaw.webhook)
openclaw.webhook.duration_ms (histogram, attrs: openclaw.channel, openclaw.webhook)
openclaw.message.queued (counter, attrs: openclaw.channel, openclaw.source)
openclaw.message.received (counter, attrs: openclaw.channel, openclaw.source)
openclaw.message.dispatch.started (counter, attrs: openclaw.channel, openclaw.source)
openclaw.message.dispatch.completed (counter, attrs: openclaw.channel, openclaw.outcome, openclaw.reason, openclaw.source)
openclaw.message.dispatch.duration_ms (histogram, attrs: openclaw.channel, openclaw.outcome, openclaw.reason, openclaw.source)
openclaw.message.processed (counter, attrs: openclaw.channel, openclaw.outcome)
openclaw.message.duration_ms (histogram, attrs: openclaw.channel, openclaw.outcome)
openclaw.message.delivery.started (counter, attrs: openclaw.channel, openclaw.delivery.kind)
openclaw.message.delivery.duration_ms (histogram, attrs: openclaw.channel, openclaw.delivery.kind, openclaw.outcome, openclaw.errorCategory)

Talk

openclaw.talk.event (counter, attrs: openclaw.talk.event_type, openclaw.talk.mode, openclaw.talk.transport, openclaw.talk.brain, openclaw.talk.provider)
openclaw.talk.event.duration_ms (histogram, attrs: same as openclaw.talk.event; emitted when a Talk event reports a duration)
openclaw.talk.audio.bytes (histogram, attrs: same as openclaw.talk.event; emitted when a Talk audio frame event reports a byte length)

Queues and sessions

openclaw.queue.lane.enqueue (counter, attrs: openclaw.lane)
openclaw.queue.lane.dequeue (counter, attrs: openclaw.lane)
openclaw.queue.depth (histogram, attrs: openclaw.lane or openclaw.channel=heartbeat)
openclaw.queue.wait_ms (histogram, attrs: openclaw.lane)
openclaw.session.state (counter, attrs: openclaw.state, openclaw.reason)
openclaw.session.stuck (counter, attrs: openclaw.state; emitted only for stale session bookkeeping with no active work)
openclaw.session.stuck_age_ms (histogram, attrs: openclaw.state; emitted only for stale session bookkeeping with no active work)
openclaw.session.turn.created (counter, attrs: openclaw.agent, openclaw.channel, openclaw.trigger)
openclaw.session.recovery.requested (counter, attrs: openclaw.state, openclaw.action, openclaw.active_work_kind, openclaw.reason)
openclaw.session.recovery.completed (counter, attrs: openclaw.state, openclaw.action, openclaw.status, openclaw.active_work_kind, openclaw.reason)
openclaw.session.recovery.age_ms (histogram, attrs: same as the matching recovery counter)
openclaw.run.attempt (counter, attrs: openclaw.attempt)

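Session liveness telemetry

diagnostics.stuckSessionWarnMs is the no-progress age threshold for session liveness diagnostics. While OpenClaw observes reply, tool, status, block, or ACP runtime progress, a session in the processing state does not age toward that threshold. Typing keepalives do not count as progress, so a silent model or harness can still be detected.

OpenClaw classifies sessions by the active work it can still observe:

session.long_running: active embedded work, a model call, or a tool call is still making progress.
session.stalled: active work exists, but the active run has not reported recent progress. A stalled embedded run starts out observe-only; if it still shows no progress after diagnostics.stuckSessionAbortMs, OpenClaw performs an abort-drain to release later turns queued on the lane. When unset, the abort threshold defaults to a safer extended window: at least 5 minutes and at least 3 times diagnostics.stuckSessionWarnMs.
session.stuck: stale session bookkeeping with no active work. The affected session lane is released immediately.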
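
The classification and the default abort window can be sketched as follows. This is a reading of the rules above, not OpenClaw's code: the function names and the two boolean inputs are hypothetical, and the classification is meant for a session that has already aged past the warn threshold.

```python
from typing import Optional

FIVE_MINUTES_MS = 5 * 60 * 1000


def effective_abort_ms(warn_ms: int, abort_ms: Optional[int] = None) -> int:
    """Abort threshold: the configured value, else max(5 min, 3 x warn)."""
    if abort_ms is not None:
        return abort_ms
    return max(FIVE_MINUTES_MS, 3 * warn_ms)


def classify_session(has_active_work: bool, recent_progress: bool) -> str:
    """Pick the liveness diagnostic for a session past the warn threshold."""
    if not has_active_work:
        return "session.stuck"  # stale bookkeeping: lane is released at once
    if recent_progress:
        return "session.long_running"
    return "session.stalled"  # observe-only until the abort threshold
```

For example, a 60-second warn threshold yields a 5-minute default abort window, while a 2-minute warn threshold yields 6 minutes.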

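Recovery emits structured session.recovery.requested and session.recovery.completed events. The diagnostic session state is marked idle only when a mutating recovery outcome (aborted or released) occurred and the same processing generation is still current.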
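
The idle-marking condition amounts to a two-part guard, sketched below. Representing the processing generation as an integer and the function name itself are assumptions made for illustration.

```python
MUTATING_STATUSES = {"aborted", "released"}


def should_mark_idle(status: str, recovery_generation: int, current_generation: int) -> bool:
    """True only for a mutating outcome on the still-current generation."""
    return status in MUTATING_STATUSES and recovery_generation == current_generation
```

The generation check keeps a late recovery result from resetting a session that has already started new processing.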

Only session.stuck emits the openclaw.session.stuck counter, the openclaw.session.stuck_age_ms histogram, and the openclaw.session.stuck span. Repeated session.stuck diagnostics back off while the session is unchanged, so dashboards should alert on sustained growth rather than on every heartbeat signal. For config keys and defaults, see the configuration reference.

Harness lifecycle

openclaw.harness.duration_ms (histogram, attrs: openclaw.harness.id, openclaw.harness.plugin, openclaw.outcome, plus openclaw.harness.phase on errors)

Exec

openclaw.exec.duration_ms (histogram, attrs: openclaw.exec.target, openclaw.exec.mode, openclaw.outcome, openclaw.failureKind)

Diagnostics internals (memory and tool loops)

openclaw.memory.heap_used_bytes (histogram, attrs: openclaw.memory.kind)
openclaw.memory.rss_bytes (histogram)
openclaw.memory.pressure (counter, attrs: openclaw.memory.level)
openclaw.tool.loop.iterations (counter, attrs: openclaw.toolName, openclaw.outcome)
openclaw.tool.loop.duration_ms (histogram, attrs: openclaw.toolName, openclaw.outcome)

Exported spans

openclaw.model.usage
- openclaw.channel, openclaw.provider, openclaw.model
- openclaw.tokens.* (input/output/cache_read/cache_write/total)
- gen_ai.system (default) or gen_ai.provider.name (when the latest GenAI semantic conventions are opted in)
- gen_ai.request.model, gen_ai.operation.name, gen_ai.usage.*
openclaw.run
- openclaw.outcome, openclaw.channel, openclaw.provider, openclaw.model, openclaw.errorCategory
openclaw.model.call
- gen_ai.system (default) or gen_ai.provider.name (when the latest GenAI semantic conventions are opted in)
- gen_ai.request.model, gen_ai.operation.name, openclaw.provider, openclaw.model, openclaw.api, openclaw.transport
- on errors: openclaw.errorCategory and optional openclaw.failureKind
- openclaw.model_call.request_bytes, openclaw.model_call.response_bytes, openclaw.model_call.time_to_first_byte_ms
- openclaw.provider.request_id_hash (SHA-based hash of the upstream provider request ID; the raw ID is not exported)
openclaw.harness.run
- openclaw.harness.id, openclaw.harness.plugin, openclaw.outcome, openclaw.provider, openclaw.model, openclaw.channel
- on completion: openclaw.harness.result_classification, openclaw.harness.yield_detected, openclaw.harness.items.started, openclaw.harness.items.completed, openclaw.harness.items.active
- on errors: openclaw.harness.phase, openclaw.errorCategory, optional openclaw.harness.cleanup_failed
openclaw.tool.execution
- gen_ai.tool.name, openclaw.toolName, openclaw.errorCategory, openclaw.tool.params.*
openclaw.exec
- openclaw.exec.target, openclaw.exec.mode, openclaw.outcome, openclaw.failureKind, openclaw.exec.command_length, openclaw.exec.exit_code, openclaw.exec.timed_out
openclaw.webhook.processed
- openclaw.channel, openclaw.webhook
openclaw.webhook.error
- openclaw.channel, openclaw.webhook, openclaw.error
openclaw.message.processed
- openclaw.channel, openclaw.outcome, openclaw.reason
openclaw.message.delivery
- openclaw.channel, openclaw.delivery.kind, openclaw.outcome, openclaw.errorCategory, openclaw.delivery.result_count
openclaw.session.stuck
- openclaw.state, openclaw.ageMs, openclaw.queueDepth
openclaw.context.assembled
- openclaw.prompt.size, openclaw.history.size, openclaw.context.tokens, openclaw.errorCategory (no prompt, history, response, or session-key content)
openclaw.tool.loop
- openclaw.toolName, openclaw.outcome, openclaw.iterations, openclaw.errorCategory (no loop messages, params, or tool output)
openclaw.memory.pressure
- openclaw.memory.level, openclaw.memory.heap_used_bytes, openclaw.memory.rss_bytes

When content capture is explicitly enabled, model and tool spans may include bounded, redacted openclaw.content.* attributes, limited to the specific content classes you opted into.
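
The openclaw.provider.request_id_hash attribute listed under openclaw.model.call is described only as SHA-based. The sketch below shows the general idea with SHA-256, which is an assumption: the exact algorithm, any salting, and any truncation are not stated on this page, and the request ID in the usage note is made up.

```python
import hashlib


def request_id_hash(request_id: str) -> str:
    """Hex digest of the upstream request ID; the raw ID itself is not kept."""
    return hashlib.sha256(request_id.encode("utf-8")).hexdigest()
```

The same input always maps to the same digest, so two spans for one upstream request can still be joined on the hash without exposing the original ID.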

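Diagnostic event catalog

The events below back the metrics and spans above. Plugins can also subscribe to them directly, without OTLP export.

Model usage

model.usage – tokens, cost, duration, context, provider/model/channel, and session IDs. usage is provider/turn-level accounting for billing and telemetry; context.used is the current prompt/context snapshot and may be lower than the provider's usage.total when cached input or tool-loop calls are involved.

Message flow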

webhook.received / webhook.processed / webhook.error
message.queued / message.processed
message.delivery.started / message.delivery.completed / message.delivery.error

Queues and sessions

queue.lane.enqueue / queue.lane.dequeue
session.state / session.long_running / session.stalled / session.stuck
run.attempt / run.progress
diagnostic.heartbeat (aggregate counters: webhooks/queue/session)

Harness lifecycle

harness.run.started / harness.run.completed / harness.run.error – per-run lifecycle of the agent harness. Includes harnessId, optional pluginId, provider/model/channel, and the run ID. Completion adds durationMs, outcome, optional resultClassification, yieldDetected, and itemLifecycle counts. Errors add phase (prepare/start/send/resolve/cleanup), errorCategory, and optional cleanupFailed.

Exec

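exec.process.completed – final outcome, duration, target, mode, exit code, and failure kind. Command text and working directory are not included.

Without an exporter

You can keep diagnostic events available to plugins or custom sinks without running diagnostics-otel: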

json5

{
  diagnostics: { enabled: true },
}

For targeted debug output without raising logging.level, use diagnostic flags. Flags are case-insensitive and support wildcards (for example telegram.* or *):

json5

{
  diagnostics: { flags: ["telegram.http"] },
}

Or as a one-off environment variable override:

bash

OPENCLAW_DIAGNOSTICS=telegram.http,telegram.payload openclaw gateway

Flag output goes to the standard log file (logging.file) and is still redacted by logging.redactSensitive. Full guide: Diagnostic flags.
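
The case-insensitive wildcard matching of flags can be approximated with glob patterns, as in this sketch. The function names are hypothetical, slack.http is a made-up flag used only as a non-matching example, and the real matcher may treat characters such as ? or [ differently from Python's fnmatch.

```python
from fnmatch import fnmatchcase


def parse_flags(value: str) -> list:
    """Split a comma-separated OPENCLAW_DIAGNOSTICS value into lowercase flags."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def flag_enabled(flag: str, patterns: list) -> bool:
    """Check a flag against the configured patterns, ignoring case."""
    name = flag.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)
```

Under this reading, telegram.* enables both telegram.http and telegram.payload, and a lone * enables every flag.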

Disabling

json5

{
  diagnostics: { otel: { enabled: false } },
}

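You can also remove diagnostics-otel from plugins.allow, or run openclaw plugins disable diagnostics-otel.

FAQ

How do I configure the OpenTelemetry endpoint?

Set diagnostics.otel.endpoint to your collector's OTLP/HTTP address (for example http://otel-collector:4318). You can also set signal-specific endpoints with tracesEndpoint, metricsEndpoint, and logsEndpoint. The OTEL_EXPORTER_OTLP_ENDPOINT environment variable overrides the shared endpoint.

Which metric and span names are exported?

This page lists every openclaw.* and gen_ai.* metric name (such as openclaw.tokens and openclaw.run.duration_ms) and every span name (such as openclaw.model.call and openclaw.tool.execution). Each entry lists its attributes, which you can use to build Grafana dashboards or alert rules.

How do I enable content capture without leaking sensitive data?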

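Proceed with caution: set captureContent.inputMessages, outputMessages, toolInputs, toolOutputs, or systemPrompt to true only after confirming that your collector and retention policy are approved. All are off by default, and exported spans contain only hashed identifiers and metadata. Once a class is enabled, its text appears on spans as bounded, redacted openclaw.content.* attributes.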
