
OpenTelemetry GenAI Semantic Conventions 2026: End-to-End Implementation Guide for LLM & Agent Observability

Complete guide to implementing OpenTelemetry GenAI semantic conventions for LLM and AI agent observability with token metrics, tracing, and OTel Collector pipeline
Sk Jabedul Haque
May 6, 2026

    OpenTelemetry GenAI semantic conventions define standardized gen_ai.* attributes for production-grade LLM and AI agent observability. These conventions (v1.37+) cover operation names, input and output token metrics, latency measurements, and model metadata, enabling vendor-neutral tracing across LLM providers. The GenAI SIG, formed in April 2024, maintains them within OpenTelemetry, a CNCF project.

    What You Will Learn

    • GenAI semantic conventions: standardized gen_ai.* attributes for LLM spans, agent traces, and events
    • Token metrics monitoring: input/output tokens, cost calculation, and TTFT/TPS latency tracking
    • AI agent tracing: distributed tracing across multi-step workflows with context propagation
    • OTel Collector pipeline: configuration for GenAI telemetry with processors and exporters
    • Integration backends: Datadog, Grafana Tempo, and Jaeger for visualizing GenAI traces

    Understanding GenAI Semantic Conventions

    The OpenTelemetry Generative AI Special Interest Group (GenAI SIG) was formed in April 2024 to standardize how LLM and AI agent operations are observed. These semantic conventions provide a vendor-neutral schema for describing AI operations—so your telemetry data stays consistent whether you use OpenAI, Anthropic, Google Gemini, or any other provider.

    As of early 2026, the GenAI conventions cover four primary areas: LLM client spans for direct API calls, agent spans for multi-step workflows, events for capturing prompt/completion content, and metrics for aggregated measurements. Version 1.37 is the baseline that tooling commonly targets, and version 1.41.0 adds reasoning tokens and enhanced agent attributes.

    Core gen_ai.* Attributes

    Attribute                             Type    Description                      Example
    gen_ai.operation.name                 string  Type of LLM operation            chat, text_completion
    gen_ai.request.model                  string  Model name for request           gpt-4o, claude-sonnet-4-20250514
    gen_ai.response.model                 string  Model that generated response    gpt-4o-2025-01-27
    gen_ai.provider.name                  string  LLM provider identifier          openai, anthropic, gcp.vertex_ai
    gen_ai.usage.input_tokens             int     Prompt token count               150
    gen_ai.usage.output_tokens            int     Completion token count           200
    gen_ai.usage.reasoning.output_tokens  int     Reasoning chain tokens (v1.41+)  512
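
    As a rough illustration of how these attributes end up on a span, the sketch below builds the attribute dictionary in plain Python and, only if the opentelemetry-api package happens to be installed, attaches it to a client span. The helper names genai_span_attributes and record_chat_call and the sample values are made up for this example. The span name follows the "{operation} {model}" pattern the conventions recommend.

```python
from typing import Dict, Optional, Union

try:  # opentelemetry-api is optional for this sketch
    from opentelemetry import trace
except ImportError:
    trace = None

AttrValue = Union[str, int]


def genai_span_attributes(
    operation: str,
    provider: str,
    request_model: str,
    input_tokens: int,
    output_tokens: int,
    response_model: Optional[str] = None,
) -> Dict[str, AttrValue]:
    """Build the core gen_ai.* attributes for one LLM call (hypothetical helper)."""
    attrs: Dict[str, AttrValue] = {
        "gen_ai.operation.name": operation,
        "gen_ai.provider.name": provider,
        "gen_ai.request.model": request_model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    if response_model is not None:
        attrs["gen_ai.response.model"] = response_model
    return attrs


def record_chat_call(attrs: Dict[str, AttrValue]) -> None:
    """Attach the attributes to a span; does nothing if OpenTelemetry is absent."""
    if trace is None:
        return
    tracer = trace.get_tracer("genai-example")
    span_name = f"{attrs['gen_ai.operation.name']} {attrs['gen_ai.request.model']}"
    # In real code the span would wrap the provider call, and the token
    # counts would be set once the response has arrived.
    with tracer.start_as_current_span(span_name, kind=trace.SpanKind.CLIENT) as span:
        span.set_attributes(attrs)


attrs = genai_span_attributes("chat", "openai", "gpt-4o", 150, 200)
record_chat_call(attrs)
```

    Keeping the attribute construction separate from the tracing call makes it easy to unit-test the attribute names without a configured SDK.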

    Token Metrics and Cost Monitoring

    Token monitoring is the foundation of LLM cost observability. Because providers bill per token, at different rates for input and output, the two counts must be tracked separately. OpenTelemetry captures them as gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, which makes cost visible at the level of an individual trace.

    To calculate cost, multiply token counts by your model pricing rates. For example, OpenAI GPT-4o pricing (as of May 2026) is approximately $2.50/1M input tokens and $10/1M output tokens. Emit a custom gen_ai.usage.cost_usd metric using a pricing table you control.

    GPT-4o input: $2.50 / 1M tokens
    GPT-4o output: $10 / 1M tokens
    GPT-4o Mini input: $0.15 / 1M tokens
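
    The cost arithmetic can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the function name estimate_cost_usd and the table layout are invented here, and the only rates included are the GPT-4o figures quoted above, which you would replace with a pricing table you maintain.

```python
from typing import Dict, Tuple

# USD per 1M tokens as (input_rate, output_rate).
# Rates are the GPT-4o figures quoted in this article; keep your own table current.
PRICING_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "gpt-4o": (2.50, 10.00),
}


def estimate_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    pricing: Dict[str, Tuple[float, float]] = PRICING_PER_MILLION,
) -> float:
    """Return the estimated cost of one call in USD (hypothetical helper)."""
    if model not in pricing:
        raise KeyError(f"no pricing entry for model {model!r}")
    input_rate, output_rate = pricing[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# 150 input tokens and 200 output tokens, as in the attribute table.
cost = estimate_cost_usd("gpt-4o", 150, 200)
print(f"{cost:.6f}")  # prints 0.002375
```

    The result is the value you would record as the custom gen_ai.usage.cost_usd measurement. Raising on an unknown model, rather than returning zero, keeps missing pricing entries from silently under-reporting spend.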

    Latency Metrics: TTFT and TPS

    For streaming responses, Time to First Token (TTFT) and Tokens Per Second (TPS) matter more than total latency. TTFT is the delay between sending the request and receiving the first token, so it determines perceived responsiveness. TPS measures generation speed once streaming begins.

    The standard GenAI metric instruments include gen_ai.client.operation.duration for end-to-end latency. On the model-server side, the conventions define gen_ai.server.time_to_first_token for TTFT and gen_ai.server.time_per_output_token, whose reciprocal is TPS. Reasonable starting SLO targets: P50 TTFT under 500 ms and P99 under 5 s for chat; P99 total generation under 30 s.
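
    To make the two streaming measurements concrete, here is a small sketch that times any iterable of streamed chunks. It assumes one chunk equals one output token, which is a simplification, and the names StreamTiming and time_stream are hypothetical. TPS is computed over the tokens after the first one, so that TTFT is not counted twice.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamTiming:
    ttft_seconds: Optional[float]       # None if the stream produced nothing
    tokens_per_second: Optional[float]  # None if fewer than two chunks arrived
    output_tokens: int


def time_stream(
    chunks: Iterable[object],
    clock: Callable[[], float] = time.monotonic,
) -> StreamTiming:
    """Measure TTFT and TPS for a stream, treating each chunk as one token."""
    start = clock()  # in real code, take this just before sending the request
    first_at: Optional[float] = None
    last_at: Optional[float] = None
    count = 0
    for _ in chunks:
        last_at = clock()
        if first_at is None:
            first_at = last_at
        count += 1
    if first_at is None or last_at is None:
        return StreamTiming(None, None, 0)
    decode_time = last_at - first_at
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else None
    return StreamTiming(first_at - start, tps, count)


# Deterministic demo with a fake clock: request at t=0.0, chunks at 0.5 .. 1.25 s.
ticks = iter([0.0, 0.5, 0.75, 1.0, 1.25])
timing = time_stream(["a", "b", "c", "d"], clock=lambda: next(ticks))
print(timing.ttft_seconds, timing.tokens_per_second)  # prints 0.5 4.0
```

    The injectable clock exists only to make the example testable. With the default time.monotonic the same function can wrap a real streaming response.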

    AI Agent Distributed Tracing

    Agentic AI introduces complexity that traditional APM handles poorly. A single user request may spawn multiple model calls, tool executions, and retrieval steps. OpenTelemetry links these through W3C Trace Context headers (traceparent, tracestate), which instrumented HTTP clients and servers propagate automatically.

    For queue-based communication, you must serialize the context into the message yourself, typically as message headers or an envelope field. Agent spans use gen_ai.agent.* attributes such as gen_ai.agent.name and gen_ai.agent.id, and tool invocations carry gen_ai.tool.name. Together these allow the whole agent workflow to be visualized as a single trace.
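
    The queue case can be sketched with the standard library alone. The example below carries a W3C traceparent value inside a JSON envelope and validates it on the consumer side. The envelope layout and the helper names wrap_message and unwrap_message are assumptions for illustration. In an instrumented service you would more likely let opentelemetry.propagate.inject and extract fill and read the headers dictionary instead of handling the string by hand.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def wrap_message(payload: Dict[str, Any], traceparent: str, tracestate: str = "") -> str:
    """Producer side: put the trace headers next to the payload."""
    headers = {"traceparent": traceparent}
    if tracestate:
        headers["tracestate"] = tracestate
    return json.dumps({"headers": headers, "payload": payload})


def unwrap_message(raw: str) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Consumer side: return the payload and the parsed parent context, if valid."""
    envelope = json.loads(raw)
    headers = envelope.get("headers", {})
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    context = None
    # All-zero trace or span IDs are invalid and are treated as "no context".
    if match and match.group(2) != "0" * 32 and match.group(3) != "0" * 16:
        _version, trace_id, parent_span_id, flags = match.groups()
        context = {
            "trace_id": trace_id,
            "parent_span_id": parent_span_id,
            "sampled": bool(int(flags, 16) & 0x01),
        }
    return envelope["payload"], context


raw = wrap_message(
    {"task": "search"},
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
)
payload, parent = unwrap_message(raw)
```

    The consumer would start its own span as a child of the parsed trace ID and parent span ID, which is what keeps a multi-agent workflow in one trace.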

    Professional Recommendation

    Use auto-instrumentation libraries for LangChain, LangGraph, CrewAI, or OpenAI Agents SDK whenever possible. These libraries handle context propagation and span creation automatically, ensuring consistent distributed traces without custom code.

    OTel Collector Pipeline for GenAI

    The OpenTelemetry Collector provides a vendor-neutral telemetry pipeline that receives, processes, and exports data. For GenAI workloads, the collector normalizes attributes from different instrumentation libraries (OpenInference, OpenLLMetry) to the standard gen_ai.* conventions.

    Collector Configuration

    Configure the OTel Collector with the genainormalizer processor to transform attributes from popular frameworks to GenAI semantic conventions:

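    receivers:
      otlp:
        protocols:
          grpc:

    processors:
      # Requires a Collector build that includes the genainormalizer processor.
      genainormalizer:
        semconv_version: "1.41.0"
        profiles: [openinference, openllmetry]
        remove_originals: true
      batch:

    exporters:
      otlp/backend:
        endpoint: backend.example.com:4317  # placeholder; replace with your backend

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [genainormalizer, batch]
          exporters: [otlp/backend]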

    The genainormalizer processor maps attributes from OpenInference and OpenLLMetry libraries (covering 60+ frameworks including LangChain, CrewAI, PydanticAI, Strands) to the OTel GenAI semantic conventions. It supports custom mappings and optional overwrite behavior.
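
    Conceptually, this kind of normalization is a key-renaming pass over span attributes. The Python sketch below shows the idea only and is not the processor's implementation. The three source keys are assumed to be OpenInference-style names, the mapping table is illustrative, and the function name normalize_attributes and its overwrite flag are hypothetical.

```python
from typing import Any, Dict, Mapping

# Illustrative mapping only; the processor's real table may differ.
ILLUSTRATIVE_MAPPING: Dict[str, str] = {
    "llm.model_name": "gen_ai.request.model",
    "llm.token_count.prompt": "gen_ai.usage.input_tokens",
    "llm.token_count.completion": "gen_ai.usage.output_tokens",
}


def normalize_attributes(
    attrs: Mapping[str, Any],
    mapping: Mapping[str, str] = ILLUSTRATIVE_MAPPING,
    remove_originals: bool = True,
    overwrite: bool = False,
) -> Dict[str, Any]:
    """Copy framework-specific keys onto gen_ai.* keys, leaving other keys alone."""
    result = dict(attrs)
    for source_key, target_key in mapping.items():
        if source_key not in attrs:
            continue
        # Keep an existing gen_ai.* value unless overwrite is requested.
        if overwrite or target_key not in result:
            result[target_key] = attrs[source_key]
        if remove_originals:
            result.pop(source_key, None)
    return result


span_attrs = {
    "llm.model_name": "gpt-4o",
    "llm.token_count.prompt": 150,
    "http.method": "POST",
}
normalized = normalize_attributes(span_attrs)
# normalized == {"http.method": "POST", "gen_ai.request.model": "gpt-4o",
#                "gen_ai.usage.input_tokens": 150}
```

    Doing this in the Collector rather than in application code means every service can keep its existing instrumentation library while the backend sees a single schema.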

    Integration: Datadog and Grafana

    Datadog LLM Observability natively supports OpenTelemetry GenAI Semantic Conventions (v1.37+). This allows you to instrument once with OTel, export via your existing collector pipeline, and analyze GenAI spans directly in Datadog—no code changes required.

    Similarly, Grafana Cloud Traces (powered by Tempo) provides full distributed tracing visualization. You can correlate agent traces with metrics and logs for faster root cause analysis. The OpenTelemetry Python instrumentation for OpenAI Agents SDK automatically emits GenAI-compliant spans.

    Platform                   Best For                                      Setup Complexity
    Datadog LLM Observability  Enterprise monitoring with cost analysis      Low (uses existing OTel Collector)
    Grafana Tempo              Open-source tracing with metrics correlation  Medium (self-hosted options)
    Jaeger                     Lightweight development/testing               Low (single binary)
    Honeycomb                  Query-focused debugging                       Medium (requires cloud account)

    Implementation Checklist

    1. Add gen_ai.* attributes to spans

    Configure your LLM SDK or auto-instrumentation to emit gen_ai.operation.name, gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens on every span.

    2. Set up OTel Collector with genainormalizer

    Deploy an OTel Collector that receives OTLP data, normalizes attributes to GenAI semconv v1.41.0, and exports to your backend.

    3. Configure metrics and cost tracking

    Create histograms for gen_ai.client.operation.duration to capture latency percentiles. Emit gen_ai.usage.cost_usd using your model pricing table.

    4. Connect observability backend

    Export to Datadog, Grafana Tempo, Jaeger, or Honeycomb. Verify traces appear with correct gen_ai.* attributes.

    5. Set SLOs and alerts

    Define latency thresholds (P50 < 500ms, P99 < 5s), error rate limits (1-2%), and cost alerts when token usage spikes 2x baseline.
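
    The thresholds in step 5 can be prototyped offline before they are encoded as backend alerts. The sketch below is one possible way to do so, not a prescribed method: it uses a nearest-rank percentile over raw latency samples, and the names percentile and check_slos and the sample data are hypothetical. Production backends usually compute percentiles from histogram buckets instead.

```python
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def check_slos(
    latencies_s: Sequence[float],
    tokens_now: int,
    tokens_baseline: int,
    p50_limit_s: float = 0.5,
    p99_limit_s: float = 5.0,
    spike_factor: float = 2.0,
) -> Dict[str, bool]:
    """Evaluate the latency and token-spike thresholds from the checklist."""
    return {
        "p50_ok": percentile(latencies_s, 50) < p50_limit_s,
        "p99_ok": percentile(latencies_s, 99) < p99_limit_s,
        "token_spike": tokens_now > spike_factor * tokens_baseline,
    }


# 98 fast requests and 2 slow ones: the median is fine, the tail is not.
latencies = [0.1] * 98 + [6.0, 6.0]
report = check_slos(latencies, tokens_now=250_000, tokens_baseline=100_000)
print(report)  # prints {'p50_ok': True, 'p99_ok': False, 'token_spike': True}
```

    The example shows why the checklist tracks P50 and P99 separately: two slow requests in a hundred leave the median untouched but breach the tail target.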

    Final Verdict

    OpenTelemetry GenAI semantic conventions provide the vendor-neutral standard that production AI systems need. By adopting gen_ai.* attributes in v1.37+ (or v1.41.0 for reasoning tokens), teams achieve consistent observability across providers, enabling cost tracking, latency analysis, and distributed tracing for AI agents. The OTel Collector pipeline with genainormalizer ensures compatibility across 60+ frameworks—making this the foundation for enterprise-grade LLM and agent observability in 2026.

    Last Updated: May 06, 2026 | Source: OpenTelemetry Official Documentation (opentelemetry.io)

    Frequently Asked Questions

    GenAI semantic conventions are standardized gen_ai.* attributes for LLM and AI agent observability maintained by the OpenTelemetry GenAI SIG (formed April 2024). They provide vendor-neutral schema covering operation names, token metrics, latency, and model metadata.
    Key attributes include gen_ai.operation.name (chat, text_completion), gen_ai.request.model (model name), gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.reasoning.output_tokens (v1.41+), and gen_ai.provider.name (openai, anthropic, gcp.vertex_ai).
    Track token usage via gen_ai.usage.input_tokens and gen_ai.usage.output_tokens separately, because input and output tokens are priced differently. Calculate cost by multiplying token counts by model rates, and emit it as gen_ai.usage.cost_usd using your pricing table. Latency is tracked via gen_ai.client.operation.duration.
    Time to First Token (TTFT) measures perceived responsiveness from request to first token. Tokens Per Second (TPS) measures generation speed. For streaming, TTFT matters more than total latency. Recommended: P50 TTFT < 500ms, P99 < 5s for chat.
    Use W3C Trace Context headers (traceparent, tracestate) for distributed tracing across multi-step agent workflows. Agent spans use gen_ai.agent.name and gen_ai.agent.id, and tool invocations use gen_ai.tool.name. Auto-instrumentation for LangChain, LangGraph, and OpenAI Agents SDK handles this automatically.
    Configure the OTel Collector with genainormalizer processor to map attributes from OpenInference/OpenLLMetry libraries (60+ frameworks) to GenAI semconv v1.41.0. Set semconv_version, profiles, and remove_originals options.
    Datadog LLM Observability natively supports OTel GenAI Semantic Conventions v1.37+. Instrument once with OTel, export via existing collector pipeline, analyze spans directly—no code changes required. Grafana Tempo and Jaeger also support GenAI traces.

    Sk Jabedul Haque

    Founder & Chief Editor

    Building India's most trusted finance education platform — simplifying news, calculators, and market trends so anyone can understand and invest confidently.