Whispey Documentation

Observability

Overview

Whispey Observability provides comprehensive monitoring for LiveKit voice agents, capturing detailed metrics and telemetry data from every conversation turn. It tracks the complete pipeline from speech-to-text through language model processing to text-to-speech output.

Core Components

Conversation Turn Tracking

Every user-agent interaction is captured as a structured turn containing:

  • User transcript with STT processing metrics
  • Agent response with LLM and TTS metrics
  • Performance data including latency and costs
  • Configuration details for all pipeline components

Pipeline Configuration Capture

Whispey automatically extracts the complete configuration of your voice pipeline (an illustrative snapshot follows the list):

  • STT Configuration: Model, language, sample rate, interim results
  • LLM Configuration: Model, temperature, max tokens, provider settings
  • TTS Configuration: Voice ID, model, voice settings, speed
  • VAD Configuration: Activation thresholds, speech duration settings
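
For illustration, a captured snapshot might look like the following. The exact field names here are an assumption; the values are borrowed from the examples later on this page:

{
    "stt": {
        "model": "nova-3",
        "language": "en",
        "sample_rate": 16000,
        "interim_results": true
    },
    "llm": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 512,
        "provider": "openai"
    },
    "tts": {
        "voice_id": "H8bdWZHK2OgZwTN7ponr",
        "model": "eleven_flash_v2_5",
        "speed": 1.0
    },
    "vad": {
        "activation_threshold": 0.5,
        "min_speech_duration_ms": 250
    }
}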

OpenTelemetry Integration

When enabled with enable_otel=True, Whispey captures comprehensive telemetry spans:

  • STT Spans: Audio processing with duration and model info
  • LLM Spans: Token usage, latency, and request details
  • TTS Spans: Character counts, synthesis timing
  • Tool Spans: Function executions with performance metrics

SDK Integration

from whispey import LivekitObserve

# Initialize with observability
whispey = LivekitObserve(
    agent_id="your-agent-id",
    apikey="your-api-key",
    enable_otel=True  # Enable telemetry capture
)

# Start monitoring a session (`session` is your LiveKit AgentSession)
session_id = whispey.start_session(session)

# Export data on shutdown (`ctx` is the LiveKit JobContext)
async def whispey_shutdown():
    await whispey.export(session_id)

ctx.add_shutdown_callback(whispey_shutdown)
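
For context, here is a minimal sketch of wiring this into a LiveKit agent entrypoint. It assumes the livekit-agents 1.x API (JobContext, AgentSession, Agent); the plugin configuration and agent instructions are placeholders:

from livekit import agents
from livekit.agents import Agent, AgentSession
from whispey import LivekitObserve

whispey = LivekitObserve(
    agent_id="your-agent-id",
    apikey="your-api-key",
    enable_otel=True,
)

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # Configure your STT/LLM/TTS/VAD plugins on the session as usual
    session = AgentSession()

    # Begin observing the session before any turns happen
    session_id = whispey.start_session(session)

    async def whispey_shutdown():
        await whispey.export(session_id)

    ctx.add_shutdown_callback(whispey_shutdown)

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )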

Dashboard Features

TracesTable Component

Main interface showing conversation turns with:

  • Turn-by-turn view of all conversations
  • Status indicators (success/warning/error)
  • Performance metrics (duration, cost, operations)
  • Search and filtering capabilities
  • Real-time updates as conversations happen

Enhanced Trace Detail Sheet

Detailed analysis of individual turns:

  • Pipeline Flow View: Visual STT → LLM → TTS representation
  • Complete Prompt Context: Full system instructions and conversation history
  • Tool Executions: Function calls with arguments and results
  • Cost Breakdown: Per-operation pricing analysis
  • Configuration Details: Complete model and parameter settings

Performance Monitoring

Key Metrics

  • STT Metrics: Audio duration, processing time, transcription accuracy
  • LLM Metrics: Token usage, time to first token, generation speed
  • TTS Metrics: Character count, time to first byte, audio duration
  • Overall Latency: End-to-end conversation response time (the sketch below shows how the per-stage timings combine)
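
As a rough illustration, per-turn response latency can be approximated from the per-stage timings in the Turn Data Format shown below; treating the STT → LLM → TTS stages as strictly sequential is a simplifying assumption:

def approximate_turn_latency_ms(turn: dict) -> float:
    """Approximate user-perceived latency for one turn, assuming the
    STT -> LLM -> TTS stages run back to back (a simplification)."""
    stt = turn["stt_metrics"]["processing_time_ms"]
    llm = turn["llm_metrics"]["time_to_first_token_ms"]
    tts = turn["tts_metrics"]["time_to_first_byte_ms"]
    return stt + llm + tts

# For the example turn under "Turn Data Format":
# 800 + 1200 + 500 = 2500 ms from end of speech to first audio byte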

Cost Tracking

Dynamic pricing calculation is supported for the following providers; a sketch of the cost arithmetic follows the list:

  • OpenAI Models: GPT-4o, GPT-4o-mini, Whisper, TTS voices
  • Anthropic Models: Claude 3.5 Sonnet, Haiku
  • ElevenLabs Voices: Premium voice synthesis
  • Google Models: Gemini Pro, Flash
  • Other Providers: Deepgram, Azure, Cartesia
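
As a sketch of the cost arithmetic, with illustrative rates (real per-token prices vary by provider and change over time, so treat these numbers as placeholders):

# Illustrative per-million-token rates only, not official pricing
LLM_RATES = {
    "gpt-4o-mini": {"input_per_1m": 0.15, "output_per_1m": 0.60},
}

def llm_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    rates = LLM_RATES[model]
    return (input_tokens * rates["input_per_1m"]
            + output_tokens * rates["output_per_1m"]) / 1_000_000

# e.g. llm_cost_usd("gpt-4o-mini", 45, 12) ≈ $0.0000139 at these placeholder
# rates; Whispey applies current provider-specific rates for you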

Tool Call Monitoring

Automatic tracking of function tool executions:

{
    "name": "get_weather",
    "arguments": {"location": "San Francisco"},
    "execution_duration_ms": 1250,
    "status": "success",
    "result": "The weather is 72°F and sunny",
    "result_length": 42
}
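
For context, a record like the one above would typically come from a function tool registered on the agent. A minimal sketch, assuming the livekit-agents function_tool decorator; the weather lookup itself is stubbed:

from livekit.agents import RunContext, function_tool

@function_tool()
async def get_weather(context: RunContext, location: str) -> str:
    """Look up the current weather for a location."""
    # A real implementation would call a weather API; Whispey records the
    # call name, arguments, duration, status, and result automatically.
    return "The weather is 72°F and sunny"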

Enhanced Data Collection

Beyond basic metrics, Whispey captures:

  • Model Detection: Automatic identification of providers and versions
  • Voice Configuration: Complete TTS voice settings and parameters
  • Conversation Context: Full chat history sent to language models
  • State Transitions: User and agent state changes during conversations
  • Error Handling: Detailed error information and failure modes

Export and Analysis

Session data export includes:

  • Complete conversation turns with all metrics
  • Telemetry spans for detailed performance analysis
  • Configuration snapshots for reproducibility
  • Cost calculations with provider-specific pricing
  • Performance summaries and aggregate statistics

The exported data integrates with the Whispey dashboard for visualization, analysis, and long-term tracking of voice agent performance.
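
For offline analysis, a sketch along these lines works once an export has been saved to disk; the file name and a top-level list of turns in the Turn Data Format below are assumptions:

import json

with open("session_export.json") as f:  # hypothetical file name
    turns = json.load(f)  # assumed: a list of turn objects

total_cost = sum(turn["costs"]["total_cost"] for turn in turns)
failed = [turn for turn in turns if turn["status"] != "success"]
print(f"{len(turns)} turns, ${total_cost:.4f} total, {len(failed)} failed")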

Configuration Options

Basic Observability

whispey = LivekitObserve(
    agent_id="your-agent-id",
    apikey="your-api-key",
    enable_otel=True  # Enable OpenTelemetry
)

Session startup and the shutdown export are wired exactly as shown under SDK Integration above.

Data Structure

Turn Data Format

{
    "turn_id": "turn_123",
    "timestamp": "2024-01-15T10:30:00Z",
    "user_transcript": "What's the weather like?",
    "agent_response": "The weather is sunny and 75°F",
    "stt_metrics": {
        "audio_duration_ms": 1500,
        "processing_time_ms": 800,
        "model": "nova-3",
        "language": "en"
    },
    "llm_metrics": {
        "input_tokens": 45,
        "output_tokens": 12,
        "total_tokens": 57,
        "time_to_first_token_ms": 1200,
        "model": "gpt-4o-mini",
        "temperature": 0.7
    },
    "tts_metrics": {
        "character_count": 32,
        "time_to_first_byte_ms": 500,
        "audio_duration_ms": 2000,
        "voice_id": "H8bdWZHK2OgZwTN7ponr",
        "model": "eleven_flash_v2_5"
    },
    "tool_executions": [
        {
            "name": "get_weather",
            "arguments": {"location": "San Francisco"},
            "execution_duration_ms": 1250,
            "status": "success",
            "result": "The weather is 72°F and sunny"
        }
    ],
    "costs": {
        "stt_cost": 0.0015,
        "llm_cost": 0.0008,
        "tts_cost": 0.0020,
        "total_cost": 0.0043
    },
    "status": "success"
}

Telemetry Spans

When OpenTelemetry is enabled, spans are created for each operation:

{
    "span_id": "span_456",
    "trace_id": "trace_789",
    "operation_name": "stt_processing",
    "start_time": "2024-01-15T10:30:00.123Z",
    "end_time": "2024-01-15T10:30:00.923Z",
    "duration_ms": 800,
    "attributes": {
        "model": "nova-3",
        "language": "en",
        "audio_duration_ms": 1500,
        "provider": "deepgram"
    }
}
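
A small sketch of summarizing such spans, assuming they are available as a list of dictionaries in the shape above:

from collections import defaultdict
from statistics import mean

def latency_by_operation(spans: list[dict]) -> dict[str, float]:
    """Average duration_ms per operation_name across a session's spans."""
    buckets: defaultdict[str, list[float]] = defaultdict(list)
    for span in spans:
        buckets[span["operation_name"]].append(span["duration_ms"])
    return {op: mean(values) for op, values in buckets.items()}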

Use Cases

Performance Optimization

  • Identify bottlenecks in the voice pipeline
  • Optimize model selection based on cost and performance
  • Monitor response times and set up alerts
  • Track error rates and improve reliability

Cost Management

  • Monitor spending across different providers
  • Optimize token usage for cost efficiency
  • Compare provider costs for the same functionality
  • Set up cost alerts and budgets

Quality Assurance

  • Track conversation quality metrics
  • Monitor transcription accuracy
  • Analyze user satisfaction patterns
  • Identify improvement opportunities

Debugging and Troubleshooting

  • Trace conversation flow through the pipeline
  • Debug tool execution issues
  • Analyze error patterns and root causes
  • Reproduce issues with complete context

Important Notes

  • Export on Shutdown: Observability data is exported only when the session ends and the shutdown callback runs
  • Real-time Collection: Metrics and telemetry are collected in real-time during the conversation
  • OpenTelemetry Optional: OpenTelemetry integration is optional and must be explicitly enabled
  • Dashboard Integration: Exported data automatically appears in the Whispey dashboard for analysis

Next Steps