Whispey Documentation

Observability

Overview

Whispey Observability provides comprehensive monitoring for LiveKit voice agents, capturing detailed metrics and telemetry data from every conversation turn. It tracks the complete pipeline from speech-to-text through language model processing to text-to-speech output.

Core Components

Conversation Turn Tracking

Every user-agent interaction is captured as a structured turn containing:

  • User transcript with STT processing metrics
  • Agent response with LLM and TTS metrics
  • Performance data including latency and costs
  • Configuration details for all pipeline components

Pipeline Configuration Capture

Whispey automatically extracts the complete configuration of your voice pipeline (an illustrative snapshot follows the list):

  • STT Configuration: Model, language, sample rate, interim results
  • LLM Configuration: Model, temperature, max tokens, provider settings
  • TTS Configuration: Voice ID, model, voice settings, speed
  • VAD Configuration: Activation thresholds, speech duration settings
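
For illustration, a captured snapshot might look like the following. The exact field names here are an assumption; the values are borrowed from the examples later on this page:

{
    "stt": {
        "model": "nova-3",
        "language": "en",
        "sample_rate": 16000,
        "interim_results": true
    },
    "llm": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 512,
        "provider": "openai"
    },
    "tts": {
        "voice_id": "H8bdWZHK2OgZwTN7ponr",
        "model": "eleven_flash_v2_5",
        "speed": 1.0
    },
    "vad": {
        "activation_threshold": 0.5,
        "min_speech_duration_ms": 250
    }
}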

OpenTelemetry Integration

When enabled with enable_otel=True, Whispey captures comprehensive telemetry spans:

  • STT Spans: Audio processing with duration and model info
  • LLM Spans: Token usage, latency, and request details
  • TTS Spans: Character counts, synthesis timing
  • Tool Spans: Function executions with performance metrics

SDK Integration

from whispey import LivekitObserve

# Initialize with observability
whispey = LivekitObserve(
    agent_id="your-agent-id",
    apikey="your-api-key",
    enable_otel=True  # Enable telemetry capture
)

# Start monitoring a session (`session` is your LiveKit AgentSession)
session_id = whispey.start_session(session)

# Export data on shutdown (`ctx` is the LiveKit JobContext)
async def whispey_shutdown():
    await whispey.export(session_id)

ctx.add_shutdown_callback(whispey_shutdown)
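
For context, here is a minimal sketch of wiring this into a LiveKit agent entrypoint. It assumes the livekit-agents 1.x API (JobContext, AgentSession, Agent); the plugin configuration and agent instructions are placeholders:

from livekit import agents
from livekit.agents import Agent, AgentSession
from whispey import LivekitObserve

whispey = LivekitObserve(
    agent_id="your-agent-id",
    apikey="your-api-key",
    enable_otel=True,
)

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # Configure your STT/LLM/TTS/VAD plugins on the session as usual
    session = AgentSession()

    # Begin observing the session before any turns happen
    session_id = whispey.start_session(session)

    async def whispey_shutdown():
        await whispey.export(session_id)

    ctx.add_shutdown_callback(whispey_shutdown)

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )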

Dashboard Features

TracesTable Component

Main interface showing conversation turns with:

  • Turn-by-turn view of all conversations
  • Status indicators (success/warning/error)
  • Performance metrics (duration, cost, operations)
  • Search and filtering capabilities
  • Real-time updates as conversations happen

Enhanced Trace Detail Sheet

Detailed analysis of individual turns:

  • Pipeline Flow View: Visual STT → LLM → TTS representation
  • Complete Prompt Context: Full system instructions and conversation history
  • Tool Executions: Function calls with arguments and results
  • Cost Breakdown: Per-operation pricing analysis
  • Configuration Details: Complete model and parameter settings

Performance Monitoring

Key Metrics

  • STT Metrics: Audio duration, processing time, transcription accuracy
  • LLM Metrics: Token usage, time to first token, generation speed
  • TTS Metrics: Character count, time to first byte, audio duration
  • Overall Latency: End-to-end conversation response time (the sketch below shows how the per-stage timings combine)
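
As a rough illustration, per-turn response latency can be approximated from the per-stage timings in the Turn Data Format shown below; treating the STT → LLM → TTS stages as strictly sequential is a simplifying assumption:

def approximate_turn_latency_ms(turn: dict) -> float:
    """Approximate user-perceived latency for one turn, assuming the
    STT -> LLM -> TTS stages run back to back (a simplification)."""
    stt = turn["stt_metrics"]["processing_time_ms"]
    llm = turn["llm_metrics"]["time_to_first_token_ms"]
    tts = turn["tts_metrics"]["time_to_first_byte_ms"]
    return stt + llm + tts

# For the example turn under "Turn Data Format":
# 800 + 1200 + 500 = 2500 ms from end of speech to first audio byte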

Cost Tracking

Dynamic pricing calculation is supported for the following providers; a sketch of the cost arithmetic follows the list:

  • OpenAI Models: GPT-4o, GPT-4o-mini, Whisper, TTS voices
  • Anthropic Models: Claude 3.5 Sonnet, Haiku
  • ElevenLabs Voices: Premium voice synthesis
  • Google Models: Gemini Pro, Flash
  • Other Providers: Deepgram, Azure, Cartesia
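
As a sketch of the cost arithmetic, with illustrative rates (real per-token prices vary by provider and change over time, so treat these numbers as placeholders):

# Illustrative per-million-token rates only, not official pricing
LLM_RATES = {
    "gpt-4o-mini": {"input_per_1m": 0.15, "output_per_1m": 0.60},
}

def llm_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    rates = LLM_RATES[model]
    return (input_tokens * rates["input_per_1m"]
            + output_tokens * rates["output_per_1m"]) / 1_000_000

# e.g. llm_cost_usd("gpt-4o-mini", 45, 12) ≈ $0.0000139 at these placeholder
# rates; Whispey applies current provider-specific rates for you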

Tool Call Monitoring

Automatic tracking of function tool executions:

{
    "name": "get_weather",
    "arguments": {"location": "San Francisco"},
    "execution_duration_ms": 1250,
    "status": "success",
    "result": "The weather is 72°F and sunny",
    "result_length": 42
}
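
For context, a record like the one above would typically come from a function tool registered on the agent. A minimal sketch, assuming the livekit-agents function_tool decorator; the weather lookup itself is stubbed:

from livekit.agents import RunContext, function_tool

@function_tool()
async def get_weather(context: RunContext, location: str) -> str:
    """Look up the current weather for a location."""
    # A real implementation would call a weather API; Whispey records the
    # call name, arguments, duration, status, and result automatically.
    return "The weather is 72°F and sunny"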

Enhanced Data Collection

Beyond basic metrics, Whispey captures:

  • Model Detection: Automatic identification of providers and versions
  • Voice Configuration: Complete TTS voice settings and parameters
  • Conversation Context: Full chat history sent to language models
  • State Transitions: User and agent state changes during conversations
  • Error Handling: Detailed error information and failure modes

Export and Analysis

Session data export includes:

  • Complete conversation turns with all metrics
  • Telemetry spans for detailed performance analysis
  • Configuration snapshots for reproducibility
  • Cost calculations with provider-specific pricing
  • Performance summaries and aggregate statistics

The exported data integrates with the Whispey dashboard for visualization, analysis, and long-term tracking of voice agent performance.
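
For offline analysis, a sketch along these lines works once an export has been saved to disk; the file name and a top-level list of turns in the Turn Data Format below are assumptions:

import json

with open("session_export.json") as f:  # hypothetical file name
    turns = json.load(f)  # assumed: a list of turn objects

total_cost = sum(turn["costs"]["total_cost"] for turn in turns)
failed = [turn for turn in turns if turn["status"] != "success"]
print(f"{len(turns)} turns, ${total_cost:.4f} total, {len(failed)} failed")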

Configuration Options

Basic Observability

whispey = LivekitObserve(
    agent_id="your-agent-id",
    apikey="your-api-key",
    enable_otel=True  # Enable OpenTelemetry
)

Session startup and the shutdown export are wired exactly as shown under SDK Integration above.

Data Structure

Turn Data Format

{
    "turn_id": "turn_123",
    "timestamp": "2024-01-15T10:30:00Z",
    "user_transcript": "What's the weather like?",
    "agent_response": "The weather is sunny and 75°F",
    "stt_metrics": {
        "audio_duration_ms": 1500,
        "processing_time_ms": 800,
        "model": "nova-3",
        "language": "en"
    },
    "llm_metrics": {
        "input_tokens": 45,
        "output_tokens": 12,
        "total_tokens": 57,
        "time_to_first_token_ms": 1200,
        "model": "gpt-4o-mini",
        "temperature": 0.7
    },
    "tts_metrics": {
        "character_count": 32,
        "time_to_first_byte_ms": 500,
        "audio_duration_ms": 2000,
        "voice_id": "H8bdWZHK2OgZwTN7ponr",
        "model": "eleven_flash_v2_5"
    },
    "tool_executions": [
        {
            "name": "get_weather",
            "arguments": {"location": "San Francisco"},
            "execution_duration_ms": 1250,
            "status": "success",
            "result": "The weather is 72°F and sunny"
        }
    ],
    "costs": {
        "stt_cost": 0.0015,
        "llm_cost": 0.0008,
        "tts_cost": 0.0020,
        "total_cost": 0.0043
    },
    "status": "success"
}

Telemetry Spans

When OpenTelemetry is enabled, spans are created for each operation:

{
    "span_id": "span_456",
    "trace_id": "trace_789",
    "operation_name": "stt_processing",
    "start_time": "2024-01-15T10:30:00.123Z",
    "end_time": "2024-01-15T10:30:00.923Z",
    "duration_ms": 800,
    "attributes": {
        "model": "nova-3",
        "language": "en",
        "audio_duration_ms": 1500,
        "provider": "deepgram"
    }
}
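
A small sketch of summarizing such spans, assuming they are available as a list of dictionaries in the shape above:

from collections import defaultdict
from statistics import mean

def latency_by_operation(spans: list[dict]) -> dict[str, float]:
    """Average duration_ms per operation_name across a session's spans."""
    buckets: defaultdict[str, list[float]] = defaultdict(list)
    for span in spans:
        buckets[span["operation_name"]].append(span["duration_ms"])
    return {op: mean(values) for op, values in buckets.items()}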

Use Cases

Performance Optimization

  • Identify bottlenecks in the voice pipeline
  • Optimize model selection based on cost and performance
  • Monitor response times and set up alerts
  • Track error rates and improve reliability

Cost Management

  • Monitor spending across different providers
  • Optimize token usage for cost efficiency
  • Compare provider costs for the same functionality
  • Set up cost alerts and budgets

Quality Assurance

  • Track conversation quality metrics
  • Monitor transcription accuracy
  • Analyze user satisfaction patterns
  • Identify improvement opportunities

Debugging and Troubleshooting

  • Trace conversation flow through the pipeline
  • Debug tool execution issues
  • Analyze error patterns and root causes
  • Reproduce issues with complete context

Important Notes

  • Export on Shutdown: Observability data is exported only when the session ends and the shutdown callback runs
  • Real-time Collection: Metrics and telemetry are collected in real-time during the conversation
  • OpenTelemetry Optional: OpenTelemetry integration is optional and must be explicitly enabled
  • Dashboard Integration: Exported data automatically appears in the Whispey dashboard for analysis

Next Steps