Multi-modality logging and visualization are currently available only when using Traceloop as your observability backend. Support for other platforms may be added in the future.
What is Multi-Modality Support?
Multi-modality support means that OpenLLMetry automatically detects and logs all types of content in your LLM requests and responses:
- Images - Vision model inputs, generated images, screenshots, diagrams
- Audio - Speech-to-text inputs, text-to-speech outputs, audio analysis
- Video - Video analysis, frame extraction, video understanding
- Documents - PDFs, presentations, structured documents
- Mixed content - Combinations of text, images, audio in a single request
How It Works
OpenLLMetry instruments supported LLM SDKs to detect multi-modal content in API calls; a minimal setup sketch follows the list below. When multi-modal data is present, it:
- Captures the content - Extracts images, audio, video, and other media from requests
- Logs metadata - Records content types, sizes, formats, and relationships
- Preserves context - Maintains the full conversation flow with all modalities
- Enables visualization - Makes content viewable in the Traceloop dashboard
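For example, a minimal setup sketch in Python (the app name is illustrative); once the SDK is initialized, requests made through instrumented provider clients are traced automatically, multi-modal content included:

```python
from traceloop.sdk import Traceloop

# Initialize OpenLLMetry once at application startup. After this call,
# instrumented SDKs (OpenAI, Anthropic, and others) are traced automatically,
# including any images, audio, or other media in requests and responses.
Traceloop.init(app_name="multimodal-demo")  # app name is illustrative
```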
Supported Models and Frameworks
Multi-modality logging works with any LLM provider and framework that OpenLLMetry instruments. Common examples include:
Vision Models
- OpenAI GPT-4 Vision - Image understanding and analysis
- Anthropic Claude 3 - Image, document, and chart analysis
- Google Gemini - Multi-modal understanding across images, video, and audio
- Azure OpenAI - Vision-enabled models
Audio Models
- OpenAI Whisper - Speech-to-text transcription
- OpenAI TTS - Text-to-speech generation
- ElevenLabs - Voice synthesis and cloning
Multi-Modal Frameworks
- LangChain - Multi-modal chains and agents
- LlamaIndex - Multi-modal document indexing and retrieval
- Framework-agnostic - Direct API calls to any provider
Usage Examples
Multi-modality logging is automatic. Simply use your LLM provider as normal:
Image Analysis with OpenAI
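A minimal Python sketch of a vision request (the model name and image URL are placeholders; the TypeScript SDK is traced the same way):

```python
from openai import OpenAI
from traceloop.sdk import Traceloop

Traceloop.init(app_name="image-analysis")  # illustrative app name
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A standard vision request; OpenLLMetry records the text prompt and the
# image reference on the resulting span.
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model works here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

No extra configuration is needed; the image content is picked up from the request payload itself.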
Image Analysis with Base64
You can also send images as base64-encoded data:
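A sketch assuming a local PNG file (the file path and model name are placeholders); base64 data URIs are decoded automatically for display in the dashboard:

```python
import base64

from openai import OpenAI

client = OpenAI()

# Encode a local image as a base64 data URI.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```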
Multi-Image Analysis
Analyze multiple images in a single request:
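A sketch with two image URLs in a single user message (the URLs are placeholders); each image is captured and shown alongside the prompt in the trace:

```python
from openai import OpenAI

client = OpenAI()

# Multiple image parts can share one message; all of them are logged.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two diagrams."},
                {"type": "image_url", "image_url": {"url": "https://example.com/diagram-a.png"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/diagram-b.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```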
Audio Transcription
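A sketch using the OpenAI Whisper endpoint (the file name is a placeholder); the audio input and the resulting transcript both end up on the trace:

```python
from openai import OpenAI

client = OpenAI()

# Transcribe a local audio file with Whisper.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```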
Text-to-Speech
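A sketch using the OpenAI text-to-speech endpoint (model, voice, and output path are illustrative); the generated audio is logged as part of the span:

```python
from openai import OpenAI

client = OpenAI()

# Generate speech from text.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Welcome to the multi-modal demo.",
)

# Persist the returned audio bytes to disk.
with open("welcome.mp3", "wb") as f:
    f.write(speech.read())
```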
Multi-Modal with Anthropic Claude
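A sketch of an image request with the Anthropic SDK (the file path and model name are illustrative); Claude accepts images as base64 content blocks:

```python
import base64

import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# The image block and the text prompt are both captured on the trace.
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any Claude 3 family model works
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_b64,
                    },
                },
                {"type": "text", "text": "Summarize the trend in this chart."},
            ],
        }
    ],
)

print(message.content[0].text)
```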
Using with LangChain
Multi-modality logging works seamlessly with LangChain:
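A sketch using `ChatOpenAI` from the `langchain-openai` package (the model and URL are placeholders); the instrumented OpenAI client underneath the chat model is traced as usual, so the image content shows up in the trace:

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# A multi-modal message combining text and an image URL.
message = HumanMessage(
    content=[
        {"type": "text", "text": "What does this diagram describe?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/architecture.png"}},
    ]
)

response = llm.invoke([message])
print(response.content)
```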
Viewing Multi-Modal Content in Traceloop
When you view traces in the Traceloop dashboard:
- Navigate to your trace - Find the specific LLM call in your traces
- View the conversation - See the full context including all modalities
- Inspect media content - Click on images, audio, or video to view them inline
- Analyze relationships - Understand how different content types interact
- Debug issues - Identify problems with content formatting or model responses
Privacy and Content Control
Multi-modal content may include sensitive or proprietary information. You have full control over what gets logged:
Disable Content Tracing
To prevent logging of any content (including multi-modal data):
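Set the `TRACELOOP_TRACE_CONTENT` environment variable to `false` before initializing the SDK. A sketch in Python (the variable can equally be set in your deployment environment):

```python
import os

# Turn off logging of prompt/completion content, including images, audio,
# and other media; spans and metadata are still emitted.
os.environ["TRACELOOP_TRACE_CONTENT"] = "false"

from traceloop.sdk import Traceloop

Traceloop.init(app_name="my-app")  # illustrative app name
```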
Selective Content Filtering
For more granular control, you can filter specific types of content or implement custom redaction logic. See our Privacy documentation for detailed options.
Best Practices
Storage and Performance
Multi-modal content can be large. Consider these best practices:
- Monitor storage usage - Large images and audio files increase trace storage requirements
- Use appropriate image sizes - Resize images before sending to LLMs when possible (see the sketch after this list)
- Consider content tracing settings - Disable content logging in high-volume production environments if not needed
- Review retention policies - Configure appropriate data retention in your Traceloop settings
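For the image-sizing point above, a rough sketch of downscaling with Pillow before a vision call (Pillow is not required by OpenLLMetry; the size cap, file name, and model are illustrative):

```python
import base64
import io

from openai import OpenAI
from PIL import Image  # Pillow, used here only to downscale before upload

client = OpenAI()

# Downscale to at most 1024px on the longest side, then encode as base64.
image = Image.open("large-photo.jpg")
image.thumbnail((1024, 1024))
buffer = io.BytesIO()
image.save(buffer, format="JPEG")
image_b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this photo."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)
```

Smaller payloads keep trace storage down and usually reduce model latency as well.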
Debugging Multi-Modal Applications
Multi-modality logging is particularly valuable for:
- Image quality issues - See exactly what images were sent to the model
- Format problems - Verify that content is properly encoded and transmitted
- Model behavior - Understand how models respond to different types of content
- User experience - Review actual user-submitted content to improve handling
- Compliance - Audit what content is being processed by your application
Security Considerations
When logging multi-modal content:
- Review data policies - Ensure compliance with data protection regulations
- Filter sensitive content - Don’t log PII, confidential documents, or sensitive images
- Access controls - Limit who can view traces with multi-modal content
- Encryption - Traceloop encrypts all data in transit and at rest
- Retention - Set appropriate retention periods for multi-modal traces
Limitations
Current limitations of multi-modality support:
- Traceloop only - Multi-modal visualization is currently exclusive to the Traceloop platform. When exporting to other observability tools (Datadog, Honeycomb, etc.), multi-modal content metadata is logged but visualization is not available.
- Storage limits - Very large media files (>10MB) may be truncated or linked rather than embedded
- Format support - Common formats (JPEG, PNG, MP3, MP4, PDF) are fully supported; exotic formats may have limited visualization
Supported Content Types
OpenLLMetry automatically detects and logs these content types:
| Content Type | Format Examples | Visualization |
|---|---|---|
| Images | JPEG, PNG, GIF, WebP, SVG | Inline preview |
| Audio | MP3, WAV, OGG, M4A | Playback controls |
| Video | MP4, WebM, MOV | Video player |
| Documents | PDF, DOCX (when supported by model) | Document viewer |
| Base64 Encoded | Any of the above as data URIs | Automatic decoding |
Next Steps
- Learn about privacy controls for multi-modal content
- Explore supported models and frameworks
- Set up workflow annotations for complex multi-modal pipelines
- Configure Traceloop integration to enable multi-modal visualization

