Voxtral

Frontier open source speech understanding models

2025-07-16

Voxtral by Mistral AI is a new family of open-source speech understanding models. Available in 24B and 3B sizes, it goes beyond transcription to offer Q&A, summarization, and function calling directly from voice, with state-of-the-art performance.
Voxtral supports multilingual audio and long-form recordings of up to 40 minutes, and it is built for both production and edge deployments. It outperforms competitors such as OpenAI Whisper and ElevenLabs Scribe at a fraction of the cost, pairing accurate transcription with semantic understanding of the audio. That combination suits applications such as customer support, analytics, and voice-driven workflows.
The models are available through an API or as a local download, for both prototyping and large-scale deployment. Planned enhancements include speaker segmentation, emotion detection, and non-speech audio recognition.
Open Source, Artificial Intelligence, Audio