Microsoft AI (MAI) Voice-1
Highly expressive and natural speech generation model
2025-08-29

MAI-Voice-1 is a lightning-fast speech generation model that can generate a full minute of audio in under a second on a single GPU, making it one of the most efficient speech systems available today.
Microsoft AI (MAI) Voice-1 is a highly expressive and efficient speech generation model that produces a full minute of natural-sounding audio in under a second on a single GPU. It powers features such as Copilot Daily and Podcasts, and is now available in Copilot Labs, where users can experiment with personalized storytelling and guided meditations. Designed for both single-speaker and multi-speaker scenarios, the model delivers high-fidelity voice output, and Microsoft positions voice as a key interface for future AI companions. Microsoft is also previewing MAI-1, a versatile text-based model, as part of its broader effort to create accessible, responsible AI that serves diverse user needs globally.
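The speed claim can be read as a bound on how much faster than real time the model runs. The sketch below is only illustrative arithmetic: it assumes the worst case of exactly one second of compute (the announcement says only "under a second"), and the function names are hypothetical, not part of any Microsoft API.

```python
def speedup_over_real_time(audio_seconds: float, compute_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if audio_seconds <= 0 or compute_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / compute_seconds


def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Return compute time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0 or compute_seconds <= 0:
        raise ValueError("durations must be positive")
    return compute_seconds / audio_seconds


# Assumption: 60 s of audio in at most 1 s of compute on one GPU.
AUDIO_SECONDS = 60.0
COMPUTE_SECONDS_UPPER_BOUND = 1.0

min_speedup = speedup_over_real_time(AUDIO_SECONDS, COMPUTE_SECONDS_UPPER_BOUND)
max_rtf = real_time_factor(AUDIO_SECONDS, COMPUTE_SECONDS_UPPER_BOUND)

print(f"speedup over real time: at least {min_speedup:.0f}x")
print(f"real-time factor: at most {max_rtf:.4f}")
```

Because the stated compute time is an upper bound, the speedup is a lower bound (at least 60x real time) and the real-time factor is an upper bound (at most 1/60).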
Artificial Intelligence
Audio