Zyphra Zonos

Highly expressive TTS model with high-fidelity voice cloning

2025-02-11

Zonos offers flexible control of vocal speed, emotion, tone, and audio quality, as well as instant, unlimited, high-quality voice cloning. Zonos natively generates speech at 44 kHz. The hybrid variant is the first open-source SSM (state space model) hybrid audio model.

Zyphra Zonos is a text-to-speech (TTS) model offering high-fidelity voice cloning and expressive speech generation. It comes as two 1.6B-parameter models, a transformer and an SSM hybrid, both released under the Apache 2.0 license. Zonos allows precise control over vocal speed, tone, emotion, and audio quality, and produces natural-sounding 44 kHz speech. Trained on 200,000 hours of multilingual data, it performs best in English and also supports other languages such as Chinese and Spanish. The hybrid model, built on the Mamba2 architecture, reduces latency and memory usage. Zonos is available through an API, a playground, and Hugging Face, and its open-source release supports further TTS research.

Open Source, Artificial Intelligence, GitHub, Audio
