MARS5 TTS
Open-source, insanely prosodic text-to-speech model
2024-06-14

MARS5 is an open-source TTS model that replicates performances from 2-3 seconds of reference audio in 140+ languages, even in extremely tough prosodic scenarios such as sports commentary, movies, and anime. Join our Discord: https://discord.gg/4GVdQ28cZC
MARS5 TTS is an open-source text-to-speech model designed to replicate voice performances, prosody included, across 140+ languages. It handles challenging scenarios such as sports commentary, movies, and anime, and needs only 2-3 seconds of reference audio. The model uses a two-stage AR-NAR pipeline (an autoregressive stage followed by a non-autoregressive stage). Users can guide prosody with punctuation and capitalization in the input text. Two cloning modes are supported: a fast shallow clone for quick results, and a slower, higher-quality "deep clone" that additionally requires a transcript of the reference audio. The model can be installed via pip or Docker. CAMB.AI continues to refine the model and invites contributions to improve stability, speed, and performance. Join the community on Discord to explore its potential.
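The choice between shallow and deep cloning described above can be illustrated with a small sketch. This is not the MARS5 API: `CloneRequest`, `deep_clone`, and `describe` are hypothetical names introduced only for illustration, under the assumption that a deep clone is chosen whenever a non-empty reference transcript is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneRequest:
    """Hypothetical container for one synthesis request (not part of MARS5)."""
    text: str                              # text to speak; punctuation and capitals steer prosody
    reference_seconds: float               # length of the reference audio clip
    ref_transcript: Optional[str] = None   # transcript of the reference audio, if available

    @property
    def deep_clone(self) -> bool:
        # Assumption: deep cloning needs a non-empty reference transcript;
        # without one, fall back to the faster shallow clone.
        return bool(self.ref_transcript and self.ref_transcript.strip())


def describe(req: CloneRequest) -> str:
    """Summarize which cloning mode a request would use."""
    mode = "deep" if req.deep_clone else "shallow"
    return f"{mode} clone from {req.reference_seconds:.1f}s reference: {req.text!r}"


if __name__ == "__main__":
    print(describe(CloneRequest("GOAL! What a finish.", 2.5)))
    print(describe(CloneRequest("GOAL! What a finish.", 2.5, "and he takes the shot")))
```

In a real integration, the resulting flag would be passed to whatever inference configuration the model exposes; the sketch only captures the decision, not the synthesis call.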
Software Engineering
Artificial Intelligence
GitHub
Data Science