ChatTTS

Natural and Expressive Dialogue-Based Text-to-Speech

2024-05-26

ChatTTS is a text-to-speech (TTS) model designed for dialogue-based applications such as LLM assistants. It generates natural, expressive speech suited to conversational scenarios. The model currently supports English and Chinese, with additional languages planned.

Key Features:

  • Conversational TTS: Optimized for dialogue tasks, enabling interactive and natural speech synthesis.
  • Fine-grained Control: Predicts and controls prosodic features like laughter, pauses, and interjections for more expressive outputs.
  • Multi-Speaker Support: Facilitates conversations with different speaker profiles.
  • High-Quality Prosody: Surpasses many open-source TTS models in prosody, with pre-trained models available for research and development.
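As an illustration of the fine-grained control described above, the sketch below builds a sentence-level control prompt. It assumes the token format shown in the project's README ([oral_N], [laugh_N], [break_N], with ranges 0-9, 0-2, and 0-7 respectively); these names and ranges may change between releases. The helper refine_prompt is hypothetical and not part of the ChatTTS package.

```python
def refine_prompt(oral: int = 2, laugh: int = 0, pause: int = 6) -> str:
    """Build a control prompt such as "[oral_2][laugh_0][break_6]".

    Assumption: token names and upper bounds follow the ChatTTS README
    (oral 0-9, laugh 0-2, break 0-7). Verify against the installed version.
    """
    settings = {"oral": (oral, 9), "laugh": (laugh, 2), "break": (pause, 7)}
    parts = []
    for name, (value, upper) in settings.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be in 0..{upper}, got {value}")
        parts.append(f"[{name}_{value}]")
    return "".join(parts)


# The resulting string would typically be passed as the `prompt` of the
# text-refinement parameters when calling the model.
prompt = refine_prompt(oral=2, laugh=0, pause=6)
```

Higher values are understood to make the corresponding feature more pronounced; validating the range up front avoids passing a token the model was never trained on.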

Technical Highlights:

  • Trained on over 100,000 hours of Chinese and English audio data.
  • The open-source release is a model pre-trained on 40,000 hours of audio, without SFT (Supervised Fine-Tuning).
  • Supports streaming audio generation and multi-emotion control.
  • Includes DVAE encoder and zero-shot inference code for advanced use cases.

Ethical Considerations:

The code is released under the AGPLv3+ license and the model under CC BY-NC 4.0; the project is intended for academic and research use. To limit misuse, the released 40,000-hour model has high-frequency noise added and its audio quality deliberately reduced through MP3 compression. The authors also plan to open-source a detection model.

Installation and Usage:

  • Install via PyPI (pip install ChatTTS) or directly from GitHub.
  • Requires Python 3.11 and dependencies like torchaudio and safetensors.
  • Includes examples for web UI and command-line usage, with detailed instructions for fine-tuning prosodic features.
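A minimal sketch of basic usage after installation is shown below. It assumes the Python API documented in the project's README at the time of writing (ChatTTS.Chat, load, infer) and a 24 kHz output sample rate; signatures may differ across versions. The helpers output_paths and synthesize are hypothetical names introduced for this example.

```python
from pathlib import Path


def output_paths(out_dir: str, count: int) -> list[Path]:
    """Return one .wav path per synthesized sentence, e.g. out/utt_0.wav."""
    return [Path(out_dir) / f"utt_{i}.wav" for i in range(count)]


def synthesize(texts: list[str], out_dir: str = "out") -> list[Path]:
    """Synthesize each text and write it to a .wav file (hypothetical helper)."""
    # Heavy imports are deferred so this module can be imported
    # on machines without the GPU stack installed.
    import torch
    import torchaudio
    import ChatTTS

    chat = ChatTTS.Chat()
    chat.load(compile=False)   # assumption: fetches model weights on first run
    wavs = chat.infer(texts)   # assumption: one NumPy waveform per input text

    paths = output_paths(out_dir, len(wavs))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for path, wav in zip(paths, wavs):
        tensor = torch.from_numpy(wav)
        if tensor.dim() == 1:
            # torchaudio.save expects a (channels, samples) tensor
            tensor = tensor.unsqueeze(0)
        torchaudio.save(str(path), tensor, 24000)  # assumption: 24 kHz output
    return paths


# Example call (requires ChatTTS, torch, and torchaudio to be installed):
# synthesize(["Hello, this is a short ChatTTS test."])
```

The dimension check is there because different releases have returned waveforms with or without a leading channel axis.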

Performance:

  • Generating a 30-second clip requires at least 4 GB of GPU memory.
  • Real-Time Factor (RTF) of ~0.3 on an NVIDIA RTX 4090, generating ~7 semantic tokens per second.
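To make the RTF figure above concrete, the following sketch uses the common definition of real-time factor (processing time divided by the duration of the generated audio). Under that assumption, an RTF of 0.3 means a 30-second clip takes roughly 9 seconds to generate. The function names are illustrative only.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Invert the definition to estimate generation time for a clip."""
    return audio_seconds * rtf


# A 30-second clip at RTF ~0.3 needs about 9 seconds of compute.
estimate = estimated_processing_seconds(30.0, 0.3)
print(f"{estimate:.1f} s to generate 30 s of audio")
```

An RTF below 1.0 means audio is produced faster than it plays back, which is the precondition for streaming output without gaps.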

Community and Support:

  • Join discussion groups for updates and support.
  • Contributions via GitHub issues/PRs are welcome.
  • For formal inquiries, contact open-source@2noise.com.

ChatTTS builds on advances from projects such as Bark, XTTSv2, and VALL-E, offering a powerful yet responsible tool for speech synthesis.

Tags: Text-to-Speech, Conversational AI, Generative Models, Speech Synthesis, Dialogue Systems