VoiceCraft

VoiceCraft is a neural codec language model for speech editing and zero-shot text-to-speech (TTS). It achieves state-of-the-art performance on diverse, real-world audio such as audiobooks, internet videos, and podcasts. Given only a few seconds of reference audio, VoiceCraft can clone or edit a voice it has never seen before.
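To make the "few seconds of reference audio" idea concrete, here is a minimal sketch of how a zero-shot TTS prompt might be assembled. It crops the reference at a word boundary within a time budget, keeps the transcript of the words inside the crop, and appends the new text for the model to continue. The alignment format `(word, start_sec, end_sec)`, the function `build_tts_prompt`, and the 3-second default are hypothetical choices for illustration, not VoiceCraft's actual interface.

```python
from typing import List, Tuple

# Hypothetical alignment format: (word, start_sec, end_sec) for each word in the reference clip.
Alignment = List[Tuple[str, float, float]]


def build_tts_prompt(alignment: Alignment, new_text: str,
                     max_prompt_sec: float = 3.0) -> Tuple[float, str]:
    """Return (cut_off_sec, target_transcript) for a zero-shot TTS request.

    Sketch only: the reference audio is cut at the end of the last word that
    finishes within max_prompt_sec, and the model would be asked to continue
    the transcript of that crop with new_text.
    """
    kept = [(w, s, e) for (w, s, e) in alignment if e <= max_prompt_sec]
    if not kept:
        raise ValueError("no complete word fits in the prompt budget")
    cut_off_sec = kept[-1][2]  # end time of the last complete word
    prompt_words = " ".join(w for (w, _, _) in kept)
    return cut_off_sec, f"{prompt_words} {new_text.strip()}"


if __name__ == "__main__":
    words = [("we", 0.0, 0.2), ("walked", 0.2, 0.65), ("along", 0.65, 1.1),
             ("the", 1.1, 1.25), ("river", 1.25, 1.8), ("at", 1.8, 2.0),
             ("dawn", 2.0, 2.7), ("and", 2.7, 3.1), ("talked", 3.1, 3.6)]
    cut, transcript = build_tts_prompt(words, "then the rain began.")
    print(cut, transcript)
```

With the sample alignment, the prompt is cut at 2.7 s (the end of "dawn"), because "and" finishes after the 3-second budget. Cutting at a word boundary keeps the prompt audio and its transcript consistent with each other.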

Key Features:

High Flexibility: Supports several ways to run inference, including Google Colab, Docker, and standalone scripts.
Enhanced Models: Includes TTS-enhanced versions of the 330M- and 830M-parameter models for improved TTS quality.
Ease of Use: Offers Gradio interfaces on Hugging Face Spaces and detailed Colab notebooks for quick testing.
Training Support: Provides comprehensive guidance for training and for fine-tuning on custom datasets.
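The 330M and 830M model sizes translate into a rough lower bound on GPU memory: parameter count times bytes per parameter. The sketch below assumes the nominal parameter counts and counts the weights only, so actual usage during inference or training will be higher. The helper `weight_memory_gib` is hypothetical.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Counts the weights only; activations, the Encodec codec, and other
# runtime buffers need additional memory on top of this.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gib(num_params: int, dtype: str = "fp32") -> float:
    """Return the memory needed to hold num_params weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name, n in [("330M", 330_000_000), ("830M", 830_000_000)]:
        for dtype in ("fp32", "fp16"):
            print(f"{name} {dtype}: {weight_memory_gib(n, dtype):.2f} GiB")
```

For the 830M model this gives about 3.09 GiB in fp32 and 1.55 GiB in fp16 for the weights alone.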

Applications:

Speech Editing: Insert, delete, or replace words in existing speech recordings.
Zero-shot TTS: Generate natural-sounding speech from text without prior training on the target voice.
Long TTS Mode: Efficiently handle long texts for TTS applications.
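For speech editing, a request can be thought of as an original transcript plus a target transcript, where the region to regenerate is the word span in which they differ. The sketch below finds that span with Python's standard `difflib`. The function `find_edit_span` and its single-contiguous-edit assumption are illustrative only. In practice the word span would still need to be mapped to timestamps in the recording (for example with a forced aligner) before the model regenerates that stretch of audio.

```python
import difflib
from typing import List, Optional, Tuple


def find_edit_span(original: str, target: str) -> Optional[Tuple[int, int, List[str]]]:
    """Locate the word span of `original` that must change to produce `target`.

    Returns (start, end, replacement): original_words[start:end] is the region
    to regenerate and `replacement` holds the new words. start == end means a
    pure insertion; an empty replacement means a deletion. Returns None if the
    transcripts are identical.
    """
    src, dst = original.split(), target.split()
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    # Opcodes are (tag, i1, i2, j1, j2); keep only the non-matching regions.
    changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if not changes:
        return None
    # Merge all changed regions into one contiguous span, since this sketch
    # assumes a single edit per request.
    i1, j1 = changes[0][1], changes[0][3]
    i2, j2 = changes[-1][2], changes[-1][4]
    return i1, i2, dst[j1:j2]


if __name__ == "__main__":
    print(find_edit_span("I had a great time at the beach",
                         "I had a terrible time at the beach"))
    print(find_edit_span("see you tomorrow", "see you early tomorrow"))
    print(find_edit_span("it was really very cold", "it was very cold"))
```

The three calls cover substitution, insertion, and deletion, and return `(3, 4, ['terrible'])`, `(2, 2, ['early'])`, and `(2, 3, [])` respectively.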

Technical Highlights:

Uses Encodec to encode audio into discrete tokens and phonemizes the input text.
Supports custom datasets, with detailed steps for data preparation and model training.
Runs on CUDA-enabled GPUs for accelerated training and inference.
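Encodec turns audio into discrete tokens with residual vector quantization (RVQ). Each codebook quantizes whatever the previous codebooks left unexplained, so one audio frame becomes a short stack of code indices, one per codebook. The toy below shows that step on 2-D vectors with made-up codebooks. Real Encodec codebooks are learned and operate on high-dimensional encoder outputs, and the function names here are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)))


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x with a stack of codebooks; each stage codes the residual."""
    residual = list(x)
    codes = []
    for cb in codebooks:
        k = nearest(cb, residual)
        codes.append(k)
        # Subtract the chosen codeword so the next stage sees the leftover error.
        residual = [r - c for r, c in zip(residual, cb[k])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for k, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[k])]
    return out


if __name__ == "__main__":
    # Two toy stages: a coarse codebook, then a finer one for the leftover error.
    books = [
        [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
        [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
    ]
    frame = [1.2, 0.8]
    codes = rvq_encode(frame, books)
    print(codes, rvq_decode(codes, books))
```

Here the frame `[1.2, 0.8]` is coded as `[1, 1]` and reconstructed as `[1.25, 0.75]`, which is closer to the input than the first stage's `[1.0, 1.0]` alone. Adding stages refines the approximation, at the cost of more tokens per frame for the language model to predict.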

Licensing:

Codebase: CC BY-NC-SA 4.0
Model Weights: Coqui Public Model License 1.0.0

VoiceCraft is aimed at developers and researchers working on speech synthesis and editing, pairing state-of-the-art performance with easy-to-use interfaces.
