Sesame

Conversational speech model that achieves voice presence

2025-03-05

Sesame
Sesame's Conversational Speech Model (CSM) creates AI voices that go beyond text-to-speech, aiming for truly natural and engaging conversations.
The model targets emotionally intelligent conversation by focusing on 'voice presence': Sesame's AI companions engage users with nuanced tone, rhythm, and context awareness, with the goal of fostering genuine dialogue and trust. The CSM uses multimodal learning and a transformer-based architecture to produce coherent, expressive speech in real time. Current models already sound natural, but fully replicating human conversational dynamics remains an open challenge. Sesame has committed to open-sourcing its work and expanding multilingual support, with the aim of making voice interfaces more immersive and intuitive.
Open Source, Artificial Intelligence, Audio