Sesame

Conversational speech model that achieves voice presence

2025-03-05

Sesame's Conversational Speech Model (CSM) creates AI voices that go beyond text-to-speech, aiming for truly natural and engaging conversations.

The CSM targets what Sesame calls "voice presence": speech whose tone, rhythm, and awareness of context let Sesame's AI companions hold genuine dialogue and build trust with users. It uses multimodal learning and a transformer-based architecture to produce coherent, expressive speech in real time. Current models are strong on naturalness but do not yet fully replicate the dynamics of human conversation. Sesame has committed to open-sourcing its work and to expanding multilingual support, with the goal of more immersive and intuitive voice interfaces.


Open Source, Artificial Intelligence, Audio
