The initial demo went absolutely viral, and it seems to me that a ton of use cases could be unlocked by true speech-to-speech AI models. But while AI companies are fighting hard on text, coding, video and image generation, I have yet to see anyone working on this. It's all just speech-to-text-to-speech, with major downsides.
Why is that?
Maybe you can try this? https://www.sesame.com/research/crossing_the_uncanny_valley_...