A little while back I used this combined with a physics simulator to make a toy where you throw polygons in the air and they scream: https://ohgodwhathaveidone.stackblitz.io/
Code's here if anyone wants to play: https://stackblitz.com/edit/ohgodwhathaveidone – I did a fairly medium job of abstracting the synthesis engine away from the UI, but it might be a decent starting point if you're looking to make other Trombone-based web silliness.
This reminds me of the Sprechmaschine [1] ("speaking machine") built at the end of the 18th century by Wolfgang von Kempelen (the guy who built the original Mechanical Turk [2]). Here is a YouTube video showing it in action (for example, the machine says "Mama" around 1:14): https://www.youtube.com/watch?v=k_YUB_S6Gpo
[1] https://de.wikipedia.org/wiki/Wolfgang_von_Kempelen#Die_Spre...
Literate C version: https://pbat.ch/proj/voc/
Literate repo: https://github.com/PaulBatchelor/voc
The source that actually gets compiled is here: https://github.com/PaulBatchelor/Soundpipe/blob/master/modul...
Lest we forget that speech synthesis is not just for grotesque but amusing semi-real vocal synths like this, here's a BBC Radio 4 history of speech synthesis as an assistive technology - Klatt's Last Tapes, by Stephen Hawking's daughter, Lucy:
Man, this brings back memories. We used this program for my linguistics homework in college, in 1996. Although I think it was an app then, not a web page.
This would be useful to demonstrate the difference between p/f and l/r for those brought up without those distinctions.
I'd also (as an English speaker) like to see/hear the Dutch g and Xhosa clicks.
Would it be possible to use Reinforcement Learning + Speech Recognition to turn this thing into a real voice synthesizer?
Reminds me of Xiph's Speex/CELP model of speech as a mix of noise and frequency to achieve high compression, requiring as little as 2.15 kilobits (about 269 bytes) per second. It sounds perceptibly similar to the original recording, even though the difference between the input and output sampled data may be high:
https://www.speex.org/docs/manual/speex-manual/node9.html
Bitrate comparison:
https://www.speex.org/comparison/
Samples:
https://www.speex.org/samples/
Maybe higher compression can be achieved with better prediction, aka machine learning.
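For anyone checking the bytes-per-second figure for the 2.15 kbit/s rate mentioned above, here is a quick unit-conversion sketch. The function name and the `bits_per_kilobit` parameter are made up for illustration and are not part of Speex. Depending on whether a kilobit is taken as 1000 or 1024 bits, the result is roughly 269 or 275 bytes per second.

```python
def bytes_per_second(kilobits_per_second: float, bits_per_kilobit: int = 1000) -> float:
    """Convert a bitrate in kilobits per second to bytes per second.

    bits_per_kilobit is an assumption the caller chooses:
    1000 for decimal kilobits, 1024 for the binary convention.
    """
    return kilobits_per_second * bits_per_kilobit / 8


# 2.15 kbit/s is the low-end rate quoted in the comment above.
decimal = bytes_per_second(2.15)        # kilobit = 1000 bits
binary = bytes_per_second(2.15, 1024)   # kilobit = 1024 bits

print(f"decimal kilobits: {decimal:.2f} bytes/s")
print(f"binary kilobits:  {binary:.2f} bytes/s")
```

Bitrates are normally quoted with decimal prefixes, so the 1000-bit reading is probably the one to go with.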
I've actually been looking for the opposite of this (i.e. sound in, mouth representation out) for a while. Does anyone know of such a thing?
Wow! I was able to successfully recreate all sorts of letters and sounds just by imagining how my own mouth works, and then manipulating the different components on the pink trombone in the same way. I'm impressed!
Pondering what makes this sound "male".
Sudden sound warning.
This is amazing. I can get it to make almost any speech sound, but one I can't get is [s], because the model lacks teeth!
I feel like I am simulating orgasmic responses!
This was also ported to C++ inside a modular synth:
https://www.youtube.com/watch?v=PDn7ygnJUfI
https://www.youtube.com/watch?v=3jcqKnIa8T4
What someone needs to do is put sensors in people's mouths, record them saying known phrases, then stick the sensor data+phrases into some AI and see if we can't get that Trombone talkin'!
Previous post in Apr 2017: https://news.ycombinator.com/item?id=14135658
The shape of the tongue control reminds me a lot of the rhombus in https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...
... which is because the latter was patterned after the shape of the mouth.
I feel embarrassed playing with this
That's nice and everything, but until I have a hardware version where I can just switch a button on and do this and that, I'm not going to be truly satisfied. Please hardware'ify.
How do I use this to answer the phone?
Is the author aware of the other meaning of "pink trombone"?[1]
[1] https://www.urbandictionary.com/define.php?term=pink%20tromb...
In a similar vein: http://www.adultswim.com/etcetera/choir/
The most interesting thing about this one is the chord progressions it generates.