How to go about making an untrained speech to text converter?
I have had severe-to-profound deafness from a very early age, but luckily I can speak like a normal person. Verbal communication has always been difficult for me because of my impaired speech recognition abilities, even with lip-reading. I got through school and college by just reading boards, PowerPoint slides, books and the internet. I am doing pretty much fine at my current software engineering job, but of late I feel that I must put some effort into making my situation better.
Subtitles are my lifesaver in this country to understand movies/shows on TV and I have only been enjoying this for the last 7 years (I am 31 now).
I strongly feel the need to be able to see subtitles in real life whenever I talk to someone, even strangers. I want to develop an untrained speech to text converter, and as a start it does not even have to spell out exact words for me; cues about syllables or phonetics would be enough.
I have googled on this for a while, but most results are either text to speech or half-baked attempts on speech recognition to give voice commands to a computer. I would really like to get some pointers on how to start on this project. Specifically I need steps like how to deal with audio files and what kind of processing I have to do to get approx phonetics as fast as possible.
You might want to look at CMU's Sphinx project which does speech to text in real time. They have some demos to try it out.
Have a look at the DSP guide. It's more about low-level stuff, but techniques like Fourier transforms and filtering are of great importance to audio processing. Even if you don't start from scratch, it can be good to appreciate the principles and applications.
That said, I bet that starting from scratch, one could create something that can tell apart a basic set of sounds with a few days' work...
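To give a feel for what "telling apart a basic set of sounds" could look like, here is a minimal sketch using only the Python standard library. It cuts audio into a short frame, applies a Hann window, takes a naive Fourier transform, and uses the spectral centroid (the "center of mass" of the spectrum) to separate a low-pitched, vowel-like sound from a noisy, hiss-like one. Everything here is an assumption made for illustration: the function names, the 8 kHz sample rate, the 256-sample frame, the synthetic test signals and the 1200 Hz threshold are all made up, and real speech would need far more than one feature.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed for this sketch
FRAME_SIZE = 256    # samples per frame (32 ms at 8 kHz); assumed


def hann(n):
    """Hann window: tapers the frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT of a windowed frame; magnitudes for bins 0..n/2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate=SAMPLE_RATE):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def classify_frame(frame, threshold_hz=1200.0):
    """Crude two-class guess; the threshold is an arbitrary assumption."""
    if spectral_centroid(frame) > threshold_hz:
        return "hiss-like"
    return "vowel-like"


def make_vowel_like(n=FRAME_SIZE):
    """Synthetic stand-in for a voiced sound: 200 Hz plus a 400 Hz harmonic."""
    return [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
            + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
            for t in range(n)]


def make_hiss_like(n=FRAME_SIZE, seed=0):
    """Synthetic stand-in for a sound like 's': white noise, fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    for name, frame in (("tone", make_vowel_like()), ("noise", make_hiss_like())):
        print(name, round(spectral_centroid(frame)), "Hz ->", classify_frame(frame))
```

For real input you would read samples from a microphone or a WAV file instead of the synthetic generators, and swap the naive DFT for an FFT library, since the O(n^2) loop is only there to keep the sketch dependency-free.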
Here are some other questions that might give you ideas:
- Transcribing WMA/MP3 audio in an automated fashion?
- How do I convert text to speech?
And take a look at SIL Linguistics Computing.
Good luck.