
Getting the amplitude (or RMS voltage) of an audio signal captured in C++ with waveIn?

I am working on a very basic robotics project and want to add voice recognition to it. I know it is a complex problem, but I only need it for 3 or 4 commands (or words).

I know that I can record audio using waveIn, but I want to do real-time amplitude analysis on the signal. How can that be done? The audio will be captured as 8-bit mono.

I have thought of dividing the signal into segments of a fixed duration, dividing each segment further into smaller subsets, computing the average RMS value over each subset, summing those values, and then checking how far they differ from a stored reference signal. If the error is below an accepted value for all (or most) of the segments, print the word.

How can this be implemented? If you can offer any other suggestions, that would be great.

Thanks in advance.


There is no simple way to recognize words, because they are basically a sequence of phonemes which can vary in time and frequency.

Classical isolated-word recognition systems use the signal's MFCCs (mel-frequency cepstral coefficients) as input data, and try to recognize patterns using HMMs (hidden Markov models) or DTW (dynamic time warping).

You will also need a silence detection module if you don't want a record button.
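As a rough illustration of the amplitude part of the question, here is a minimal sketch of per-frame RMS and an energy-based silence check. It assumes 8-bit PCM as stored in WAV data, which is unsigned with silence at 128. The names `frameRms`, `rmsEnvelope`, and `isSilence`, and the 0.02 threshold, are made up for this example; with waveIn the bytes would typically come from the buffer of a completed `WAVEHDR`, but that wiring is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// RMS of one frame of 8-bit unsigned PCM samples.
// Samples are re-centred around 128 and scaled so the result is roughly in [0, 1].
double frameRms(const std::uint8_t* samples, std::size_t count) {
    if (count == 0) return 0.0;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double centred = (static_cast<int>(samples[i]) - 128) / 128.0;
        sumSquares += centred * centred;
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}

// Splits the buffer into consecutive frames of frameSize samples and
// returns one RMS value per complete frame (a trailing partial frame is ignored).
std::vector<double> rmsEnvelope(const std::vector<std::uint8_t>& buffer,
                                std::size_t frameSize) {
    assert(frameSize > 0);
    std::vector<double> envelope;
    for (std::size_t start = 0; start + frameSize <= buffer.size(); start += frameSize) {
        envelope.push_back(frameRms(buffer.data() + start, frameSize));
    }
    return envelope;
}

// Hypothetical silence test: a frame is "silent" when its RMS is below a threshold.
// The default value is an assumption and needs tuning for a real microphone.
bool isSilence(double rms, double threshold = 0.02) {
    return rms < threshold;
}
```

At 8 kHz, a `frameSize` of 80 samples corresponds to 10 ms frames. A word boundary can then be guessed as the first and last run of non-silent frames, though a fixed threshold is fragile in noisy rooms.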

For instance, the Edinburgh University toolkit provides some of these tools (with good documentation).

If you don't want to build it from scratch, or want a source of inspiration, here is an (old but free) implementation of such a system (which uses its own toolkit) with a full explanation and practical examples of how it works.

This system is an LVCSR (Large-Vocabulary Continuous Speech Recognition) system, and you only need a subset of it. If anyone knows of an open-source reduced-vocabulary system (like a simple IVR), it would be welcome.

If you want to build a basic system on your own, I recommend using MFCC and DTW:

  • For each target word to model:
    • record some instances of the word
    • compute delta-MFCC features at regular intervals (e.g. every 10 ms) across the word to form a model
  • When you want to recognize a signal:
    • compute the delta-MFCC features of this signal
    • use DTW to compare these features to each modeled word's delta-MFCC features
    • output the word that fits best (use a threshold to drop garbage)
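The comparison step in the list above can be sketched as follows. This is a plain textbook DTW with a Euclidean frame distance and a length-normalised score, not the method of any particular toolkit. Feature extraction (MFCC) is assumed to happen elsewhere, and `Frame`, `Sequence`, `WordTemplate`, `dtwDistance`, `recognise`, and `rejectThreshold` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

using Frame = std::vector<double>;   // one feature vector (e.g. delta-MFCCs for 10 ms)
using Sequence = std::vector<Frame>; // all frames of one utterance

// Euclidean distance between two feature vectors of equal size.
double frameDistance(const Frame& a, const Frame& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Classic DTW with steps (i-1,j), (i,j-1), (i-1,j-1), keeping two rows of the
// cost table. The total cost is divided by the combined length so that long
// and short words are comparable.
double dtwDistance(const Sequence& x, const Sequence& y) {
    const double inf = std::numeric_limits<double>::infinity();
    if (x.empty() || y.empty()) return inf;
    std::vector<double> prev(y.size() + 1, inf), curr(y.size() + 1, inf);
    prev[0] = 0.0;
    for (std::size_t i = 1; i <= x.size(); ++i) {
        curr[0] = inf;
        for (std::size_t j = 1; j <= y.size(); ++j) {
            const double best = std::min({prev[j], curr[j - 1], prev[j - 1]});
            curr[j] = frameDistance(x[i - 1], y[j - 1]) + best;
        }
        std::swap(prev, curr);
    }
    return prev[y.size()] / static_cast<double>(x.size() + y.size());
}

struct WordTemplate {
    std::string word;
    Sequence features;
};

// Returns the best-matching word, or an empty string when no template scores
// below rejectThreshold (the "drop garbage" step).
std::string recognise(const Sequence& input,
                      const std::vector<WordTemplate>& templates,
                      double rejectThreshold) {
    std::string bestWord;
    double bestScore = rejectThreshold;
    for (const WordTemplate& t : templates) {
        const double score = dtwDistance(input, t.features);
        if (score < bestScore) {
            bestScore = score;
            bestWord = t.word;
        }
    }
    return bestWord;
}
```

With several recordings per word, each recording can simply be added as its own `WordTemplate` under the same `word`. A suitable `rejectThreshold` depends on the features and has to be found by experiment.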


If you just want to recognize a few commands, there are many commercial and free products you can use. See "Need text to speech and speech recognition tools for Linux", "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?", or "Speech Recognition on iPhone". The answers to these questions link to many available products and tools. Recognizing and understanding a fixed list of commands is a very common problem that has been solved commercially. Many of the voice-automated phone systems you call use this type of technology, and the same technology is available to developers.

From watching these questions for a few months, I've seen most developer choices break down like this:

  • Windows folks - use the System.Speech features of .NET, or Microsoft.Speech, and install the free recognizers Microsoft provides. Windows 7 includes a full speech engine; others can be downloaded for free. There is a C++ API to the same engines, known as SAPI. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx or http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx

  • Linux folks - Sphinx seems to have a good following. See http://cmusphinx.sourceforge.net/ and http://cmusphinx.sourceforge.net/wiki/

  • Commercial products - Nuance, Loquendo, AT&T, others

  • Online service - Nuance, Yapme, others

Of course this may also be helpful - http://en.wikipedia.org/wiki/List_of_speech_recognition_software
