.NET System.Speech: problems when switching from microphone input to WAV file input

I'm using the C# .NET library System.Speech to implement my ASR app. (BTW, I've seen a post that mentioned SpeechLib.dll, which seems to be a more basic, lower-level implementation of SAPI. Are they the same?) Our main goal is a client/server ASR system: record the user's voice on the client, transfer the whole audio stream to the server over the internet, and have the server run the ASR job and return the result to the client.

I've already written a similar app that uses the local mic as the voice input, and it performs pretty well.

My original app:


SpeechRecognitionEngine sr = new SpeechRecognitionEngine();

sr.SetInputToDefaultAudioDevice();

sr.RecognizeAsync();

This way I use the mic for input, and the accuracy of the results is pretty good.

Here's the problem. For the new task I have to set the recognition input to a WAV file (or an audio stream received over a TCP/IP socket connection). So I simply changed my code to this:


SpeechRecognitionEngine sr = new SpeechRecognitionEngine();

sr.SetInputToWaveFile(@"D:\input.wav");

sr.RecognizeAsync();

The results turned out to be unsatisfactory. I pre-recorded some wave snippets into several separate files, based on the same grammar as the mic-input app, and set these files as the ASR input. However, speech is detected in only some of the files (handled by the SpeechDetected event), and very few files are recognized well (handled by the SpeechRecognized event). I recorded the same phrases that I used with the mic-input app.

Despite the poor accuracy, some files are recognized correctly, which suggests my code doesn't have a logic error. But I assume I'm missing some step before using the recognizer, such as setting some of its parameters.

So I'm here to ask for help: does anyone know the reason for the poor accuracy with WAV file input?

Thanks!


SpeechLib.dll is the COM interop library for the native COM interface (SAPI). SpeechRecognitionEngine is the friendly .NET class wrapper for it. They both access the exact same recognition engine.

There's probably some kind of problem with your recording. Usually it's a volume issue, like clipping (too loud) or too much noise (too soft). Get some basic diagnostics by implementing the AudioSignalProblemOccurred event.
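As a rough sketch of that suggestion, the snippet below hooks AudioSignalProblemOccurred (plus a few other events) while recognizing from the file in the question. It makes some assumptions that are not in the original post: DictationGrammar only stands in for the asker's real grammar, the class name is made up, and RecognizeMode.Multiple with a wait on RecognizeCompleted is just one way to keep a console app alive until the file has been consumed.

```csharp
using System;
using System.Speech.Recognition;
using System.Threading;

// Hypothetical console harness; needs a reference to System.Speech (Windows only).
class WaveFileDiagnostics
{
    static void Main()
    {
        using (var sr = new SpeechRecognitionEngine())
        using (var done = new ManualResetEventSlim(false))
        {
            // Assumption: placeholder grammar. Load the same grammar as the mic-input app here.
            sr.LoadGrammar(new DictationGrammar());

            // Reports signal problems such as too loud, too soft, or too noisy.
            sr.AudioSignalProblemOccurred += (s, e) =>
                Console.WriteLine("Audio problem: {0} (level {1}, position {2})",
                    e.AudioSignalProblem, e.AudioLevel, e.AudioPosition);

            sr.SpeechDetected += (s, e) =>
                Console.WriteLine("Speech detected at {0}", e.AudioPosition);

            sr.SpeechRecognized += (s, e) =>
                Console.WriteLine("Recognized: {0} (confidence {1:F2})",
                    e.Result.Text, e.Result.Confidence);

            sr.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Rejected: {0}", e.Result.Text);

            // Raised when the asynchronous operation finishes, e.g. at the end of the file.
            sr.RecognizeCompleted += (s, e) => done.Set();

            sr.SetInputToWaveFile(@"D:\input.wav");
            sr.RecognizeAsync(RecognizeMode.Multiple);

            // Block until the whole file has been processed.
            done.Wait();
        }
    }
}
```

If the handler prints values such as TooLoud, TooSoft, or TooNoisy for the files that fail, re-recording them at a different level is probably the first thing to try.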
