Why is the Microsoft Speech Recognition SemanticValue.Confidence value always 1?
I'm trying to use the SpeechRecognizer with a custom Grammar to handle the following pattern:
"Can you open {item}?" where {item} uses Dicta开发者_开发知识库tionGrammar.
I'm using the speech engine built into Vista and .NET 4.0.
I would like to be able to get the confidences for the SemanticValues returned. See example below.
If I simply use "recognizer.AddGrammar( new DictationGrammar() )", I can browse through e.Results.Alternates and view the confidence values of each alternate. That works if DictationGrammar is at the top level.
Made up example:
- Can you open Firefox? .95
- Can you open Fairfax? .93
- Can you open file fax? .72
- Can you pen Firefox? .85
- Can you pin Fairfax? .63
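For reference, that top-level dictation case is basically just this (a trimmed sketch, not my exact code; the event wiring is illustrative):

SpeechRecognizer recognizer = new SpeechRecognizer();    // shared Windows speech recognizer
recognizer.LoadGrammar(new DictationGrammar());          // dictation at the top level

recognizer.SpeechRecognized += (sender, e) =>
{
    // Each alternate carries its own usable confidence.
    foreach (RecognizedPhrase alternate in e.Result.Alternates)
        Console.WriteLine("{0} {1:F2}", alternate.Text, alternate.Confidence);
};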
But if I build a grammar that looks for "Can you open {semanticValue Key='item' GrammarBuilder=new DictationGrammar()}?", then I get this:
- Can you open Firefox? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
- Can you open Fairfax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
- Can you open file fax? .91 - Semantics = {GrammarBuilder.Name = "can you open"}
- Can you pen Firefox? .85 - Semantics = null
- Can you pin Fairfax? .63 - Semantics = null
The .91 shows me how confident it is that the speech matched the pattern "Can you open {item}?", but it doesn't distinguish any further.
However, if I then look at the Semantics of each alternate in e.Result.Alternates (e.g. Semantics.Where( s => s.Key == "item" )) and view their Confidence, I get this:
- Firefox 1.0
- Fairfax 1.0
- file fax 1.0
Which doesn't help me much.
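For concreteness, the grammar I'm describing looks roughly like this (simplified sketch; the dictation part goes in via GrammarBuilder.AppendDictation(), and recognizer is the same SpeechRecognizer as above):

// Build "Can you open {item}" with {item} backed by dictation.
GrammarBuilder dictation = new GrammarBuilder();
dictation.AppendDictation();                              // the free-form {item} part

GrammarBuilder request = new GrammarBuilder("Can you open");
request.Append(new SemanticResultKey("item", dictation));

recognizer.LoadGrammar(new Grammar(request) { Name = "can you open" });

recognizer.SpeechRecognized += (sender, e) =>
{
    foreach (RecognizedPhrase alternate in e.Result.Alternates)
    {
        if (alternate.Semantics != null && alternate.Semantics.ContainsKey("item"))
        {
            SemanticValue item = alternate.Semantics["item"];
            // item.Confidence always comes back as 1.0 here.
            Console.WriteLine("{0} {1:F2}", item.Value ?? alternate.Text, item.Confidence);
        }
    }
};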
What I really want is something like this when I view the Confidence of the matching SemanticValues:
- Firefox .95
- Fairfax .93
- file fax .85
It seems like it should work that way...
Am I doing something wrong? Is there even a way to do that within the Speech framework?
I'm hoping there's some inbuilt mechanism so that I can do it the "right" way.
As for another approach that will probably work...
- Use the SemanticValue approach to match on the pattern
- For anything that matches that pattern, extract the raw audio for {item} (using RecognitionResult.Words and RecognitionResult.GetAudioForWordRange)
- Run the raw audio for {item} through a SpeechRecognizer with the DictationGrammar to get the Confidence
... but that's more processing than I really want to do.
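Roughly what I have in mind for that workaround (a sketch only; it assumes the {item} words are everything after the three-word preamble, plus the usual System.Speech.Recognition and System.IO usings):

private float RescoreItemAudio(RecognitionResult result)
{
    // Grab the audio spanning just the {item} words (assumed: everything after "Can you open").
    RecognizedWordUnit first = result.Words[3];
    RecognizedWordUnit last = result.Words[result.Words.Count - 1];
    RecognizedAudio itemAudio = result.GetAudioForWordRange(first, last);

    using (MemoryStream stream = new MemoryStream())
    {
        itemAudio.WriteToWaveStream(stream);
        stream.Position = 0;

        // Re-run just that slice of audio through a dictation-only engine to get a real confidence.
        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveStream(stream);

            RecognitionResult itemResult = engine.Recognize();
            return itemResult != null ? itemResult.Confidence : 0.0F;
        }
    }
}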
I think a dictation grammar only does transcription. It does speech-to-text without extracting semantic meaning, because by definition a dictation grammar supports all words and has no clues about your specific semantic mapping. You need to use a custom grammar to extract semantic meaning. If you supply an SRGS grammar, or build one in code or with the SpeechServer tools, you can specify semantic mappings for certain words and phrases. Then the recognizer can extract semantic meaning and give you a semantic confidence.
You should be able to get a Confidence value from the recognizer for the overall recognition; try System.Speech.Recognition.RecognitionResult.Confidence.
The help file that comes with the Microsoft Server Speech Platform 10.2 SDK has more details. (This is the Microsoft.Speech API for server applications, which is very similar to the System.Speech API for client applications.) See http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4 or the Microsoft.Speech documentation at http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.semanticvalue(v=office.13).aspx
For the SemanticValue class it says:
All Speech platform-based recognition engines provide valid instances of SemanticValue for all recognized output, even phrases with no explicit semantic structure.
The SemanticValue instance for a phrase is obtained using the Semantics property on the RecognizedPhrase object (or objects which inherit from it, such as RecognitionResult).
SemanticValue objects obtained for recognized phrases without semantic structure are characterized by:
- Having no children (Count is 0)
- A Value property that is null
- An artificial confidence level of 1.0 (returned by Confidence)
Typically, applications create instances of SemanticValue indirectly, adding them to Grammar objects by using SemanticResultValue and SemanticResultKey instances in conjunction with Choices and GrammarBuilder objects.
Direct construction of a SemanticValue is useful during the creation of strongly typed grammars.
When you use the SemanticValue features in a grammar, you are typically trying to map different phrases to a single meaning. In your case, the phrases "I E" and "Internet Explorer" should both map to the same semantic meaning. You set up Choices in your grammar for each phrase that can map to a specific meaning. Here is a simple WinForms example:
private void btnTest_Click(object sender, EventArgs e)
{
    SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine();
    Grammar testGrammar = CreateTestGrammar();

    myRecognizer.LoadGrammar(testGrammar);

    // use the microphone
    try
    {
        myRecognizer.SetInputToDefaultAudioDevice();
        WriteTextOuput("");

        RecognitionResult result = myRecognizer.Recognize();

        if (result != null && result.Semantics.ContainsKey("item"))
        {
            string item = result.Semantics["item"].Value.ToString();
            float confidence = result.Semantics["item"].Confidence;
            WriteTextOuput(String.Format("Item is '{0}' with confidence {1}.", item, confidence));
        }
    }
    catch (InvalidOperationException exception)
    {
        WriteTextOuput(String.Format("Could not recognize input from the default audio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message));
        myRecognizer.UnloadAllGrammars();
    }
}

private Grammar CreateTestGrammar()
{
    // item choices: each phrase maps to a single semantic value
    Choices item = new Choices();
    item.Add(new SemanticResultValue("I E", "explorer"));
    item.Add(new SemanticResultValue("explorer", "explorer"));
    item.Add(new SemanticResultValue("firefox", "firefox"));
    item.Add(new SemanticResultValue("mozilla", "firefox"));
    item.Add(new SemanticResultValue("chrome", "chrome"));
    item.Add(new SemanticResultValue("google chrome", "chrome"));

    // tag the choices with the "item" semantic key
    SemanticResultKey itemSemKey = new SemanticResultKey("item", item);

    // build the permutations of choices...
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(itemSemKey);

    // now build the complete pattern...
    GrammarBuilder itemRequest = new GrammarBuilder();

    // preamble: "Can you open" / "Open" / "Please open"
    itemRequest.Append(new Choices("Can you open", "Open", "Please open"));

    // the {item} part is optional (0 or 1 occurrences)
    itemRequest.Append(gb, 0, 1);

    Grammar testGrammar = new Grammar(itemRequest);
    return testGrammar;
}
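With a grammar like this, saying either "mozilla" or "firefox" should come back with result.Semantics["item"].Value equal to "firefox", and the Confidence on that SemanticValue should reflect how sure the engine is about the semantic item rather than the artificial 1.0 you get from a bare dictation grammar.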