Azure Cognitive Speech-to-text DetailedSpeechRecognitionResult is not detecting explicit punctuation

BabarAli
September 6, 2022
202 views
0 votes
2 Answers

When I use the detailed result to get the confidence, explicit punctuation does not work. For example, when I say "question mark" it types the words "question mark" instead of "?", and when I say "period" it types "period" instead of ".". I have added a checkbox; when it is checked, punctuation should be turned on.

SpeechConfig config = SpeechConfig.FromSubscription("key", "region");
config.OutputFormat = OutputFormat.Detailed;
if (Properties.Settings.Default.Punctuation)
{
    config.SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter);
}
recognizer = new SpeechRecognizer(config);
recognizer.Recognized += SpeechRecognizer_Recognized;
 
...

private void SpeechRecognizer_Recognized(object sender, SpeechRecognitionEventArgs e)
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        if (e.Result.Text.ToLower().Equals("new line") || e.Result.Text.ToLower().Equals("newline"))
        {
            SendKeys.SendWait(Environment.NewLine);
        }
        else
        {
            var detailedResults = e.Result.Best();
            if (detailedResults != null && detailedResults.Any())
            {
               
                var bestResults = detailedResults?.ToList()[0];
                foreach (var word in bestResults.Words)
                {
                    double per = word.Confidence * 100;
                    SendKeys.SendWait($"{word.Word} [{per:0.##}] ");
                }

            }
        }
    }
}

Tags: azure azure-cognitive-services c# speech-recognition winforms

Answers

- ChrisSchaller
- September 6, 2022 at 9:52 am
- 0 votes
Using Cognitive Services, I cannot reproduce your issue. Setting config.OutputFormat = OutputFormat.Detailed or calling config.RequestWordLevelTimestamps() does not affect the explicit punctuation recognition.

What is not clear from your example is the current state of your setting. When logic is toggled by a setting and the observed behaviour stays the same after the setting value changes, the first thing to check is the setting value itself.

Please try to comment out your logic to toggle the punctuation like this:
```
//if (Properties.Settings.Default.Punctuation)
{
    config.SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter);
}
```
If this solves it then there are two considerations:
1. What is the initial state of the Properties.Settings.Default.Punctuation setting, and is your application logic updating the value when you expect it to? Any logic that changes the setting may need to call Properties.Settings.Default.Save() to persist the change. Depending on where that logic executes, you might also need to call Properties.Settings.Default.Reload() to ensure that the current values are loaded from the store, although this is not usually required when everything runs in the same thread space, which is most likely the case in WinForms.
2. Is the config loaded only once, and does that happen before the setting value has been toggled? That step in the workflow is unclear from your description and the code example. If you are using continuous recognition, or you are creating a single instance of SpeechRecognizer for the lifetime of your Form, then changes to your setting will not be applied to the speech configuration.
  
  You will need to re-initialize the SpeechRecognizer as part of your logic that is handling the setting changed event or have some other routine in the speech event handlers that detects a change in this setting and restarts the SpeechRecognizer connection and process.
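
As a rough illustration of that re-initialization step, here is a minimal sketch. `RecognizerHost` and its `Apply` method are hypothetical names invented for this example; they are not part of the Speech SDK. The helper takes a factory delegate so that it can be tried without an Azure subscription.

```csharp
using System;

// Hypothetical helper (not part of the Speech SDK): owns one recognizer-like
// object and rebuilds it whenever the punctuation flag changes.
public sealed class RecognizerHost<T> : IDisposable where T : class, IDisposable
{
    private readonly Func<bool, T> _factory; // builds an instance for a given flag value
    private bool _builtWith;                 // flag value the current instance was built with

    public T Current { get; private set; }

    public RecognizerHost(Func<bool, T> factory, bool explicitPunctuation)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _builtWith = explicitPunctuation;
        Current = _factory(explicitPunctuation);
    }

    // Intended to be called from the checkbox's change handler.
    // Returns true when a new instance had to be created.
    public bool Apply(bool explicitPunctuation)
    {
        if (explicitPunctuation == _builtWith)
            return false; // nothing changed, keep the existing instance

        Current.Dispose();
        Current = _factory(explicitPunctuation);
        _builtWith = explicitPunctuation;
        return true;
    }

    public void Dispose() => Current.Dispose();
}
```

In the real application the factory would presumably create the SpeechConfig, call SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter) only when the flag is true, construct the SpeechRecognizer and re-attach the Recognized handler. If continuous recognition is running, it would also need to be stopped before the old instance is disposed and started again on the new one.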

What you are observing is by design. In most circumstances it is not necessary, or even helpful, to inspect the details of a recognized speech result. It looks like you have misinterpreted how to use the details.

You don’t realise it but your example of detecting "new line" or "newline" as a key phrase and interpreting that as a request to inject a line feed into the output is the very same process at work.
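
To make that comparison concrete, the key-phrase check can be pulled into a small helper. This is only a sketch: `VoiceCommands` and `IsNewLine` are hypothetical names, and trimming surrounding punctuation is my own assumption about what the recognized text might contain, not something the original code does.

```csharp
using System;

public static class VoiceCommands
{
    // Characters stripped from both ends before comparing (an assumed set).
    private static readonly char[] Edges = { ' ', '.', ',', '!', '?' };

    // True when the whole utterance is the "new line" key phrase.
    public static bool IsNewLine(string recognizedText)
    {
        if (string.IsNullOrWhiteSpace(recognizedText))
            return false;

        string phrase = recognizedText.Trim(Edges);
        return phrase.Equals("new line", StringComparison.OrdinalIgnoreCase)
            || phrase.Equals("newline", StringComparison.OrdinalIgnoreCase);
    }
}
```

The phrase only counts when it is the entire utterance, so "start a new line here" is left alone. Explicit punctuation relies on the same kind of isolation.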

For punctuation to be detected in the speech, the first thing that the classifier must do is resolve the words. Only after a word has been resolved can the service post-process the results to classify that word as a natural word or as punctuation.

The process is a bit like this:

1. Detect the word "comma" with high confidence.
2. If the punctuation setting is set to explicit, check whether the word is on its own or at the end of a recognized sequence that was followed by a pause.
3. If yes, then interpret it as "," and not "comma".
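
The following toy model illustrates that decision. It is not the service's implementation: `ExplicitPunctuationModel`, its symbol table and the idea of passing in pause-delimited segments are all assumptions made for this sketch, and it only covers the "word on its own" case.

```csharp
using System;
using System.Collections.Generic;

public static class ExplicitPunctuationModel
{
    // Spoken phrase -> symbol. A tiny illustrative subset, not the service's list.
    private static readonly Dictionary<string, string> Symbols =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["comma"] = ",",
            ["period"] = ".",
            ["question mark"] = "?",
            ["exclamation mark"] = "!",
        };

    // Each segment is a run of speech delimited by pauses (assumed input shape).
    // A segment becomes a symbol only when the whole segment is a punctuation phrase.
    public static string Render(IEnumerable<string> segments)
    {
        var parts = new List<string>();
        foreach (string segment in segments)
        {
            string phrase = segment.Trim();
            parts.Add(Symbols.TryGetValue(phrase, out var symbol) ? symbol : phrase);
        }
        return string.Join(" ", parts);
    }
}
```

In this model, "comma" spoken inside a longer segment stays a word, while "comma" spoken as its own segment becomes ",".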

For this reason it is important to understand that when the punctuation setting is set to explicit, the punctuation must be isolated out of the normal sentence cadence of the spoken text.

Read this as a sentence with a constant pace without punctuation:

this is a sentence that doesn’t have a comma or a full stop but an exclamation mark would look nice

If you read it quickly and fluently enough, there should be no punctuation in the output, even if the words were recognized with high confidence. To get punctuation into the same text, you actually need to read this script:

This is a sentence that doesn’t have a comma.
Comma.
Or a fullstop.
Comma.
But an exclamation mark would look nice.
exclamation mark.

In my test, reading that script produced this output:

This is a sentence that doesn't have a comma , or a full stop , but an exclamation mark would look nice !

The per-word analysis for my test looks like this:

word	confidence
this	85.99%
is	95.93%
a	68.49%
sentence	96.99%
that	90.03%
doesn’t	96.75%
have	94.57%
a	87.88%
comma	94.58%
comma	94.34%
or	67.14%
a	64.68%
fullstop	77.63%
comma	94.90%
but	91.17%
an	62.65%
exclamation	98.44%
mark	68.58%
would	86.15%
look	91.58%
nice	97.40%
exclamation	97.05%
mark	96.61%

Notice that the words representing the punctuation all have a high confidence rating, but in the output not all of the words were actually interpreted as punctuation. This might be clearer in this screenshot where I have highlighted two commas that are in the output, but are correctly identified as words:

In this screenshot, the panel on the left is populated with e.Result.Text and the panel on the right with the Word and Confidence.

DetailedSpeechRecognitionResult.Words
Returns the Word level timing result list.

The Words list is designed to map each recognised word back to a specific offset and duration in the audio that was submitted for analysis. You would use this information when testing and training the model, or if you wanted to display the text as subtitles for an audio or video clip. Punctuation is not shown at this level. The list is purely about timing: it is a literal transcription of the spoken audio into English vocabulary. It is the responsibility of other analytical functions to use this information to determine which detected words might represent punctuation, or to determine context or sentiment.
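
As an example of the subtitle use case, here is a minimal sketch that turns word timings into a single cue line. `TimedWord` and `SubtitleSketch` are hypothetical stand-ins so the example runs without the SDK. It assumes, as I understand the SDK documentation, that the Offset and Duration of each entry in Words are expressed in 100-nanosecond ticks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for one entry of the Words list (hypothetical type for this sketch).
public sealed class TimedWord
{
    public TimedWord(string word, long offsetTicks, long durationTicks)
    {
        Word = word;
        OffsetTicks = offsetTicks;
        DurationTicks = durationTicks;
    }

    public string Word { get; }
    public long OffsetTicks { get; }   // start of the word, in 100 ns ticks (assumed unit)
    public long DurationTicks { get; } // length of the word, in 100 ns ticks (assumed unit)
}

public static class SubtitleSketch
{
    // Formats one cue as "hh:mm:ss.fff --> hh:mm:ss.fff  text".
    public static string ToCue(IReadOnlyList<TimedWord> words)
    {
        if (words.Count == 0)
            return string.Empty;

        TimedWord last = words[words.Count - 1];
        TimeSpan start = TimeSpan.FromTicks(words[0].OffsetTicks);
        TimeSpan end = TimeSpan.FromTicks(last.OffsetTicks + last.DurationTicks);
        string text = string.Join(" ", words.Select(w => w.Word));

        const string format = @"hh\:mm\:ss\.fff";
        return start.ToString(format) + " --> " + end.ToString(format) + "  " + text;
    }
}
```

The cue text here is built from the raw words, so it carries no punctuation. For display text with punctuation, e.Result.Text is the value to use.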

FWIW this is my Recognized event handler:

recognizer.Recognized += (s, e) =>
{
    // Checks result.
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
        string text = e.Result.Text;
        if (e.Result.Text.ToLower().Equals("new line") || e.Result.Text.ToLower().Equals("newline"))
            text = Environment.NewLine;

        // update the left textbox
        this.BeginInvoke(SetText, textBox1, text); 

        var detailedResults = e.Result.Best();
        if (detailedResults != null && detailedResults.Any())
        {
            var bestResults = detailedResults?.ToList()[0];
            foreach (var word in bestResults.Words)
            {
                double perc = word.Confidence * 100;
                // update the right textbox
                this.BeginInvoke(SetText, textBox2, $"{word.Word} [{word.Confidence:p2}] " + Environment.NewLine);
            }
        }
    }
    else if (e.Result.Reason == ResultReason.NoMatch)
    {
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
    }
};
...
delegate void SetTextDelegate(TextBox textBox, string text);
private SetTextDelegate SetText = delegate (TextBox textbox, string text)
{
    textbox.Text += " " + text;
};
