When I use the confidence values, punctuation does not work. When I say "question mark" it types the words "question mark" instead of "?", and when I say "period" it types "period" instead of ".". I have made a checkbox; when you click the checkbox, punctuation is turned on:
using System;
using System.Linq;
using System.Windows.Forms;
using Microsoft.CognitiveServices.Speech;

SpeechConfig config = SpeechConfig.FromSubscription("key", "region");
config.OutputFormat = OutputFormat.Detailed;
// The per-word Words list used in the handler is only populated when word-level timestamps are requested
config.RequestWordLevelTimestamps();
if (Properties.Settings.Default.Punctuation)
{
    // Ask the service to turn spoken "comma", "period", ... into symbols
    config.SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter);
}
recognizer = new SpeechRecognizer(config);
recognizer.Recognized += SpeechRecognizer_Recognized;
...
private void SpeechRecognizer_Recognized(object sender, SpeechRecognitionEventArgs e)
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        string text = e.Result.Text.ToLower();
        if (text.Equals("new line") || text.Equals("newline"))
        {
            SendKeys.SendWait(Environment.NewLine);
        }
        else
        {
            // Best() returns the N-best alternatives; take the top one
            var detailedResults = e.Result.Best();
            if (detailedResults != null && detailedResults.Any())
            {
                var bestResult = detailedResults.First();
                foreach (var word in bestResult.Words)
                {
                    // Type each word followed by its confidence as a percentage
                    double per = word.Confidence * 100;
                    SendKeys.SendWait($"{word.Word} [{per:0.##}] ");
                }
            }
        }
    }
}
Answers
Using Cognitive Services, I cannot reproduce your issue. Setting config.OutputFormat = OutputFormat.Detailed or calling config.RequestWordLevelTimestamps(); does not affect explicit punctuation recognition.

What is not clear from your example is the current state of your setting. When we toggle logic with a setting and observe the same behaviour no matter what value the setting has, the obvious thing to check is the setting value itself.
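Please try commenting out the logic that toggles punctuation, so that the property is always set, like this:
//if (Properties.Settings.Default.Punctuation)
//{
    config.SetServiceProperty("punctuation", "explicit", ServicePropertyChannel.UriQueryParameter);
//}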
If this solves it, then there are two considerations:

1. What is the initial state of the Properties.Settings.Default.Punctuation setting? Is your application logic failing to update the value when you expect it to? Any logic that changes the setting may need to call Properties.Settings.Default.Save() to persist the change. By extension, depending on where that logic executes, you might need to call Properties.Settings.Default.Reload() to ensure that the current values are loaded from the store. This is not usually required if the code runs on the same thread, which in WinForms it most likely does.
2. Is the config loaded only once, and is that before the setting value has been toggled? That step in the workflow is unclear from your description and the code example. If you are using continuous recognition, or you create a single instance of SpeechRecognizer for the lifetime of your Form, then changes to your setting will not be applied to the speech configuration. You will need to re-initialize the SpeechRecognizer as part of the logic that handles the setting-changed event, or have some other routine in the speech event handlers that detects a change in this setting and restarts the SpeechRecognizer connection and process.

What you are observing is by design. In most circumstances it is not necessary, or even helpful, to inspect the details of the recognized speech result. It looks like you have misinterpreted how to use the details.
You don't realise it, but your own example of detecting "new line" or "newline" as a key phrase, and interpreting it as a request to inject a line feed into the output, is the very same process at work. For punctuation to be detected in the speech, the first thing the classifier must do is resolve the words. Only after a word has been resolved can the service post-process the results to classify it as a natural word or as punctuation.
The process is a bit like this:

1. Resolve the spoken word, for example "comma".
2. If the punctuation setting is set to explicit, then: is the word on its own, or at the end of a recognized sequence that was followed by a pause?
3. If so, output "," and not "comma".
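The steps above can be sketched as a toy model. This is a hypothetical illustration, not the service's actual algorithm: the TimedWord record, the list of spoken names, and the 300 ms pause threshold are all assumptions. It takes words that have already been resolved, with offsets and durations in 100-nanosecond ticks (the unit the Speech SDK uses for word timings), and emits a symbol only when a punctuation name ends the sequence or is followed by a pause.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a resolved word; Offset and Duration are in
// 100-nanosecond ticks, the unit the Speech SDK uses for word timings.
public sealed record TimedWord(string Text, long Offset, long Duration);

public static class ExplicitPunctuationModel
{
    // Assumption: the spoken names this toy model knows about.
    static readonly Dictionary<string, string> Names = new(StringComparer.OrdinalIgnoreCase)
    {
        ["comma"] = ",",
        ["period"] = ".",
        ["question mark"] = "?",
    };

    // Assumption: a silence of at least 300 ms counts as a pause.
    const long PauseTicks = 300 * TimeSpan.TicksPerMillisecond;

    public static string Render(IReadOnlyList<TimedWord> words)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < words.Count)
        {
            // The words are already resolved; check whether this one is a punctuation name.
            // Try a two-word name ("question mark") before a one-word name.
            string? symbol = null;
            int span = 0;
            if (i + 1 < words.Count &&
                Names.TryGetValue(words[i].Text + " " + words[i + 1].Text, out var pair))
            {
                symbol = pair;
                span = 2;
            }
            else if (Names.TryGetValue(words[i].Text, out var single))
            {
                symbol = single;
                span = 1;
            }

            // Only a name isolated by a pause (or at the end) becomes a symbol.
            if (symbol != null && FollowedByPause(words, i + span - 1))
            {
                sb.Append(symbol);
                i += span;
                continue;
            }

            // Otherwise keep it as ordinary text, e.g. "comma" read fluently mid-sentence.
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(words[i].Text);
            i++;
        }
        return sb.ToString();
    }

    // True when the word at index `last` ends the sequence or is followed by a pause.
    static bool FollowedByPause(IReadOnlyList<TimedWord> words, int last)
    {
        if (last == words.Count - 1) return true;
        long end = words[last].Offset + words[last].Duration;
        return words[last + 1].Offset - end >= PauseTicks;
    }
}
```

With this model, "hello comma (pause) world period" renders as "hello, world.", while "add a comma here" read at a steady pace keeps "comma" as a word, even though every word was resolved correctly.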
Because punctuation is only classified after the words are resolved, it is important to understand that when the punctuation setting is set to explicit, the punctuation must be isolated from the normal cadence of the spoken sentence. Read this as a sentence at a constant pace, without punctuation:
If you read it fast and fluently enough, there should be no punctuation in the output, even if the words were recognized with high confidence. To get punctuation into the same text, you actually need to read this script:
The per-word analysis for my test looks like this:
Notice that the words representing punctuation all have a high confidence rating, but in the output not all of them were actually interpreted as punctuation. This might be clearer in this screenshot, where I have highlighted two instances of "comma" in the output that were correctly identified as words:
In this screenshot, the panel on the left is populated with e.Result.Text and the panel on the right with the Word and Confidence values.

The Words list is designed to map each recognized word back to a specific offset and duration in the audio that was submitted for analysis. You would use this information when testing and training the model, or if you wanted to display the text as subtitles for an audio or video clip. Punctuation is not shown at this level; it is purely about timing. All it has done is literally transcribe the spoken audio into English vocabulary. It is the responsibility of other analytical functions to use this information to determine which detected words might represent punctuation, or to determine context or sentiment.

FWIW, this is my Recognized event handler:
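Separately from that handler, here is a minimal sketch of the subtitle use case for the Words timing described above. It assumes Offset and Duration are in 100-nanosecond ticks, as on the SDK's word-level results; the WordTiming record and SubtitleSketch class are hypothetical stand-ins so the example runs without the SDK. In a real app you would fill WordTiming from the entries of e.Result.Best().First().Words.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical stand-in for one entry of the Words list; Offset and Duration
// are assumed to be in 100-nanosecond ticks.
public sealed record WordTiming(string Word, long Offset, long Duration);

public static class SubtitleSketch
{
    // Formats a tick count as an SRT timestamp: hh:mm:ss,fff.
    public static string SrtTime(long ticks)
    {
        var t = TimeSpan.FromTicks(ticks);
        return string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00},{3:000}",
            (int)t.TotalHours, t.Minutes, t.Seconds, t.Milliseconds);
    }

    // Emits one numbered SRT cue per word, spanning Offset to Offset + Duration.
    public static string ToSrt(IEnumerable<WordTiming> words)
    {
        var sb = new StringBuilder();
        int index = 1;
        foreach (var w in words)
        {
            sb.Append(index++).Append('\n')
              .Append(SrtTime(w.Offset)).Append(" --> ")
              .Append(SrtTime(w.Offset + w.Duration)).Append('\n')
              .Append(w.Word).Append("\n\n");
        }
        return sb.ToString();
    }
}
```

One cue per word is deliberately simplistic; real subtitles would group words into phrases, for example by splitting at pauses. Note that no punctuation appears in these cues, because the Words list only carries the spoken vocabulary and its timing.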