
I am using the "microsoft-cognitiveservices-speech-sdk": "^1.32.0" library for speech-to-text (STT).
I want to detect whether the user starts talking again within 2-3 seconds after the previous STT result has finished.
Right now I am using the recognizeOnceAsync function.
It seems that the STT result is only available after the speech has ended, and that there is no way to detect when speech begins.
Is there any way to do this, or a workaround?

2 Answers


  1. To handle silence and configure the timeout, look at the segmentation silence timeout property. Raising it makes results longer and allows longer pauses from the speaker within a phrase.

    The default value is 500 ms, and it can be adjusted within the range of 100 to 5000 ms.

    speechConfig.setProperty(sdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");
    

    For more details regarding silence handling, you can check this documentation.

    An alternative is to use continuous recognition instead of single-shot recognition, which gives you more control over when to stop recognizing. For more information and examples, please refer to the provided documentation.
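
    As a rough illustration of that control, the sketch below starts continuous recognition and stops it once a quiet period passes after a finalized phrase with no new speech. `listenUntilQuiet`, `quietMs`, `onPhrase` and the injectable `timers` object are hypothetical names introduced here, not part of the SDK; the sketch assumes `recognizer` is an SDK recognizer exposing the `recognizing` and `recognized` callbacks and the `startContinuousRecognitionAsync` / `stopContinuousRecognitionAsync` methods.

    ```javascript
    // Hypothetical helper (not an SDK function): keep listening until
    // `quietMs` passes after a finalized phrase without any new speech.
    // `timers` is injectable so the logic can be exercised without real delays.
    function listenUntilQuiet(
      recognizer,
      quietMs,
      onPhrase,
      timers = {
        set: (fn, ms) => setTimeout(fn, ms),
        clear: (handle) => clearTimeout(handle),
      }
    ) {
      let timer = null; // pending "stop" timer, if one is armed

      const disarm = () => {
        if (timer !== null) {
          timers.clear(timer);
          timer = null;
        }
      };

      // Interim result: the user is speaking again, so cancel the pending stop.
      recognizer.recognizing = (_sender, _event) => disarm();

      // Final result: hand the text over and arm the quiet-period timer.
      recognizer.recognized = (_sender, event) => {
        const text = event.result && event.result.text;
        if (!text) return; // ignore empty results such as silence
        onPhrase(text);
        disarm();
        timer = timers.set(() => {
          timer = null;
          recognizer.stopContinuousRecognitionAsync();
        }, quietMs);
      };

      recognizer.startContinuousRecognitionAsync();
    }
    ```

    With the real SDK, `recognizer` would typically be a `SpeechRecognizer` built from your `speechConfig`, and a call such as `listenUntilQuiet(recognizer, 2500, (text) => console.log(text))` would keep the session open while the user keeps talking.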

  2. Based on the package name, it looks like you are using the Cognitive Services Speech SDK for JavaScript.
    The earliest information you can get about recognized speech comes from the intermediate results (the recognizing event). The following sample shows how to use the recognizing event in JavaScript:
    https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/node/speech.js
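
    To tie this back to the question, here is a minimal sketch that treats the first `recognizing` event after a final result as the start of new speech and checks whether it arrived within the 2-3 second window. `watchFollowUp`, `windowMs`, `onFollowUp` and the injectable `now` clock are hypothetical names used for illustration; only the `recognizing` and `recognized` callbacks come from the SDK. Interim results usually arrive slightly after the speaker actually starts, so the measured gap should be read as an approximation.

    ```javascript
    // Hypothetical helper (not an SDK function): calls onFollowUp(gapMs) when
    // the first interim result after a final result arrives within windowMs.
    // `now` is injectable so the timing logic can be tested deterministically.
    function watchFollowUp(recognizer, windowMs, onFollowUp, now = () => Date.now()) {
      let lastFinalAt = null; // when the previous phrase was finalized
      let phraseOpen = false; // true once the current phrase has an interim result

      recognizer.recognizing = (_sender, _event) => {
        if (phraseOpen) return; // only the first interim result marks the onset
        phraseOpen = true;
        if (lastFinalAt === null) return; // nothing to follow up on yet
        const gapMs = now() - lastFinalAt;
        if (gapMs <= windowMs) onFollowUp(gapMs);
      };

      recognizer.recognized = (_sender, event) => {
        phraseOpen = false;
        // Only non-empty final results count as "previous STT finished".
        if (event.result && event.result.text) lastFinalAt = now();
      };
    }
    ```

    The same handlers can be attached when using recognizeOnceAsync in a loop, provided the next recognition is started right after the previous result so that the interim events are not missed.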
