
I’m using Azure Speech to Text to find the timestamps of utterances in a WAV file.

The problem is that when the user has recorded numbers, for instance "I’m going to count to three. One, two, three, here I come", the numbers are omitted from the output. This happens for English as well as for other languages.
I can understand utterances like ‘eh’ and ‘ah’ being omitted, but numbers? Why is that the default?

I’m using:

  • speechConfig.OutputFormat = OutputFormat.Detailed;
  • the default language model.

Can I somehow configure the SpeechRecognizer differently so it also outputs numbers?


Answers


  1. Chosen as BEST ANSWER

    I found the reason my results did not include the numbers, and it was in my own code. In my post-processing I was trying to remove punctuation marks from the result, and in doing so I was accidentally removing the numbers as well.
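    The original post-processing code was not posted, so the following C# sketch only illustrates the failure mode. It assumes the recognized text renders spoken numbers as digits (the display format usually does, e.g. "1, 2, 3"); the class `TranscriptCleanup` and its methods are hypothetical names. A filter that keeps only letters deletes those digits, while removing only Unicode punctuation (`\p{P}`) keeps them.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating the bug; not the poster's actual code.
public static class TranscriptCleanup
{
    // Too aggressive: keeps only ASCII letters, apostrophes and whitespace,
    // so digits such as "1, 2, 3" are deleted along with the punctuation
    // (and so are letters from non-English scripts).
    public static string StripNonLetters(string text) =>
        Regex.Replace(text, @"[^A-Za-z'\s]", "");

    // Removes only characters in the Unicode punctuation category (\p{P});
    // digits and letters from any script are kept.
    public static string StripPunctuation(string text) =>
        Regex.Replace(text, @"\p{P}", "");
}
```

    For an input such as "I'm going to count to 3. 1, 2, 3, here I come.", `StripPunctuation` returns "Im going to count to 3 1 2 3 here I come". Note that the apostrophe is punctuation too; if contractions matter, exclude it from the pattern.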


    • Using the following code, I was able to convert a .wav audio file to text without losing the numbers.

      using System;
      using Microsoft.CognitiveServices.Speech;
      using Microsoft.CognitiveServices.Speech.Audio;

      // Replace the placeholders with your own key, region and file path.
      string speechKey = "<Your_Key>";
      string speechRegion = "<Your_Region>";

      var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
      speechConfig.SpeechRecognitionLanguage = "en-US";

      using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
      using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

      // Single-shot recognition: returns only the first recognized utterance.
      var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
      Console.WriteLine(speechRecognitionResult.Text);


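    • However, if there is a pause between "I'm going to count to three." and "One, two, three, here I come.", the output omits the second sentence. This is most likely not a bug in the model: RecognizeOnceAsync performs single-shot recognition, which returns after the first utterance, and the end of an utterance is detected from the silence that follows it. To transcribe the whole file, use continuous recognition (StartContinuousRecognitionAsync) instead.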

    • I also couldn’t find any setting related to this issue in the Microsoft documentation for the AudioConfig class.
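    A minimal sketch of continuous recognition over the whole WAV file, assuming the Microsoft.CognitiveServices.Speech NuGet package and the same placeholders as the snippet above. It follows the SDK's event-based pattern (Recognized, Canceled, SessionStopped) and also prints each utterance's offset and duration, which is what the question needs for timestamps. It has not been run against the service here, so treat it as a starting point rather than a verified implementation.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        // Placeholders: replace with your own key, region and file path.
        var speechConfig = SpeechConfig.FromSubscription("<Your_Key>", "<Your_Region>");
        speechConfig.SpeechRecognitionLanguage = "en-US";
        speechConfig.OutputFormat = OutputFormat.Detailed;

        using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // Completed when the session stops or recognition is canceled
        // (for a file input, typically when the end of the stream is reached).
        var stopped = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Raised once per recognized utterance, so a pause does not end the transcript.
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                // The offset is expressed in 100-nanosecond ticks.
                var start = TimeSpan.FromTicks((long)e.Result.OffsetInTicks);
                Console.WriteLine($"[{start} +{e.Result.Duration}] {e.Result.Text}");
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"Canceled: {e.Reason} {e.ErrorDetails}");
            stopped.TrySetResult(true);
        };

        recognizer.SessionStopped += (s, e) => stopped.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();
        await stopped.Task;
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```

    Waiting on a TaskCompletionSource keeps Main alive until the service signals that the file has been fully processed, instead of guessing a fixed delay.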
