
I want to use Azure's Speech Service to send it speech files for transcription.

Azure has examples of how to send a File or a Stream to its Speech Service. But I want to be able to send a Byte Array.

But I can’t figure out how to do it.

This page shows how to send both a Stream and a File:

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-recognize-speech?pivots=programming-language-csharp

The Stream example does NOT work for me.
The File example does work for me (so this proves that it’s not my audio file that is the problem).

I’ll put examples below of the things I’ve tried.

I am able to get this example working that sends a File:

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

private static string _speechKey = "your_key";
private static string _speechRegion = "your_region";
private static string _filePath = "PathToFile.wav";

public async Task FromFile()
{
    var speechConfig = SpeechConfig.FromSubscription(_speechKey, _speechRegion);

    using var audioConfig = AudioConfig.FromWavFileInput(_filePath);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    var result = await speechRecognizer.RecognizeOnceAsync();
    OutputSpeechRecognitionResult(result);
}


private static void OutputSpeechRecognitionResult(SpeechRecognitionResult speechRecognitionResult)
{
    switch (speechRecognitionResult.Reason)
    {
        case ResultReason.RecognizedSpeech:
            Console.WriteLine($"RECOGNIZED: Text={speechRecognitionResult.Text}");
            break;
        case ResultReason.NoMatch:
            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            break;
        case ResultReason.Canceled:
            var cancellation = CancellationDetails.FromResult(speechRecognitionResult);
            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

            if (cancellation.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
            }
            break;
    }
}

But I cannot get this example (from Azure), which sends a Stream, working:

It returns ResultReason.NoMatch (in its result object), which means it could not recognize the text (even though it was able to do so with the SAME audio file using the File example above).

public async Task FromStream()
{
    var speechConfig = SpeechConfig.FromSubscription(_speechKey, _speechRegion);

    var reader = new BinaryReader(File.OpenRead(_filePath));
    using var audioConfigStream = AudioInputStream.CreatePushStream();
    using var audioConfig = AudioConfig.FromStreamInput(audioConfigStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    byte[] readBytes;
    do
    {
        readBytes = reader.ReadBytes(1024);
        audioConfigStream.Write(readBytes, readBytes.Length);
    } while (readBytes.Length > 0);

    var result = await speechRecognizer.RecognizeOnceAsync();
    OutputSpeechRecognitionResult(result);
}

I got the above example from here (the 3rd example down):

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-recognize-speech?pivots=programming-language-csharp

But more importantly, I need to send a Byte Array to Azure.
I have tried the following methods but none of them work:

I tried converting the Byte Array to a Stream first and sending the Stream to Azure:

public async Task FromByteArray1()
{
    var speechConfig = SpeechConfig.FromSubscription(_speechKey, _speechRegion);

    byte[] byteArray = File.ReadAllBytes(_filePath);
    Stream stream = new MemoryStream(byteArray);

    var reader = new BinaryReader(stream);
    using var audioConfigStream = AudioInputStream.CreatePushStream();
    using var audioConfig = AudioConfig.FromStreamInput(audioConfigStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    byte[] readBytes;
    do
    {
        readBytes = reader.ReadBytes(1024);
        audioConfigStream.Write(readBytes, readBytes.Length);
    } while (readBytes.Length > 0);

    var result = await speechRecognizer.RecognizeOnceAsync();
    OutputSpeechRecognitionResult(result);
}

I tried this method of sending a Byte Array, which I got from this page:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/91

public async Task FromByteArray2()
{
    var speechConfig = SpeechConfig.FromSubscription(_speechKey, _speechRegion);

    byte[] byteArray = File.ReadAllBytes(_filePath);

    using var pushStream = AudioInputStream.CreatePushStream();
    pushStream.Write(byteArray);
    //pushStream.Close();
    AudioConfig audioConfig = AudioConfig.FromStreamInput(pushStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    pushStream.Close();

    var result = await speechRecognizer.RecognizeOnceAsync();

    OutputSpeechRecognitionResult(result);
}

I tried this method of sending a Byte Array, which I came up with myself:

public async Task FromByteArray3()
{
    var speechConfig = SpeechConfig.FromSubscription(_speechKey, _speechRegion);

    byte[] byteArray = File.ReadAllBytes(_filePath);

    using var audioConfigStream = AudioInputStream.CreatePushStream();
    using var audioConfig = AudioConfig.FromStreamInput(audioConfigStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    foreach (var item in byteArray)
    {
        audioConfigStream.Write(byteArray, 1);
    }

    var result = await speechRecognizer.RecognizeOnceAsync();
    OutputSpeechRecognitionResult(result);
}

I tried this method of sending a Byte Array, which I got from Azure's page:

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-conversation-transcription?pivots=programming-language-csharp

And it doesn't work; it returns this JSON:

JSON Data: Text={
    "Status": "Unsupported Audio Format",
    "Signature": null,
    "Transcription": null
}

public async Task FromByteArray4()
{
    byte[] fileBytes = File.ReadAllBytes(_filePath);
    var content = new ByteArrayContent(fileBytes);
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", _speechKey);
    var response = await client.PostAsync($"https://signature.{_speechRegion}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromByteArray", content);

    var jsonData = await response.Content.ReadAsStringAsync();
    Console.WriteLine($"JSON Data: Text={jsonData}");
}

None of the examples above worked… either I got a null response, or I got a response that says it could NOT recognize the text.

The File example did work for me, so this proves it is not my test audio file that is the problem… it can recognize the text when I send it as a File, just not as a Stream or Byte Array.

I got my speech audio files to test with here:

https://www.pacdv.com/sounds/voices-4.html

2 Answers


  1. With the samples we use for testing, the approach you took from the sample code in the documentation works for me. Can you point out which specific file (from the link you provided) you are experiencing issues with? I suspect it could be a header issue with the WAV file, but I wanted to make sure that's the case. You can check the header yourself with the sketch below.
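
    A minimal sketch for dumping the relevant header fields, assuming the canonical 44-byte PCM WAV layout (extra chunks such as LIST can shift these offsets); _filePath is the field from the question:

    using var header = new BinaryReader(File.OpenRead(_filePath));
    header.BaseStream.Seek(22, SeekOrigin.Begin);
    short channels = header.ReadInt16();      // bytes 22-23: channel count
    int sampleRate = header.ReadInt32();      // bytes 24-27: samples per second
    header.BaseStream.Seek(34, SeekOrigin.Begin);
    short bitsPerSample = header.ReadInt16(); // bytes 34-35: bits per sample
    Console.WriteLine($"{sampleRate} Hz, {bitsPerSample}-bit, {channels} channel(s)");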

  2. The default audio format for streams in the Speech SDK is 16 kHz, 16-bit, mono.

    I checked one of the wave files from https://www.pacdv.com/sounds/voices-4.html, and the format there was: WAVE audio, Microsoft PCM, 16-bit, stereo, 44100 Hz.

    In order to use streaming with that input, you need to configure the push stream with that format; see the C# example below:

    var pushStream = AudioInputStream.CreatePushStream(AudioStreamFormat.GetWaveFormatPCM(44100, 16, 2));
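
    Putting that together with the byte array from the question, here is a minimal end-to-end sketch (the method name FromByteArrayWithFormat is hypothetical; _speechKey, _speechRegion, _filePath, and OutputSpeechRecognitionResult are reused from the question):

    public async Task FromByteArrayWithFormat()
    {
        var speechConfig = SpeechConfig.FromSubscription(_speechKey, _speechRegion);

        byte[] byteArray = File.ReadAllBytes(_filePath);

        // Declare the real format of the input: 44100 Hz, 16-bit, 2 channels.
        using var pushStream = AudioInputStream.CreatePushStream(
            AudioStreamFormat.GetWaveFormatPCM(44100, 16, 2));
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        pushStream.Write(byteArray);
        pushStream.Close(); // signal end-of-stream so RecognizeOnceAsync can complete

        var result = await speechRecognizer.RecognizeOnceAsync();
        OutputSpeechRecognitionResult(result);
    }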
    

    Alternatively, you could convert the file(s) to 16 kHz, 1 channel, and use the default constructor, as in the sketch below.
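
    For the conversion route, one option is the NAudio NuGet package (an assumption on my part, not named in the original answer; MediaFoundationResampler is Windows-only, and ConvertTo16kMono is a hypothetical helper name):

    using NAudio.Wave;

    static void ConvertTo16kMono(string inputPath, string outputPath)
    {
        using var reader = new MediaFoundationReader(inputPath); // decode the source audio
        var target = new WaveFormat(16000, 16, 1);               // 16 kHz, 16-bit, mono
        using var resampler = new MediaFoundationResampler(reader, target);
        WaveFileWriter.CreateWaveFile(outputPath, resampler);    // write the converted WAV
    }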
