I am trying to create a simple proof of concept speech transcribing program using Azure. I have set up all the stuff in Azure and tested with a simple program based on the docs:
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static SpeechRecognizer recognizer;

    async static Task FromMic(SpeechConfig speechConfig)
    {
        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        var stopRecognition = new TaskCompletionSource<int>();

        // Subscribe to the recognizer's events before starting continuous recognition.
        recognizer.SessionStarted += (s, e) =>
        {
            Console.WriteLine("\n Session started event: " + e);
        };
        recognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
        };
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };
        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");
            if (e.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
            }
            stopRecognition.TrySetResult(0);
        };
        recognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("\n Session stopped event.");
            stopRecognition.TrySetResult(0);
        };

        await recognizer.StartContinuousRecognitionAsync();

        // Waits for completion. Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("xxxxxxxxxxxxxxxxxxxx", "xxxx");

        // Clearly I don't really know how to stop the recognition properly before exiting,
        // but that's an issue for another day...
        AppDomain.CurrentDomain.ProcessExit += delegate
        {
            EndRecognition();
        };
        Console.CancelKeyPress += delegate
        {
            EndRecognition();
        };

        await FromMic(speechConfig);
        Console.WriteLine("Exiting");
    }

    static void EndRecognition()
    {
        Console.WriteLine("Ending recognition...");
        recognizer.StopContinuousRecognitionAsync();
        recognizer.Dispose();
        Console.WriteLine("Done.");
    }
}
The program works fine on my personal machine at home. When I try the same thing on a work computer, I get the session started message, but nothing else (no speech recognition).
My organization routes all traffic through a proxy and of course has less permissive firewall rules than my home machine/network, so I have tried:
- making sure the mic is working/connected
- setting HTTP_PROXY and HTTPS_PROXY environment variables to my organization’s proxy (roughly as sketched just after this list)
- viewing the AV firewall logs (doesn’t seem to show anything, but perhaps that’s because I’m not an admin or something)
- viewing the "total calls" metric chart on Azure to see if anything is actually happening on the Azure side (nothing is)
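For reference, the proxy values I set look roughly like this; the host and port below are placeholders, not my organization’s real proxy, and this is just the in-process C# equivalent of the system environment variables I actually set:

// Placeholder proxy address; my real corporate proxy host/port are different.
// Set before any Speech SDK objects are created, in case the SDK or the
// underlying HTTP stack reads these variables.
Environment.SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.corp:8080");
Environment.SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.corp:8080");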
I would have expected an exception to be thrown if the program can’t connect to the Azure VM hosting the speech resource (not sure if I’m using the correct terminology, still new to this), but apparently not; something seems to be failing silently.
What would be the next thing to try or check as a troubleshooting step here?
Note: as stated above, this is a proof-of-concept/experiment kind of thing for a demo or two; long term, I don’t plan to connect to a personal cloud service on a corporate network.
2 Answers
I have tried to reproduce the issue: using Visual Studio 2019, I created a sample Speech resource in Azure, used its key and region with your code sample, and was able to get output from the speech recognizer (screenshot omitted).
I installed the Microsoft.CognitiveServices.Speech package through the NuGet package manager.
Initially I got an error message; after passing the key and region of the Azure Speech resource in the code and rebuilding in VS, I got the recognition output as expected.
Refer to the Microsoft documentation on configuring virtual networks and the Speech resource networking settings, and on using the Speech service through a private endpoint.
In addition, SpeechConfig has a SetProxy method that can be used to establish proxy details before creating a recognizer; if you have a known proxy within an enterprise network, this method may allow traffic to do the right thing: https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechconfig.setproxy?view=azure-dotnet#microsoft-cognitiveservices-speech-speechconfig-setproxy(system-string-system-int32)
Some customers have corporate networks with more restrictive "allow"-based networking lists, and in those situations proxy configuration won’t make a difference (coordination with the network administrators to add the appropriate hosts to that allow list is needed). But if it’s just a matter of providing a host/port/username/password, using SpeechConfig.SetProxy as described above should help.
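A minimal sketch of how that might look; the proxy host, port, and credentials below are placeholders, not values from your environment:

var speechConfig = SpeechConfig.FromSubscription("your-key", "your-region");
// Placeholder proxy details; substitute the corporate proxy's actual host and port.
speechConfig.SetProxy("proxy.contoso.example", 8080);
// If the proxy requires authentication, the overload with credentials can be used instead:
// speechConfig.SetProxy("proxy.contoso.example", 8080, "username", "password");
// Configure the proxy before constructing the recognizer so the connection goes through it.
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);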