I am using the Azure SpeechSynthesizer libraries in Python. I have written code that converts some text into speech. I am finding that you need to call get() on the result for it to actually do any speech synthesis. But this get() call is essentially blocking.
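import azure.cognitiveservices.speech as speechsdk

# speech_config is a speechsdk.SpeechConfig created earlier (not shown)
pull_stream = speechsdk.audio.PullAudioOutputStream()
stream_config = speechsdk.audio.AudioOutputConfig(stream=pull_stream)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=stream_config)
result = speech_synthesizer.speak_text_async(text)  # returns a future
result.get()  # blocks until synthesis has completed
del speech_synthesizer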
If I don’t call result.get(), I am unable to pull any data from the stream. But when I call result.get(), it blocks for several seconds while it converts the text to speech. I have also run this with an AudioOutputConfig given a filename, so that it just saves to a WAV file, and the timing is about the same. So I know it is doing the same work regardless of whether I get the output as a stream or a file.
Any pointers on how to get this to actually work asynchronously so I can pull from the stream as it is translating, and not have to wait until it completes?
2 Answers
Using Dasani's code, I was able to modify it and get it to work. I had to convert PCM to WAV format before saving it out to a file. I also needed a really weird hack: removing part of the buffer I get in the synthesizing callback. See the code to understand. I played around with various sizes, and 46 bytes seems like the right amount.
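A minimal sketch of the pieces this answer describes: collecting chunks in a synthesizing callback, dropping the leading bytes, and wrapping the remaining PCM in a WAV container. The names ChunkCollector, strip_header and pcm_to_wav_bytes are hypothetical, the 46-byte figure is simply the value reported above, and the 16 kHz, 16-bit, mono parameters are an assumption about the output format in use. The sketch only strips a chunk that begins with b"RIFF", which is an added safeguard rather than something the answer states.

```python
import io
import wave

HEADER_BYTES = 46  # value reported in the answer; not an SDK constant


def strip_header(chunk, header_bytes=HEADER_BYTES):
    """Drop the leading header bytes if the chunk starts with a RIFF tag."""
    if chunk[:4] == b"RIFF":
        return chunk[header_bytes:]
    return chunk


class ChunkCollector:
    """Accumulates raw PCM delivered to a `synthesizing` event handler."""

    def __init__(self, header_bytes=HEADER_BYTES):
        self.header_bytes = header_bytes
        self.pcm = bytearray()

    def on_synthesizing(self, evt):
        # evt.result.audio_data holds the bytes delivered with this event.
        self.pcm.extend(strip_header(evt.result.audio_data, self.header_bytes))


def pcm_to_wav_bytes(pcm, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM in a WAV container (assumed 16 kHz, 16-bit, mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buf.getvalue()
```

To use it, the handler would be attached with speech_synthesizer.synthesizing.connect(collector.on_synthesizing) before calling speak_text_async(text), so chunks arrive while synthesis is still running; the collected bytes can then be written out with pcm_to_wav_bytes(collector.pcm).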
I tried the following code, which uses result = speech_synthesizer.speak_text_async(text).get() with a .wav file as the output, and it successfully converted the text to speech.
Code :
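What follows is a minimal sketch of the blocking approach described in this answer, not the answerer's original listing. The function name synthesize_to_wav and its parameters are hypothetical, and the subscription key and region are assumed to be supplied by the caller.

```python
def synthesize_to_wav(text, wav_path, subscription_key, region):
    """Blocking text-to-speech into a .wav file; returns True on success."""
    # Imported here so this module still loads where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=subscription_key, region=region
    )
    # Send the synthesized audio to a file instead of the default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # .get() waits until synthesis has finished, so this call blocks.
    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

Because of the .get() call, this version waits for the whole utterance before returning, which is the blocking behaviour the question describes.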
Output :
The code below successfully converted the text to speech output as follows.