
My use case is to convert text to speech using Azure and then play it into a virtual microphone.

option 1 – with an intermediate .wav file

I tried both steps manually in a Jupyter notebook.
The problem is that the .wav file written by Azure cannot be played directly from Python; it fails with
"error: No file ‘file.wav’ found in working directory". After I restart the Python kernel, the audio plays.

text-to-speech

import azure.cognitiveservices.speech as speechsdk

# Write the synthesized audio to file.wav instead of the default speaker.
audio_config = speechsdk.audio.AudioOutputConfig(filename="file.wav")
...
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
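Before handing the file to the mixer, it may help to confirm that it exists at the expected absolute path and is already a complete WAV file. The following is a minimal sketch using only the standard library; `wav_is_ready` is a hypothetical helper name, not part of the Azure SDK or pygame. One possible cause of the error (an assumption, not confirmed in this thread) is that the synthesizer object still holds the file open, in which case running `del speech_synthesizer` before playback might release it.

```python
import os
import wave


def wav_is_ready(path):
    """Return True if `path` exists and parses as a WAV file with audio frames."""
    # Resolve against the current working directory, which in a notebook
    # is not always the folder you expect.
    absolute_path = os.path.abspath(path)
    if not os.path.isfile(absolute_path):
        return False
    try:
        with wave.open(absolute_path, "rb") as wav_file:
            return wav_file.getnframes() > 0
    except (wave.Error, EOFError):
        # Missing or truncated header: the writer has probably not finished.
        return False
```

If `wav_is_ready("file.wav")` returns False right after synthesis, the file is either in a different working directory or has not been finalized yet.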

audio play

from pygame import mixer

# Open the virtual cable as the playback device, then play the file.
mixer.init(devicename = 'Line 1 (Virtual Audio Cable)')
mixer.music.load("file.wav")
mixer.music.play()
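The `devicename` string has to match the name SDL reports for the device. The sketch below, which assumes pygame 2 (where `pygame._sdl2.audio.get_audio_device_names` is available), lists the playback devices and picks one by a case-insensitive substring. `find_device` and `list_output_devices` are hypothetical helper names introduced for illustration.

```python
def find_device(device_names, fragment):
    """Return the first name containing `fragment` (case-insensitive), or None."""
    wanted = fragment.lower()
    for name in device_names:
        if wanted in name.lower():
            return name
    return None


def list_output_devices():
    """List playback device names as SDL sees them (assumes pygame 2)."""
    # Imported lazily so find_device stays usable without pygame installed.
    from pygame import mixer
    import pygame._sdl2.audio as sdl2_audio

    mixer.init()  # the audio subsystem must be initialized before querying
    try:
        return list(sdl2_audio.get_audio_device_names(False))  # False = playback
    finally:
        mixer.quit()
```

For example, `find_device(list_output_devices(), "virtual audio cable")` would return the full device name to pass to `mixer.init(devicename=...)`, or None if no such device is visible.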

option 2 – direct stream to audio device

I tried to configure the audio output device in the Azure SDK.
This method worked for regular output devices, but when I pass the ID of the virtual microphone, it doesn’t play any sound.

audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=False,device_name="{0.0.0.00000000}.{9D30BDBF-1418-4AFC-A709-CD4C431833E2}")

It would also be great if there were another method that directs the audio to a virtual microphone instead of the speaker.

2 Answers


  1. Chosen as BEST ANSWER

    I found a solution: synthesize to an in-memory stream, save it to a file, and then play it through pygame, as follows:

    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    stream = speechsdk.AudioDataStream(speech_synthesis_result)
    stream.save_to_wav_file("file.wav")
    
    mixer.init(devicename = 'Line 1 (Virtual Audio Cable)')
    mixer.music.load("file.wav")
    mixer.music.play()
    

    I would also appreciate any other method that doesn't need an intermediate audio file.
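One possible way to skip the intermediate file is to hand pygame an in-memory buffer instead of a filename. The sketch below is untested against a virtual cable and rests on two assumptions: that `speech_synthesis_result.audio_data` holds the synthesized bytes when `audio_config=None`, and that `mixer.music.load` accepts a file-like object (as it does in pygame 2). `as_wav_buffer` and `play_bytes` are hypothetical helper names; the 16 kHz, mono, 16-bit defaults are also assumptions and only apply if the bytes arrive as raw PCM without a RIFF header.

```python
import io
import time
import wave


def as_wav_buffer(audio_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Return a seekable in-memory WAV file for `audio_bytes`.

    Bytes that already start with a RIFF header are passed through unchanged;
    anything else is treated as raw PCM and wrapped in a WAV container.
    """
    if audio_bytes[:4] == b"RIFF":
        return io.BytesIO(audio_bytes)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(audio_bytes)
    buffer.seek(0)  # rewind so the reader starts at the header
    return buffer


def play_bytes(audio_bytes, device_name):
    """Play in-memory audio on the named device (assumes pygame 2)."""
    # Imported lazily so as_wav_buffer has no third-party dependency.
    from pygame import mixer

    mixer.init(devicename=device_name)
    mixer.music.load(as_wav_buffer(audio_bytes))
    mixer.music.play()
    while mixer.music.get_busy():  # wait until playback has finished
        time.sleep(0.1)
```

With the synthesizer from the answer above, usage would look like `play_bytes(speech_synthesis_result.audio_data, 'Line 1 (Virtual Audio Cable)')`.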


  2. Create a speech service and get the key and location of the service.


    Then set an environment variable with that key. Open a command prompt and run the command below.

    setx SPEECH_KEY yourkey
    

    Import the SDK with import azure.cognitiveservices.speech as speechsdk

    After conversion, use the code below to select the virtual device.

    audio_config = speechsdk.audio.AudioOutputConfig(device_name="<device id>")
    

    Look up the ID of the target speaker device and pass it as device_name.
