I am experiencing an issue with running Microsoft’s text-to-speech on Google Cloud Run. The problem arose suddenly last night and I’ve been getting the following error:
Traceback (most recent call last):
  File "/code/app/./speech/backend.py", line 42, in save_text_to_speech
    speech_api.speech()
  File "/code/app/./speech/speech_api.py", line 266, in speech
    synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=audio_config)
  File "/usr/local/lib/python3.9/site-packages/azure/cognitiveservices/speech/speech.py", line 1598, in __init__
    self._impl = self._get_impl(impl.SpeechSynthesizer, speech_config, audio_config,
  File "/usr/local/lib/python3.9/site-packages/azure/cognitiveservices/speech/speech.py", line 1703, in _get_impl
    _impl = synth_type._from_config(speech_config._impl, None if audio_config is None else audio_config._impl)
RuntimeError: Runtime error: Failed to initialize platform (azure-c-shared). Error: 2153
The error occurs when I try to execute synthesizer.speak_ssml(). Here is the related code:
audio_config = AudioOutputConfig(filename=file_name)
synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=audio_config)
synthesizer.speak_ssml(self.input_data['text'])
Interestingly, this issue doesn’t occur in my local environment. Additionally, if I build the image locally and deploy it to Cloud Run, I don’t encounter this error.
My local environment is:
- macOS 11.6.2
- Docker v20.10.10
However, when I build the image with Cloud Build and deploy it to Cloud Run, I get the error above. I have tried the following to resolve it:
- Clearing the kaniko cache
- Switching from ‘kaniko’ to ‘gcr.io/cloud-builders/docker’
Neither attempt resolved the issue. Since the error occurs only with images built in Cloud Build, I suspect the problem lies there, but I can’t pinpoint the exact cause. If there is anything else I could try, I would greatly appreciate your advice.
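Since the same Dockerfile behaves differently depending on where it is built, it may help to compare what actually ended up inside each image. The following is a minimal sketch, not part of the original setup: the helper names `read_os_release` and `environment_fingerprint` are hypothetical, and the OpenSSL version reported by Python's `ssl` module is not necessarily the library the Speech SDK loads. Running it in both the locally built image and the Cloud Build image and comparing the output would show whether the base OS release or OpenSSL version differs.

```python
import platform
import ssl
import sys
from importlib import metadata


def read_os_release(path="/etc/os-release"):
    """Return the distribution's PRETTY_NAME, or "unknown" if unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.startswith("PRETTY_NAME="):
                    return line.split("=", 1)[1].strip().strip('"')
    except OSError:
        pass
    return "unknown"


def environment_fingerprint():
    """Collect details that can differ between two builds of one Dockerfile."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "os_release": read_os_release(),
        # OpenSSL as seen by Python; the SDK may load a different copy.
        "openssl": ssl.OPENSSL_VERSION,
    }
    try:
        info["speech_sdk"] = metadata.version("azure-cognitiveservices-speech")
    except metadata.PackageNotFoundError:
        info["speech_sdk"] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

A floating base image tag such as `python:3.9` can resolve to different Debian releases over time, so a difference in the `os_release` or `openssl` lines would be a plausible lead.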
Update 2023-07-18
FROM python:3.11
WORKDIR /app
RUN apt-get update && \
    apt-get install -y build-essential libssl-dev ca-certificates libasound2 wget && \
    wget -O - https://www.openssl.org/source/openssl-1.1.1u.tar.gz | tar zxf - && \
    cd openssl-1.1.1u && \
    ./config --prefix=/usr/local && \
    make -j $(nproc) && \
    make install_sw install_ssldirs && \
    ldconfig -v && \
    export SSL_CERT_DIR=/etc/ssl/certs && \
    cd ../ && \
    rm -rf openssl-1.1.1u && \
    pip install --no-cache-dir azure-cognitiveservices-speech==1.30.0
COPY . /app
CMD ["python3", "app.py"]
import os
import azure.cognitiveservices.speech as speechsdk

# This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
print('key:'+os.environ.get('SPEECH_KEY'))
print('region:'+os.environ.get('SPEECH_REGION'))

# The language of the voice that speaks.
speech_config.speech_synthesis_voice_name='en-US-JennyNeural'
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# Get text from the console and synthesize to the default speaker.
text = "Hello world!"
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = speech_synthesis_result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")
The code above is from the Microsoft documentation.
Error:
Speech synthesis canceled: CancellationReason.Error
Error details: Connection failed (no connection to the remote host). Internal error: 1. Error details: Failed with error: WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED
wss://southeastasia.tts.speech.microsoft.com/cognitiveservices/websocket/v1
X-ConnectionId: c4955d953f8e480c906061e6219eb8fd USP state: Sending. Received audio size: 0 bytes.
Did you set the speech resource key and region values?
SPEECH_KEY and SPEECH_REGION were both set, and their values were printed to the console, but I still got the error. Please help.
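One way to narrow down a WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED error is to check whether a verified TLS handshake to the endpoint succeeds from inside the container at all. The sketch below is only an approximation under stated assumptions: it uses Python's own `ssl` module rather than the SDK's native TLS stack, the helper names `tts_host` and `probe_tls` are hypothetical, and the host pattern is copied from the wss:// URL in the error output above.

```python
import socket
import ssl


def tts_host(region):
    """Build the host name seen in the wss:// URL of the error output."""
    return f"{region}.tts.speech.microsoft.com"


def probe_tls(host, port=443, capath=None, timeout=10.0):
    """Attempt a verified TLS handshake and return (ok, detail).

    capath optionally points at a directory of CA certificates, so the
    same host can be probed against different certificate locations.
    """
    try:
        context = ssl.create_default_context(capath=capath)
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts, other TLS errors.
        return False, f"connection failed: {exc}"


if __name__ == "__main__":
    import os

    region = os.environ.get("SPEECH_REGION", "southeastasia")
    print(probe_tls(tts_host(region)))
```

If this reports a certificate verification failure, the problem is likely in the container's certificate setup rather than in the key or region values. A success here does not prove that the SDK's own handshake will succeed, since the SDK may look for certificates elsewhere.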
2 Answers
I was able to resolve this issue by reaching out to Microsoft. I will share the solution here.
When I enabled logging for TTS, the log showed an error indicating that certificate verification failed while establishing the TLS session. The operating system can influence this: the error may occur when the location where the Speech SDK expects certificates to be stored does not match the location where the OS actually stores them.
To resolve this, certificate handling needs to be adjusted for the "python:3.9" base image used in the Dockerfile, whose OS layer is Debian (bookworm).
Specifically, the problem can be resolved by setting the "SSL_CERT_DIR" environment variable as follows:
export SSL_CERT_DIR=/usr/lib/ssl/certs
With this setting, I confirmed that Text-to-Speech works within the container.
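If setting the variable in the shell or image is inconvenient, the same idea could be tried from Python before the synthesizer is created. This is a minimal sketch under assumptions: the helper names and the candidate list are hypothetical (the first path is the one from this answer, the second is the usual Debian location), and whether the SDK's native library honors a variable set at runtime has not been confirmed here. Only the `export` shown above was confirmed to work.

```python
import os

# Assumed candidates: the path from the answer, then the common Debian path.
CANDIDATE_CERT_DIRS = ("/usr/lib/ssl/certs", "/etc/ssl/certs")


def choose_cert_dir(candidates=CANDIDATE_CERT_DIRS):
    """Return the first candidate that is a non-empty directory, else None."""
    for path in candidates:
        if os.path.isdir(path) and os.listdir(path):
            return path
    return None


def configure_cert_dir(environ=os.environ, candidates=CANDIDATE_CERT_DIRS):
    """Set SSL_CERT_DIR unless already set; return the value in effect."""
    if environ.get("SSL_CERT_DIR"):
        # Respect an explicit setting from the image or the shell.
        return environ["SSL_CERT_DIR"]
    chosen = choose_cert_dir(candidates)
    if chosen is not None:
        environ["SSL_CERT_DIR"] = chosen
    return chosen


if __name__ == "__main__":
    print("SSL_CERT_DIR in effect:", configure_cert_dir())
```

The call to `configure_cert_dir()` would go at the top of app.py, before `SpeechSynthesizer` is constructed, so the variable is in place before the SDK opens its first connection.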
Run
python app.py
Here is the Dockerfile:
I got the same error with Python 3.10 in the Dockerfile, as shown below.
Dockerfile:
Output:
I then changed the Python version to 3.9 and got audio output for the input text.
Code:
app.py:
I tried the sample code below to generate audio from the input text.
Dockerfile:
Output:
Below is the command to build the Docker image:
The Docker image built successfully, as shown below.
Below is the command to run the Docker image:
It ran successfully without any errors, as shown below.
Next, we need the Docker container ID to retrieve the generated output.wav file.
Command to check the Docker container ID:
You can also find the Docker container ID directly in Docker Desktop, as shown below.
Below is the command to copy the generated audio to the output.wav file:
The audio was copied to output.wav successfully, as shown below.
Reference:
Check this link to learn more about converting text to speech.