
I am experiencing an issue with running Microsoft’s text-to-speech on Google Cloud Run. The problem arose suddenly last night and I’ve been getting the following error:

 Traceback (most recent call last):
  File "/code/app/./speech/backend.py", line 42, in save_text_to_speech
    speech_api.speech()
  File "/code/app/./speech/speech_api.py", line 266, in speech
    synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=audio_config)
  File "/usr/local/lib/python3.9/site-packages/azure/cognitiveservices/speech/speech.py", line 1598, in __init__
    self._impl = self._get_impl(impl.SpeechSynthesizer, speech_config, audio_config,
  File "/usr/local/lib/python3.9/site-packages/azure/cognitiveservices/speech/speech.py", line 1703, in _get_impl
    _impl = synth_type._from_config(speech_config._impl, None if audio_config is None else audio_config._impl)
RuntimeError: Runtime error: Failed to initialize platform (azure-c-shared). Error: 2153 

The error occurs when I try to execute synthesizer.speak_ssml(). Here is the related code:

audio_config = AudioOutputConfig(filename=file_name)
synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=audio_config)
synthesizer.speak_ssml(self.input_data['text'])

Interestingly, this issue doesn’t occur in my local environment. Additionally, if I build the image locally and deploy it to Cloud Run, I don’t encounter this error.

My local environment is:

  • macOS 11.6.2
  • Docker v20.10.10

However, when I build it with CloudBuild and deploy it to Cloud Run, I get the above error. I have tried the following to resolve it:

  • Clearing the kaniko cache
  • Switching from ‘kaniko’ to ‘gcr.io/cloud-builders/docker’

Neither of these attempts resolved the issue. Considering the circumstances under which the error occurs, I suspect there might be a problem with CloudBuild, but I can’t pinpoint the exact cause. If there are any other potential solutions I could try, I would greatly appreciate your advice.

Update 2023-07-18

FROM python:3.11

WORKDIR /app


RUN apt-get update && \
    apt-get install -y build-essential libssl-dev ca-certificates libasound2 wget && \
    wget -O - https://www.openssl.org/source/openssl-1.1.1u.tar.gz | tar zxf - && \
    cd openssl-1.1.1u && \
    ./config --prefix=/usr/local && \
    make -j $(nproc) && \
    make install_sw install_ssldirs && \
    ldconfig -v && \
    export SSL_CERT_DIR=/etc/ssl/certs && \
    cd ../ && \
    rm -rf openssl-1.1.1u && \
    pip install --no-cache-dir azure-cognitiveservices-speech==1.30.0
COPY . /app
CMD ["python3", "app.py"]


import os
import azure.cognitiveservices.speech as speechsdk

# This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
print('key:'+os.environ.get('SPEECH_KEY'))
print('region:'+os.environ.get('SPEECH_REGION'))
# The language of the voice that speaks.
speech_config.speech_synthesis_voice_name='en-US-JennyNeural'

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# Get text from the console and synthesize to the default speaker.
text = "Hello world!"

speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = speech_synthesis_result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")

The code above is taken from the Microsoft documentation.

Error:

Speech synthesis canceled: CancellationReason.Error
Error details: Connection failed (no connection to the remote host). Internal error: 1. Error details: Failed with error: WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED
wss://southeastasia.tts.speech.microsoft.com/cognitiveservices/websocket/v1
X-ConnectionId: c4955d953f8e480c906061e6219eb8fd USP state: Sending. Received audio size: 0 bytes.
Did you set the speech resource key and region values?

I had set SPEECH_KEY and SPEECH_REGION, and both values were printed to the console, but I still got the error. Please help me.
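
One way to confirm that both variables are set without echoing the full key into the container logs is sketched below. This is not part of the original code: the helper names require_env and mask are hypothetical, and only the variable names SPEECH_KEY and SPEECH_REGION come from the sample above.

```python
import os


def require_env(name, environ=os.environ):
    """Return a required environment variable, or raise a clear error if it is missing or empty."""
    value = environ.get(name)
    if not value:
        raise RuntimeError("Environment variable {} is not set or is empty".format(name))
    return value


def mask(secret, visible=4):
    """Replace all but the last few characters of a secret with asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Calling print('key:', mask(require_env('SPEECH_KEY'))) in place of the plain print avoids the TypeError that string concatenation raises when the variable is unset, and keeps the key out of the logs.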

2 Answers


  1. Chosen as BEST ANSWER

    I was able to resolve this issue by reaching out to Microsoft. I will share the solution here.

    When I enabled logging for the Speech SDK, the following error appeared in the log:

    [342629]: 104ms SPX_TRACE_ERROR: AZ_LOG_ERROR: tlsio_openssl.c:691 error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed
    

    This error indicates that certificate verification failed while establishing the TLS session. The cause can depend on the operating system: the error occurs if the location where the Speech SDK expects to find the certificates does not match the location where the OS actually stores them.

    To resolve this, certificate handling needs to be adjusted for the python:3.9 base image used in the Dockerfile, which is built on Debian (bookworm).

    Specifically, the problem can be resolved by setting the "SSL_CERT_DIR" environment variable as follows:

    export SSL_CERT_DIR=/usr/lib/ssl/certs

    With this setting, I confirmed that Text-to-Speech works inside the container when running python app.py.

    Here is the Dockerfile:

    FROM python:3.9
    
    WORKDIR /app
    
    
    RUN apt-get update && \
        apt-get install -y build-essential libssl-dev ca-certificates libasound2 wget && \
        wget -O - https://www.openssl.org/source/openssl-1.1.1u.tar.gz | tar zxf - && \
        cd openssl-1.1.1u && \
        ./config --prefix=/usr/local && \
        make -j $(nproc) && \
        make install_sw install_ssldirs && \
        ldconfig -v && \
        export SSL_CERT_DIR=/etc/ssl/certs && \
        cd ../ && \
        rm -rf openssl-1.1.1u && \
        pip install --no-cache-dir azure-cognitiveservices-speech==1.30.0
    ENV SSL_CERT_DIR=/usr/lib/ssl/certs
    COPY . /app
    CMD ["python3", "app.py"]
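
The location mismatch described in the accepted answer can be inspected from inside the container. The sketch below is a rough diagnostic and not part of the original answer: it uses Python's standard ssl.get_default_verify_paths() and counts the entries in each candidate directory. The helper names count_entries and describe_cert_locations are hypothetical, and the default path it reports belongs to the OpenSSL build that Python itself links against, which may differ from the one the Speech SDK loads.

```python
import os
import ssl


def count_entries(path):
    """Return the number of entries in a directory, or None if path is not a directory."""
    if path and os.path.isdir(path):
        return len(os.listdir(path))
    return None


def describe_cert_locations(environ=os.environ):
    """Map each candidate certificate location to (path, entry count)."""
    defaults = ssl.get_default_verify_paths()
    candidates = {
        "SSL_CERT_DIR (environment)": environ.get("SSL_CERT_DIR"),
        # Reported for the OpenSSL that Python links against (assumption:
        # this is only a hint about what the Speech SDK will see).
        "OpenSSL default capath": defaults.openssl_capath,
        "/etc/ssl/certs": "/etc/ssl/certs",
        "/usr/lib/ssl/certs": "/usr/lib/ssl/certs",
    }
    return {label: (path, count_entries(path)) for label, path in candidates.items()}
```

Printing the items of describe_cert_locations() inside the image shows which directories exist and contain certificates. A directory that reports None or 0 is unlikely to be a useful value for SSL_CERT_DIR.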
    

  2. I got the same error with Python 3.10 using the following Dockerfile.

    Dockerfile:

    FROM python:3.10
    
    WORKDIR /app
    
    COPY . /app
    
    RUN pip install --no-cache-dir azure-cognitiveservices-speech==1.29.0
    
    CMD ["python3", "app.py"]
    

    Then I changed the Python version to 3.9 and got audio output for the input text.

    I used the sample code below (app.py) to generate audio from the input text:

    import azure.cognitiveservices.speech as speechsdk
    
    AZURE_SUBSCRIPTION_KEY = '<key>'
    AZURE_REGION = '<region>'
    voice_name = 'en-IN-NeerjaNeural'
    
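    def save_text_to_speech(text, file_name):
        speech_config = speechsdk.SpeechConfig(subscription=AZURE_SUBSCRIPTION_KEY, region=AZURE_REGION)
        # Apply the voice defined above; otherwise voice_name is never used.
        speech_config.speech_synthesis_voice_name = voice_name
        # Write the synthesized audio to a file instead of the default speaker.
        audio_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
        result = synthesizer.speak_text_async(text).get()
        # Fail loudly if synthesis did not complete.
        if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
            raise RuntimeError("Speech synthesis failed: {}".format(result.reason))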
    
    text_to_speak = "Hi Kamali, how are you?"
    output_file = "output.wav"
    save_text_to_speech(text_to_speak, output_file)
    

    Dockerfile:

    FROM python:3.9
    
    WORKDIR /app
    
    COPY . /app
    
    RUN pip install --no-cache-dir azure-cognitiveservices-speech==1.18.0
    
    CMD ["python3", "app.py"]
    

    Below is the command to build the Docker image:

    docker build -t <image_name> <build_context>
    

    The Docker image built successfully.

    Below is the command to run the Docker image:

    docker run <image_name> 
    

    It ran successfully without any errors.

    Next, find the Docker container ID so that the generated output.wav file can be copied out of the container. The following command lists the containers:

    docker ps -a
    

    You can also find the container ID directly in Docker Desktop.

    The following command copies the generated output.wav file out of the container:

    docker cp <container_id>:/app/output.wav <path to output.wav file>
    

    The output.wav file was copied successfully.

    Reference:

    Check this link to know more about conversion of text to speech.
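
After copying output.wav out of the container, a quick way to check that it actually contains audio is to read its header with Python's standard wave module. This is a minimal sketch and not part of the original answer: the helper name describe_wav is hypothetical, and it assumes the file is an uncompressed PCM WAV with a finalized header, since wave raises an error for other formats.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file read from its header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            # Duration follows from the frame count and the sample rate.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling describe_wav("output.wav") and seeing a frame count of zero would suggest that synthesis did not produce any audio, even though the file exists.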
