
Using Azure neural voices TTS via the Python Speech SDK, I am trying to get a custom lexicon to be used. Yes, I’ve spent hours reading and trying things already.

I’ve read that the lexicon file must be stored in Azure Blob Storage or GitHub. I’ve created blob storage and ensured it is anonymously readable. I get audio output, but the phrase "BTW" in the SSML is pronounced as "By the way", which is the built-in default alias, not the one I provided in my lexicon.

Publicly readable lexicon file

<?xml version="1.0" encoding="utf-8"?>
<lexicon xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd" version="1.0" alphabet="ipa" xml:lang="en-US" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>BTW</grapheme>
    <alias>By the flippin' way</alias>
  </lexeme>
</lexicon>

SSML

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
<voice name="en-US-EmmaNeural">
<lexicon uri="https://<mynamespace>.blob.core.windows.net/ttsfiles/lexicon3.xml"/>
The phrase is: BTW
</voice></speak>
  • namespace redacted
  • the postfix number I increment to get around the 15-minute caching rule
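
For context, this is roughly how I'm submitting the SSML from Python (key and region redacted; a simplified sketch, not my exact script):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<speech-region>")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Same SSML as above, pointing at the blob-hosted lexicon
ssml = """<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
<voice name="en-US-EmmaNeural">
<lexicon uri="https://<mynamespace>.blob.core.windows.net/ttsfiles/lexicon3.xml"/>
The phrase is: BTW
</voice></speak>"""

result = synthesizer.speak_ssml_async(ssml).get()
print(result.reason)  # completes, but "BTW" still comes out as "By the way"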

2 Answers


  1. Chosen as BEST ANSWER

    Reading the docs more closely, lexicons are not supported for the specific neural voices I was using. It's helpful to use the Speech Studio to debug.
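
    As a workaround that doesn't depend on lexicon support, the alias can be inlined with the standard SSML <sub> element. A minimal sketch with placeholder key/region, assuming the voice honors <sub>:

    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<speech-region>")
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # Inline the alias with <sub> instead of referencing an external lexicon file
    ssml = """<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
    <voice name="en-US-EmmaNeural">
    The phrase is: <sub alias="By the flippin' way">BTW</sub>
    </voice></speak>"""

    result = synthesizer.speak_ssml_async(ssml).get()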


  2. According to the documentation, lexicon URIs do support Azure Blob Storage.

    AFAIK, there is no need to store the lexicon file in Azure Blob Storage for TTS.

    You can use SSML in Azure Cognitive Services Speech together with files hosted in Azure Storage, for example:

    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <mstts:backgroundaudio src='https://file-examples.com/storage/feb8f98f1d627c0dc94b8cf/2017/11/file_example_MP3_700KB.mp3' volume='0.7' fadein='3000' fadeout='4000'/>
      <voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>
        <phoneme alphabet='sapi' ph='jh iy 1 - n iy'>Jeanne</phoneme> says, "Welcome to our service!"
        <audio src='https://file-examples.com/storage/feb8f98f1d627c0dc94b8cf/2017/11/file_example_MP3_700KB.mp3'>This is fallback audio.</audio>
      </voice>
    </speak>
    
    
    

    Below is a Python example that synthesizes speech from SSML referencing Azure Storage audio files, using Azure Text-to-Speech (TTS):

    
    import azure.cognitiveservices.speech as speechsdk

    def azure_tts_with_ssml(ssml_text):
        # Replace with your Speech resource key and region
        subscription_key = "AzureSpeechKey"
        service_region = "AzureSpeechRegion"

        speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=service_region)

        # Default voice; the <voice> element inside the SSML takes precedence
        speech_config.speech_synthesis_voice_name = "en-US-AvaNeural"

        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

        # Synthesize the SSML and report the outcome
        result = synthesizer.speak_ssml(ssml_text)
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech synthesized successfully.")
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation = result.cancellation_details
            print(f"Error synthesizing speech: {cancellation.error_details}")

    if __name__ == "__main__":
        ssml = """
    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <mstts:backgroundaudio src='https://AzureStorageName.blob.core.windows.net/ContainerName/OutputAudio1.wav' volume='0.7' fadein='3000' fadeout='4000'/>
      <voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>
        <phoneme alphabet='sapi' ph='jh iy 1 - n iy'>Jeanne</phoneme> says, "Welcome to our service!"
        <audio src='https://AzureStorageName.blob.core.windows.net/ContainerName/OutputAudio.wav'>This is fallback audio.</audio>
      </voice>
    </speak>
    """
        azure_tts_with_ssml(ssml)
    
    
    

    Output: (screenshot of the successful synthesis result omitted)

    For an alternative approach, refer to this document to set up storage for the Speech resource.
