
Using Azure neural voices TTS via the Python Speech SDK, I am trying to get a custom lexicon to be used. Yes, I’ve spent hours reading and trying things already.

I’ve read that the lexicon file must be stored in Azure Blob Storage or GitHub. I’ve created blob storage and ensured it is anonymously readable. I get audio output, but the phrase "BTW" in the SSML is pronounced as "By the way", which is the built-in default alias, not the one I provided in my lexicon.

Publicly readable lexicon file

<?xml version="1.0" encoding="utf-8"?>
<lexicon xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd" version="1.0" alphabet="ipa" xml:lang="en-US" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>BTW</grapheme>
    <alias>By the flippin' way</alias>
  </lexeme>
</lexicon>

SSML

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
<voice name="en-US-EmmaNeural">
<lexicon uri="https://<mynamespace>.blob.core.windows.net/ttsfiles/lexicon3.xml"/>
The phrase is: BTW
</voice></speak>
  • namespace redacted
  • the postfix number I increment to get around the 15-minute caching rule
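
For context, this is roughly how I'm submitting the SSML from Python (key and region redacted; a simplified sketch, not my exact script):

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<speech-region>")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Same SSML as above, pointing at the blob-hosted lexicon
ssml = """<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
<voice name="en-US-EmmaNeural">
<lexicon uri="https://<mynamespace>.blob.core.windows.net/ttsfiles/lexicon3.xml"/>
The phrase is: BTW
</voice></speak>"""

result = synthesizer.speak_ssml_async(ssml).get()
print(result.reason)  # completes, but "BTW" still comes out as "By the way"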

2 Answers


  1. Chosen as BEST ANSWER

    Reading the docs more closely, lexicons are not supported for the specific neural voices I was using. It's helpful to use the Speech Studio to debug.
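
    As a workaround that doesn't depend on lexicon support, the alias can be inlined with the standard SSML <sub> element. A minimal sketch with placeholder key/region, assuming the voice honors <sub>:

    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<speech-region>")
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # Inline the alias with <sub> instead of referencing an external lexicon file
    ssml = """<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
    <voice name="en-US-EmmaNeural">
    The phrase is: <sub alias="By the flippin' way">BTW</sub>
    </voice></speak>"""

    result = synthesizer.speak_ssml_async(ssml).get()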


  2. According to the documentation, lexicon URIs do support Azure Blob Storage.

    AFAIK, there is no need to store the lexicon file in Azure Blob Storage for TTS.

    You can use SSML in Azure Cognitive Services Speech together with files hosted in Azure Storage, for example:

    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <mstts:backgroundaudio src='https://file-examples.com/storage/feb8f98f1d627c0dc94b8cf/2017/11/file_example_MP3_700KB.mp3' volume='0.7' fadein='3000' fadeout='4000'/>
      <voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>
        <phoneme alphabet='sapi' ph='jh iy 1 - n iy'>Jeanne</phoneme> says, "Welcome to our service!"
        <audio src='https://file-examples.com/storage/feb8f98f1d627c0dc94b8cf/2017/11/file_example_MP3_700KB.mp3'>This is fallback audio.</audio>
      </voice>
    </speak>
    
    
    

    Below is a Python example that synthesizes speech from SSML referencing Azure Storage audio files, using Azure Text-to-Speech (TTS):

    
    import azure.cognitiveservices.speech as speechsdk

    def azure_tts_with_ssml(ssml_text):
        # Replace with your Speech resource key and region
        subscription_key = "AzureSpeechKey"
        service_region = "AzureSpeechRegion"

        speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=service_region)

        # Default voice; the <voice> element inside the SSML takes precedence
        speech_config.speech_synthesis_voice_name = "en-US-AvaNeural"

        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

        # Synthesize the SSML and report the outcome
        result = synthesizer.speak_ssml(ssml_text)
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech synthesized successfully.")
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation = result.cancellation_details
            print(f"Error synthesizing speech: {cancellation.error_details}")

    if __name__ == "__main__":
        ssml = """
    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <mstts:backgroundaudio src='https://AzureStorageName.blob.core.windows.net/ContainerName/OutputAudio1.wav' volume='0.7' fadein='3000' fadeout='4000'/>
      <voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>
        <phoneme alphabet='sapi' ph='jh iy 1 - n iy'>Jeanne</phoneme> says, "Welcome to our service!"
        <audio src='https://AzureStorageName.blob.core.windows.net/ContainerName/OutputAudio.wav'>This is fallback audio.</audio>
      </voice>
    </speak>
    """
        azure_tts_with_ssml(ssml)
    
    
    

    Output: (screenshot of the successful synthesis result omitted)

    For an alternative approach, refer to this document to set up storage for the Speech resource.
