I’m using the F1 (non-free) tier of Azure text-to-speech with the OpenAI neural non-HD voices, through the Python API. I’m getting deterministic partial completions: the audio rendering ends mid-word with an ‘Internal Server Error’ and a ‘partial data received’ message. Yet the same SSML works flawlessly on the same TTS instance through Speech Studio.
- Input SSML XML file: demo.xml
- Standalone Python API code: demo.py
- Log file output: log.txt (you can see synthesis timing out)
- The SSML works in Speech Studio.
- The SSML fails to fully render through the Python code.
- But the SSML partially renders, so the Speech SDK configuration is correct.
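For context, a minimal sketch of what a standalone demo.py-style script looks like, assuming the `azure-cognitiveservices-speech` package and that the key and region come from environment variables (the variable names and file names here are illustrative, not the actual attachment):

```python
# Minimal standalone SSML synthesis sketch (demo.py-style), not the original
# attachment. Assumes azure-cognitiveservices-speech is installed and that
# SPEECH_KEY / SPEECH_REGION are set in the environment.
import os


def load_ssml(path: str) -> str:
    """Read an SSML document from disk."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def synthesize(ssml: str, key: str, region: str) -> bytes:
    # Imported lazily so load_ssml is usable without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None returns audio bytes instead of playing to the speaker.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config,
                                              audio_config=None)
    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")
    return result.audio_data


if __name__ == "__main__" and "SPEECH_KEY" in os.environ:
    audio = synthesize(load_ssml("demo.xml"),
                       os.environ["SPEECH_KEY"], os.environ["SPEECH_REGION"])
    with open("demo.wav", "wb") as f:
        f.write(audio)
```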
Log File Excerpt
[405035]: 35806ms SPX_TRACE_VERBOSE: synthesizer_timeout_management.cpp:85 IsTimeout: synthesis might timeout, current RTF: 0.77 (threshold: 2.00), frame interval 9967 ms (threshold 3000ms)
[405035]: 35856ms SPX_TRACE_WARNING: synthesizer_timeout_management.cpp:80 IsTimeout: synthesis timed out, current RTF: 0.78 (threshold: 2.00), frame interval 10017 ms (threshold 3000ms)
[405035]: 35857ms SPX_DBG_TRACE_VERBOSE: usp_tts_engine_adapter.cpp:376 StopSpeaking
[405035]: 35857ms SPX_DBG_TRACE_VERBOSE: usp_tts_engine_adapter.cpp:1040 Response: On Error: Code:6, Message: Timeout while synthesizing. Current RTF: 0.775118 (threshold 2), frame interval 10018ms (threshold 3000ms)..
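Note what the log is actually reporting: the real-time factor (0.78) is well under its 2.00 threshold, so the timeout is triggered by the other condition, the ~10-second gap between audio frames. A small illustration of that two-condition check (this is an illustration of the log's logic, not the SDK source):

```python
# Illustration of the two timeout conditions visible in the log: synthesis is
# flagged when EITHER the real-time factor (synthesis time / audio duration)
# exceeds 2.0, OR the gap between received audio frames exceeds 3000 ms.
RTF_THRESHOLD = 2.0
FRAME_INTERVAL_THRESHOLD_MS = 3000


def is_timeout(rtf: float, frame_interval_ms: float) -> bool:
    return rtf > RTF_THRESHOLD or frame_interval_ms > FRAME_INTERVAL_THRESHOLD_MS


# Values from the log: RTF 0.78 is fine, but the 10017 ms frame gap trips it.
print(is_timeout(0.78, 10017))  # → True
```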
2 Answers
Cool! What we learned is that the Python API breaks if the SSML elements are indented by spaces: with the indentation in place, synthesis stops partway through. I'd call that a bug, but I haven't read the SSML spec closely enough to know better.

Thanks to Suresh for suggesting the SSML itself might be to blame, even though some speech services accept it fine.
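If you'd rather keep the source file readable, one workaround is to flatten the document before handing it to the synthesizer. A minimal sketch, assuming per-line indentation is the only problem (the helper name is mine, not from the SDK):

```python
# Sketch of a workaround: strip the leading indentation from every line of an
# SSML document before passing it to the Speech SDK, so the on-the-wire SSML
# has no indented elements.
def flatten_ssml(ssml: str) -> str:
    """Remove leading whitespace from each line of an SSML document."""
    return "\n".join(line.lstrip() for line in ssml.splitlines())


indented = (
    '<speak version="1.0" xml:lang="en-US">\n'
    '    <voice name="en-US-AlloyTurboMultilingualNeural">\n'
    '        Hello, world.\n'
    '    </voice>\n'
    '</speak>'
)
print(flatten_ssml(indented))
```

The original indented file stays in version control; only the string sent to `speak_ssml_async` is flattened.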
Here’s the version of the SSML that works without indentation:
Code sample:
Result:
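The original code sample and result aren't shown here, but as a hedged illustration (the voice name is a placeholder, not necessarily the one from the post), a flattened SSML document simply starts every element at column one:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-AlloyTurboMultilingualNeural">
No element on any line is preceded by spaces.
</voice>
</speak>
```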