We are using the Azure Conversation Transcriber for real-time speech-to-text with diarization, and we need to add a pause/resume feature. We have tried several approaches, but none of them worked.
As far as we can tell, Azure only provides the stop_transcribing_async() function, which completely stops the current session.
I have attached the block of code that contains our pause/resume logic, but it is not working. Any help would be appreciated. Please advise what other approach we could follow.
In the code below, we stop the transcriber completely when a "pause" message is received and restart it when a "resume" message is detected.

    async def receive_audio(uuid, path):
        audio_queue = Queue(maxsize=0)
        transcriber_state = False
        try:
            conversation_transcriber, push_stream = create_conversation_transcriber(
                CONNECTIONS.connections[uuid]
            )
            # Start continuous recognition
            conversation_transcriber.start_transcribing_async().get()
            transcriber_state = True
            while True:
                # Receive audio data from the WebSocket
                websocket = CONNECTIONS.connections[uuid]["websocket"]
                data = await websocket.recv()
                logger.info(CONNECTIONS.connections[uuid]['state'])
                if isinstance(data, str):
                    logger.info(f"Current State: {CONNECTIONS.connections[uuid]['state']}")
                    if data == "inactive":
                        logger.info("Pausing the transcriber...")
                        conversation_transcriber.stop_transcribing_async().get()
                        push_stream.close()
                        transcriber_state = False
                    elif data == "active" and not transcriber_state:
                        logger.info(f"Resuming the transcriber...")
                        conversation_transcriber, push_stream = create_conversation_transcriber()
                        conversation_transcriber.start_transcribing_async().get()
                        transcriber_state = True
                    CONNECTIONS.connections[uuid]["state"] = data
                if CONNECTIONS.connections[uuid]["state"] == "active":
                    audio_queue.put_nowait(data)
                    while not audio_queue.empty():
                        chunk = get_chunk_from_queue(q=audio_queue, chunk_size=4096)
                        CONNECTIONS.connections[uuid]["audio_buffer"] += chunk
                        push_stream.write(chunk)
        except websockets.exceptions.ConnectionClosed as e:
            logger.info("Connection closed")
            logger.info(e)
            conversation_transcriber.stop_transcribing_async().get()
            push_stream.close()
        except Exception as e:
            logger.error(f"Error in receive_audio: {e}")
        finally:
            await websocket.close(code=1000)

2 Answers
Incorporating a pause and resume feature for the Azure Conversation Transcriber requires handling the stop_transcribing_async and start_transcribing_async methods carefully. Your current approach stops and restarts the transcriber, but it does so in a way that might cause issues with state management and the audio queue.
Instead, you can control the flow of audio data by pausing the input stream (i.e., stop feeding audio to the push stream). This simulates a pause in transcription without completely stopping the transcriber session.
In App.py, when you receive a "pause" command, you can buffer the incoming audio data and delay pushing it to the transcriber until a "resume" command is received.
CONNECTIONS[uuid]["state"] controls the flow of audio to the transcriber. When the state is "inactive", the audio stream is not fed to the transcriber.
Console log:
[Image: console log screenshot]
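The gating idea described in this answer (keep the transcriber session running and simply stop writing to the push stream while the state is "inactive") could look roughly like the sketch below. It is a minimal illustration, not the answer's original App.py: the class name `PausableAudioGate`, the `buffer_while_paused` flag, and the `handle_message` method are hypothetical names introduced here. The only thing it assumes about the SDK is that the push stream exposes a `write(bytes)` method, as `push_stream.write(chunk)` in the question already does.

```python
from collections import deque


class PausableAudioGate:
    """Forward audio chunks to a sink only while the state is "active".

    `sink` is any object with a write(bytes) method, for example the
    push stream returned by create_conversation_transcriber().
    This class is an illustrative helper, not part of the Azure Speech SDK.
    """

    def __init__(self, sink, buffer_while_paused=False):
        self.sink = sink
        self.state = "active"
        # If True, audio received while paused is kept and sent on resume.
        # If False, audio received while paused is discarded.
        self.buffer_while_paused = buffer_while_paused
        self._pending = deque()

    def handle_message(self, data):
        """Handle one WebSocket message: str is a control message, bytes is audio."""
        if isinstance(data, str):
            # Control messages never reach the audio sink.
            if data in ("active", "inactive"):
                self.state = data
                if data == "active":
                    self._flush()
            return

        if self.state == "active":
            self.sink.write(data)
        elif self.buffer_while_paused:
            self._pending.append(data)
        # Otherwise the chunk is dropped while paused.

    def _flush(self):
        """Send any audio buffered during the pause, in arrival order."""
        while self._pending:
            self.sink.write(self._pending.popleft())
```

In the receive loop this would replace the stop/restart logic: create the transcriber and push stream once, wrap the push stream with `gate = PausableAudioGate(push_stream)`, call `gate.handle_message(await websocket.recv())` on every message, and call `stop_transcribing_async()` only when the connection closes. Whether to drop or buffer paused audio depends on whether the paused speech should appear in the transcript after resuming.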