I am attempting to convert a .wav file, containing audio of someone talking, into a transcription of what was said. It is a mobile app, so I am using React Native and Expo Go for development. The audio is sent (encoded as Base64) to an Azure HTTP trigger function, where it is decoded and passed to Azure's speech recognition. I have made sure that the sample rate, channel count and sample width are all correct for the SDK.
import base64
import io
import logging

import azure.cognitiveservices.speech as speechsdk
import azure.functions as func

# speech_key and service_region are defined elsewhere in the module

def speech_recognize_continuous_from_file(audio_data):
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    # ERROR OCCURS HERE: stream=audio_data does not work
    audio_config = speechsdk.audio.AudioConfig(stream=audio_data)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

def transcriptionFunction(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    try:
        req_body = req.get_json()
        audioBase64 = req_body.get('audioBase64')
        # Converts base64 to wav
        decodedAudio = base64.b64decode(audioBase64)
        audioIO = io.BytesIO(decodedAudio)
        # Begins transcription
        speech_recognize_continuous_from_file(audioIO)
        return func.HttpResponse("Check Server Console for response", status_code=200)
    except ValueError:
        # Raised for a body that is not valid JSON or not valid Base64
        return func.HttpResponse("Invalid request body", status_code=400)
I have tested my continuous speech recognition function with a local .wav file, so I know that it works. I have also verified that the .wav file is in the correct format. Because this is a serverless function, I cannot use filename=, as there is no local storage.
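The format check mentioned above can also be run inside the function, on the decoded bytes themselves. Below is a minimal sketch using only the standard library. The helper names (wav_format, matches_expected) are made up for illustration, and the target values of 16 kHz, mono, 16-bit PCM are an assumption about what the recognizer is configured to accept, so adjust them to your setup.

```python
import io
import wave

# Assumed target format: 16 kHz, mono, 16-bit PCM. Change if your setup differs.
EXPECTED = {"sample_rate": 16000, "channels": 1, "sample_width": 2}


def wav_format(wav_bytes: bytes) -> dict:
    """Read the header fields of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),  # bytes per sample
        }


def matches_expected(wav_bytes: bytes) -> bool:
    """True when the WAV header matches the assumed target format."""
    return wav_format(wav_bytes) == EXPECTED
```

Calling matches_expected(decodedAudio) right after the Base64 decode would rule out a format problem introduced on the mobile side before the audio ever reaches the SDK.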
2 Answers
Your answer helped a lot and sent me down the right path. You are correct that I need to use filename. The next error I encountered was that files cannot be stored in Azure Functions, as they are stateless. To work around this, all that was needed was the tempfile library. The completed solution is below.
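A minimal sketch of the tempfile approach, not necessarily identical to the code that was originally posted: the decoded audio is written to a temporary .wav file, the file's path is handed to a recognition function, and the file is removed afterwards. The names transcribe_base64_wav and recognize are hypothetical; recognize stands for any callable that takes a file path, for example one that builds speechsdk.audio.AudioConfig(filename=path).

```python
import base64
import os
import tempfile


def transcribe_base64_wav(audio_base64: str, recognize):
    """Decode Base64 WAV data, write it to a temp file, and pass the path on.

    `recognize` is a hypothetical callable that accepts a file path and
    returns whatever the recognition step produces.
    """
    data = base64.b64decode(audio_base64)
    # mkstemp creates the file in the platform's temp directory
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return recognize(path)
    finally:
        # Clean up so repeated invocations do not fill the temp directory
        os.remove(path)
```

Inside the HTTP trigger this would be called as transcribe_base64_wav(audioBase64, some_function_taking_a_path). The recognition has to finish before the helper returns, since the file is deleted in the finally block.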
The error ‘_io.BytesIO’ object has no attribute ‘_handle’ suggests that the object passed as the stream argument is not of a type the Speech SDK recognizes.
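The issue arises from passing a BytesIO object directly to speechsdk.audio.AudioConfig(stream=audio_stream). The stream parameter expects one of the SDK's own audio input stream objects (for example speechsdk.audio.PushAudioInputStream), not a generic Python file-like object. A BytesIO object doesn’t have the _handle attribute that the SDK reads from its stream objects, which causes the error.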
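For comparison, here is a rough sketch of what the stream parameter does accept. It has not been run against the service, so treat it as an assumption-laden illustration: the helper names are made up, the audio is assumed to already be 16 kHz, 16-bit, mono PCM (to my understanding the default format of a push stream), and the WAV header is stripped first because the stream carries raw samples.

```python
import io
import wave


def wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Strip the WAV header and return only the raw PCM frames."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes())


def audio_config_from_wav_bytes(wav_bytes: bytes):
    """Build an AudioConfig from in-memory audio via an SDK push stream."""
    # Imported lazily so wav_to_pcm can be used without the SDK installed
    import azure.cognitiveservices.speech as speechsdk

    # Assumes the audio matches the push stream's default format
    push_stream = speechsdk.audio.PushAudioInputStream()
    push_stream.write(wav_to_pcm(wav_bytes))
    push_stream.close()  # signals that no more audio will arrive
    return speechsdk.audio.AudioConfig(stream=push_stream)
```

This avoids touching the file system at all, at the cost of having to match the stream format to the audio yourself.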
To fix this, write the audio to a .wav file and pass its path, as in speechsdk.audio.AudioConfig(filename="temp.wav"), instead of passing the raw audio data directly to the speechsdk.audio.AudioConfig() constructor. Here’s the modified code:
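A minimal sketch of that change, written from the description above rather than copied from the original post: the recognizer is built from a file path, and continuous recognition runs until the session stops. The function names and the timeout parameter are illustrative, and speech_key and service_region are assumed to hold your own credentials.

```python
import threading


def build_recognizer(wav_path, speech_key, service_region):
    """Create a recognizer that reads its audio from a WAV file on disk."""
    # Imported lazily so the helper below can be exercised without the SDK
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    return speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)


def recognize_continuous(recognizer, timeout=None):
    """Run continuous recognition until the session ends; return the text."""
    done = threading.Event()
    texts = []

    # Collect each finalized phrase as it is recognized
    recognizer.recognized.connect(lambda evt: texts.append(evt.result.text))
    # Stop waiting when the file is exhausted or recognition is canceled
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait(timeout)
    recognizer.stop_continuous_recognition()
    return " ".join(t for t in texts if t)
```

Returning the joined text lets the HTTP trigger put the transcription in its response body instead of only logging it to the server console.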
It ran successfully when tested with Postman.