React native - Azure Speech-Text '_io.BytesIO' object has no attribute '_handle'

MattDrinkall
February 20, 2024
109 views
0 votes
2 Answers

I am trying to transcribe a .wav file containing audio of someone talking. It is a mobile app, so I am using React Native and Expo Go for development. The audio is sent, encoded as Base64, to an Azure HTTP trigger function, where it is decoded and passed to Azure’s speech recognition. I have made sure that the sample rate, channel count and sample width are all correct for the SDK.

def speech_recognize_continuous_from_file(audio_data):
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # ERROR OCCURS HERE: stream=audio_data does not work
    audio_config = speechsdk.audio.AudioConfig(stream=audio_data)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)


def transcriptionFunction(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    try:
        req_body = req.get_json()
        audioBase64 = req_body.get('audioBase64')

        # Converts base64 to wav
        decodedAudio = base64.b64decode(audioBase64)
        audioIO = io.BytesIO(decodedAudio)

        # Begins transcription
        speech_recognize_continuous_from_file(audioIO)
        

        return func.HttpResponse("Check Server Console for response", status_code=200)

I have tested my continuous speech recognition function with a .wav file on disk, so I know that works. I have also checked that the .wav file is in the correct format. Because this is a serverless function, I cannot use filename= as there is no local storage.

Answers

Chosen as BEST ANSWER
- MattDrinkall
- February 20, 2024 at 4:14 pm
- 0 votes
Your answer helped a lot and sent me down the right path. You are correct that I need to use filename. The next error I ran into was that files cannot be stored in Azure Functions, as they are stateless. All that was needed to get around this was the tempfile library. The completed solution is below.
```
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_audio_file:
        tmp_audio_file.write(decodedAudio)
        tmp_filename = tmp_audio_file.name

    # Stores the resulted text
    text_result = speech_recognize_continuous_from_stream(tmp_filename)

    os.unlink(tmp_filename)
```
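
For anyone reusing this pattern, here is a minimal, self-contained sketch of the same idea with the cleanup moved into a finally block, so the temporary file is removed even if transcription raises. The helper name transcribe_via_temp_file and its transcribe argument are hypothetical; in the function above, transcribe would be speech_recognize_continuous_from_stream.

```python
import os
import tempfile


def transcribe_via_temp_file(audio_bytes, transcribe):
    """Write audio_bytes to a temporary .wav file, call transcribe(path), then delete the file.

    `transcribe` is any callable that takes a file path and returns text
    (hypothetical parameter, not part of the Speech SDK).
    """
    # delete=False lets the file be reopened by path after the with block closes it.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_audio_file:
        tmp_audio_file.write(audio_bytes)
        tmp_filename = tmp_audio_file.name

    try:
        return transcribe(tmp_filename)
    finally:
        # Runs whether transcribe() returned or raised.
        os.unlink(tmp_filename)
```

Usage would look like text_result = transcribe_via_temp_file(decodedAudio, speech_recognize_continuous_from_stream).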

(Edit)

The error ‘_io.BytesIO’ object has no attribute ‘_handle’ suggests that the object passed as stream is not of a type the Speech SDK recognizes.

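The issue arises from passing a BytesIO object directly to speechsdk.audio.AudioConfig(stream=audio_stream). The stream parameter expects one of the SDK’s own audio input stream objects (for example a PushAudioInputStream), not a generic Python file-like object. The SDK reads the stream’s internal _handle attribute, which a BytesIO object doesn’t have, causing the error.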
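
If you want to stay in memory, a minimal sketch of feeding the decoded bytes through a PushAudioInputStream could look like the following. It is untested against the service and assumes the payload is an uncompressed PCM .wav file; the helper names read_wav_pcm and audio_config_from_wav_bytes are hypothetical.

```python
import io
import wave


def read_wav_pcm(wav_bytes):
    """Return (pcm_frames, sample_rate, bits_per_sample, channels) from in-memory WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def audio_config_from_wav_bytes(wav_bytes):
    """Build an AudioConfig backed by a push stream instead of a file (sketch)."""
    # Imported here so read_wav_pcm stays usable without the Speech SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    pcm, rate, bits, channels = read_wav_pcm(wav_bytes)
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=rate, bits_per_sample=bits, channels=channels
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm)  # raw PCM frames only, without the WAV header
    push_stream.close()     # signals end of audio to the recognizer
    return speechsdk.audio.AudioConfig(stream=push_stream)
```

The returned object would replace the AudioConfig(stream=audio_data) call in the question.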

To fix this, write the audio to a .wav file and pass its path, as in speechsdk.audio.AudioConfig(filename="temp.wav"), instead of passing the raw audio data directly to the speechsdk.audio.AudioConfig() constructor. Here’s the modified code:

Code :

import logging
import azure.functions as func
import base64
import os
import azure.cognitiveservices.speech as speechsdk

speech_key = "<speech_key>"
service_region = "<speech_region>"

def speech_recognize_continuous_from_stream(audio_path):
    # audio_path is the path to a .wav file on disk
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # recognize_once() blocks and returns after the first utterance
    result = speech_recognizer.recognize_once()
    return result.text if result.reason == speechsdk.ResultReason.RecognizedSpeech else ""

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    try:
        req_body = req.get_json()
        audioBase64 = req_body.get('audioBase64')
        decodedAudio = base64.b64decode(audioBase64)
        
        with open("temp.wav", "wb") as audio_file:
            audio_file.write(decodedAudio)
        transcription_result = speech_recognize_continuous_from_stream("temp.wav")
        os.unlink("temp.wav")

        return func.HttpResponse(transcription_result, status_code=200)

    except Exception as e:
        logging.error(f"Error: {str(e)}")
        return func.HttpResponse("Internal Server Error", status_code=500)

Postman request body :

{
    "audioBase64":"your_base64_data"
}

Postman output :

Hello, this is a test of the speech synthesis service.

Output :

It ran successfully as shown below.

C:\Users\xxxxxxx\Documents\xxxxxxx>func start
Found Python version 3.10.11 (python).

Azure Functions Core Tools
Core Tools Version:       4.0.5030 Commit hash: N/A  (64-bit)
Function Runtime Version: 4.15.2.20177


Functions:

        HttpTrigger1: [GET,POST] http://localhost:7071/api/HttpTrigger1

For detailed output, run func with --verbose flag.
[2024-02-10T19:58:48.856Z] Worker process started and initialized.
[2024-02-10T19:58:54.658Z] Host lock lease acquired by instance ID '00000xxxxxxxxxxxxxxxxxx'.
[2024-02-10T19:58:56.634Z] Executing 'Functions.HttpTrigger1' (Reason='This function was programmatically called via the host APIs.', Id=3cd9c444b944xxxxxxxxxxxx)
[2024-02-10T19:58:56.843Z] Python HTTP trigger function processed a request.
[2024-02-10T19:59:00.598Z] Executed 'Functions.HttpTrigger1' (Succeeded, Id=3cd9c444xxxxxxxxxx, Duration=4040ms)
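
One caveat about the code in this answer: recognize_once() returns after a single utterance, so a longer recording may only be partly transcribed. A minimal sketch of continuous recognition, assuming a SpeechRecognizer built as in the code above, might look like this. The helper name collect_continuous_text and its timeout parameter are hypothetical.

```python
import threading


def collect_continuous_text(recognizer, timeout=None):
    """Run continuous recognition until the session stops and return the joined text.

    `recognizer` is expected to be a speechsdk.SpeechRecognizer; `timeout`
    (seconds) is a hypothetical safety limit on how long to wait.
    """
    done = threading.Event()
    texts = []

    # Each finalized phrase arrives in its own `recognized` event.
    recognizer.recognized.connect(lambda evt: texts.append(evt.result.text))
    # Both events mean no further results will arrive.
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait(timeout)
    recognizer.stop_continuous_recognition()

    # Skip empty results (for example, segments where nothing was recognized).
    return " ".join(text for text in texts if text)
```

In the function above, this would replace the recognize_once() call, for example return collect_continuous_text(speech_recognizer, timeout=60).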
