I am trying to transcribe a .wav file containing audio of someone talking. It is a mobile app, so I am using React Native and Expo Go for development. The audio is sent, encoded as Base64, to an Azure HTTP trigger function, where it is decoded and passed to Azure's speech recognition. I have made sure that the sample rate, channel count and sample width are all correct for the SDK.

def speech_recognize_continuous_from_file(audio_data):
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # ERROR OCCURS HERE: stream=audio_data does not work
    audio_config = speechsdk.audio.AudioConfig(stream=audio_data)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)


def transcriptionFunction(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    try:
        req_body = req.get_json()
        audioBase64 = req_body.get('audioBase64')

        # Converts base64 to wav
        decodedAudio = base64.b64decode(audioBase64)
        audioIO = io.BytesIO(decodedAudio)

        # Begins transcription
        speech_recognize_continuous_from_file(audioIO)
        

        return func.HttpResponse("Check Server Console for response", status_code=200)

I have tested my continuous speech recognition function with a .wav file on disk, so I know that works. I have also checked that the .wav file is in the right format. Because this is a serverless function, I cannot use filename= as there is no local storage.

2 Answers


  1. Chosen as BEST ANSWER

    Your answer helped a lot and sent me down the right path. You are correct that I need to use filename. The next error I ran into was that you cannot store files in Azure Functions the usual way, as they are stateless. All that was needed to get around this was the tempfile library. The completed solution is below.

        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_audio_file:
            tmp_audio_file.write(decodedAudio)
            tmp_filename = tmp_audio_file.name
    
        # Stores the resulted text
        text_result = speech_recognize_continuous_from_stream(tmp_filename)
    
        os.unlink(tmp_filename)
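
A slightly more defensive variant of the snippet above, as a minimal sketch: it wraps the call in try/finally so the temporary file is removed even if transcription raises. The names with_temp_wav and transcribe_path are hypothetical; transcribe_path stands in for whatever function takes a file path and returns text.

```python
import os
import tempfile


def with_temp_wav(wav_bytes, transcribe_path):
    """Write wav_bytes to a temporary .wav file, call transcribe_path(path),
    and delete the file afterwards, even if transcribe_path raises."""
    # delete=False so the file can be reopened by path after this block closes it
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_audio_file:
        tmp_audio_file.write(wav_bytes)
        tmp_filename = tmp_audio_file.name

    try:
        return transcribe_path(tmp_filename)
    finally:
        # Always clean up the temporary file
        os.unlink(tmp_filename)
```

In the function above this would be called as with_temp_wav(decodedAudio, speech_recognize_continuous_from_stream).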
    

  2. The error ‘_io.BytesIO’ object has no attribute ‘_handle’ suggests that the object passed as stream is not a type the Speech SDK recognizes.

    The issue arises from passing a BytesIO object directly to speechsdk.audio.AudioConfig(stream=audio_stream). The stream parameter expects one of the SDK's own audio input stream objects (a speechsdk.audio.AudioInputStream, for example a PushAudioInputStream), not a generic Python file-like object. A BytesIO object doesn’t have a _handle attribute, which causes the error.
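
If you would rather keep the audio in memory, one option is to feed the bytes into a PushAudioInputStream. The following is only a sketch and has not been run against a live Speech resource. It assumes the payload is an uncompressed PCM .wav file, and the helper names read_wav_pcm and audio_config_from_wav_bytes are made up for illustration. As far as I know the push stream takes raw PCM by default, so the WAV header is stripped with the standard wave module first.

```python
import io
import wave


def read_wav_pcm(wav_bytes):
    """Return (pcm_frames, sample_rate, bits_per_sample, channels) from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def audio_config_from_wav_bytes(wav_bytes):
    """Build an AudioConfig backed by an in-memory push stream (hypothetical helper)."""
    # Imported here so the WAV helper above is usable without the SDK installed
    import azure.cognitiveservices.speech as speechsdk

    pcm, rate, bits, channels = read_wav_pcm(wav_bytes)
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=rate, bits_per_sample=bits, channels=channels
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm)   # hand the raw PCM frames to the SDK
    push_stream.close()      # signal end of audio
    return speechsdk.audio.AudioConfig(stream=push_stream)
```

The returned object would be passed as audio_config when constructing the SpeechRecognizer, in place of the BytesIO object.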

    To fix this, write the audio to a .wav file and pass its path, as in speechsdk.audio.AudioConfig(filename="temp.wav"), instead of passing the raw audio data directly to the speechsdk.audio.AudioConfig() constructor. Here’s the modified code:

    import logging
    import azure.functions as func
    import base64
    import os
    import azure.cognitiveservices.speech as speechsdk
    
    speech_key = "<speech_key>"
    service_region = "<speech_region>"
    
    def speech_recognize_continuous_from_stream(audio_path):
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
        # Read the audio from the path passed in rather than a hard-coded file name
        audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    
        # recognize_once() returns after the first recognized utterance
        result = speech_recognizer.recognize_once()
        return result.text if result.reason == speechsdk.ResultReason.RecognizedSpeech else ""
    
    def main(req: func.HttpRequest) -> func.HttpResponse:
        logging.info('Python HTTP trigger function processed a request.')
    
        try:
            req_body = req.get_json()
            audioBase64 = req_body.get('audioBase64')
            decodedAudio = base64.b64decode(audioBase64)
            
            with open("temp.wav", "wb") as audio_file:
                audio_file.write(decodedAudio)
            transcription_result = speech_recognize_continuous_from_stream("temp.wav")
            os.unlink("temp.wav")
    
            return func.HttpResponse(transcription_result, status_code=200)
    
        except Exception as e:
            logging.error(f"Error: {str(e)}")
            return func.HttpResponse("Internal Server Error", status_code=500)
    

    Postman request body and response:

    {
        "audioBase64":"your_base64_data"
    }
    
    Hello, this is a test of the speech synthesis service.
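
For reference, a request body of the shape shown above could be produced from a .wav file's bytes roughly like this. It is a small sketch; build_request_body is a hypothetical name, and the only thing taken from the function is the audioBase64 key.

```python
import base64
import json


def build_request_body(wav_bytes):
    """Return the JSON string the function expects: {"audioBase64": "<base64 data>"}."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return json.dumps({"audioBase64": encoded})
```

On the server side, base64.b64decode on that field gives back the original bytes.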
    

    Output:

    It ran successfully, as shown below.

    C:\Users\xxxxxxx\Documents\xxxxxxx>func start
    Found Python version 3.10.11 (python).
    
    Azure Functions Core Tools
    Core Tools Version:       4.0.5030 Commit hash: N/A  (64-bit)
    Function Runtime Version: 4.15.2.20177
    
    
    Functions:
    
            HttpTrigger1: [GET,POST] http://localhost:7071/api/HttpTrigger1
    
    For detailed output, run func with --verbose flag.
    [2024-02-10T19:58:48.856Z] Worker process started and initialized.
    [2024-02-10T19:58:54.658Z] Host lock lease acquired by instance ID '00000xxxxxxxxxxxxxxxxxx'.
    [2024-02-10T19:58:56.634Z] Executing 'Functions.HttpTrigger1' (Reason='This function was programmatically called via the host APIs.', Id=3cd9c444b944xxxxxxxxxxxx)
    [2024-02-10T19:58:56.843Z] Python HTTP trigger function processed a request.
    [2024-02-10T19:59:00.598Z] Executed 'Functions.HttpTrigger1' (Succeeded, Id=3cd9c444xxxxxxxxxx, Duration=4040ms)
    
