
I have captured real-time audio data as a float32 array. How do I convert it to an Azure AudioInputStream?

import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        self.audio_array = audio_array
        self.position = 0

    def read(self, buffer, offset, count):
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, count)
        buffer[:to_read] = self.audio_array[self.position:self.position+to_read]
        self.position += to_read
        return to_read

I tried this, but a "'NumpyAudioStream' object has no attribute '_handle'" error occurs.

I have a float32 array; how do I create an Azure AudioInputStream from it?

2 Answers


  1. Chosen as BEST ANSWER

    I was trying to send the audio data as base64 through a socket to a Flask Socket.IO app, and converting the base64 back to float32 there, but the Speech SDK is not transcribing the audio. P.S.: I am using the Speech SDK's conversation_transcriber class for transcribing the audio.

    The .js code:

    const startStreaming = async () => {
      try {
        const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        if (stream) {
          setAudioStream(stream);

          if (!socket.current) {
            socket.current = io('http://127.0.0.1:5000');
          }

          const audioContext = new AudioContext();
          const mediaStreamSource = audioContext.createMediaStreamSource(stream);

          // Use AudioWorkletNode instead of the deprecated ScriptProcessorNode
          await audioContext.audioWorklet.addModule('Worklet/module.js'); // Replace with the actual path
          const workletNode = new AudioWorkletNode(audioContext, 'audioWorkletProcessor');

          workletNode.port.onmessage = (event) => {
            const audioData = event.data;
            const base64String = btoa(String.fromCharCode(...new Uint8Array(audioData.buffer)));
            const clientId = socket.current.id;

            // Send audio data through the socket only if it did not come from this client
            // (the original code also emitted unconditionally above this check, so every
            // chunk was sent twice; that duplicate emit is removed here)
            if (audioData.senderId !== clientId) {
              socket.current.emit('audio_data', base64String);
            }
          };
          mediaStreamSource.connect(workletNode);
        } else {
          console.error('User denied access to the microphone.');
        }
      } catch (error) {
        console.error(error);
      }
    };
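
    One likely reason the transcriber stays silent is that the worklet posts raw Float32Array bytes, while the Speech SDK expects 16-bit PCM. Below is a minimal sketch of the server-side decode step, assuming each socket message carries base64-encoded float32 samples (the function name decode_audio_chunk is illustrative, not part of the SDK):

    ```python
    import base64
    import numpy as np

    def decode_audio_chunk(b64_chunk: str) -> bytes:
        """Decode base64 Float32Array samples into 16-bit PCM bytes for the SDK."""
        raw = base64.b64decode(b64_chunk)
        samples = np.frombuffer(raw, dtype=np.float32)
        # Clip to [-1, 1] and scale to the int16 range before converting to bytes.
        pcm16 = (np.clip(samples, -1.0, 1.0) * np.iinfo(np.int16).max).astype(np.int16)
        return pcm16.tobytes()

    # Example: one chunk containing a silent sample and a full-scale sample.
    chunk = base64.b64encode(np.array([0.0, 1.0], dtype=np.float32).tobytes()).decode()
    pcm = decode_audio_chunk(chunk)
    ```

    The resulting bytes can then be fed to the transcriber, for example by writing them into a speechsdk.audio.PushAudioInputStream with its write() method.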


  2. Based on the provided information, it seems the data needs to be formatted before it can be used as an input stream.

    Below are the changes/modifications that can help fix the issue.

    1. Calling super().__init__() in __init__ initializes the base PullAudioInputStreamCallback; omitting it is what causes the "'NumpyAudioStream' object has no attribute '_handle'" error.
    2. In the __init__ method, the float32 audio array is converted to int16, because the Azure Speech SDK expects audio data in int16 (16-bit PCM) format.
    3. In the read method, the buffer size is divided by 2 to account for the fact that int16 values take up 2 bytes each. This ensures that the correct number of samples is read from the audio array.
    4. The audio data is written to the buffer using the tobytes() method, which converts the int16 audio array slice into a byte array. This is necessary because the buffer expects byte data.
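
    The float32-to-int16 scaling can be checked in isolation with plain NumPy, with no SDK involved:

    ```python
    import numpy as np

    samples = np.array([0.0, 1.0, -1.0], dtype=np.float32)
    # Full-scale floats map to +/-32767 after scaling to the int16 range.
    pcm16 = (samples * np.iinfo(np.int16).max).astype(np.int16)
    data = pcm16.tobytes()  # 2 bytes per sample
    ```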

    Below is modified code for reference:

    import numpy as np
    import azure.cognitiveservices.speech as speechsdk

    class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
        def __init__(self, audio_array):
            # Initialize the base callback; without this the SDK's _handle is never set.
            super().__init__()
            # Clip to [-1, 1] and scale float32 samples to 16-bit PCM, as the SDK expects.
            self.audio_array = (np.clip(audio_array, -1.0, 1.0)
                                * np.iinfo(np.int16).max).astype(np.int16)
            self.position = 0

        def read(self, buffer: memoryview) -> int:
            remaining = len(self.audio_array) - self.position
            # Each int16 sample occupies 2 bytes of the output buffer.
            to_read = min(remaining, buffer.nbytes // 2)
            buffer[:to_read * 2] = self.audio_array[self.position:self.position + to_read].tobytes()
            self.position += to_read
            return to_read * 2  # bytes written; 0 signals end of stream

        def close(self) -> None:
            pass
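
    The read() bookkeeping can be sanity-checked without the SDK by passing a memoryview over a plain bytearray, the way the SDK supplies its buffer. Int16Source below is a stand-in class for testing, not an SDK type:

    ```python
    import numpy as np

    class Int16Source:
        """Mimics the callback's conversion and read() bookkeeping without the SDK."""
        def __init__(self, audio_array):
            # Clip to [-1, 1] and scale float32 samples to 16-bit PCM.
            self.audio_array = (np.clip(audio_array, -1.0, 1.0)
                                * np.iinfo(np.int16).max).astype(np.int16)
            self.position = 0

        def read(self, buffer):
            remaining = len(self.audio_array) - self.position
            to_read = min(remaining, len(buffer) // 2)  # 2 bytes per int16 sample
            chunk = self.audio_array[self.position:self.position + to_read]
            buffer[:to_read * 2] = chunk.tobytes()
            self.position += to_read
            return to_read * 2  # bytes written; 0 signals end of stream

    src = Int16Source(np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32))
    buf = bytearray(6)                 # room for 3 samples
    n = src.read(memoryview(buf))      # first call fills the whole 6-byte buffer
    ```

    In the real callback the same logic runs against the SDK's buffer; the stream is then created with speechsdk.audio.PullAudioInputStream(pull_stream_callback=..., stream_format=...) where the format is, for example, speechsdk.audio.AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1) for 16 kHz mono, and wrapped in a speechsdk.audio.AudioConfig(stream=...).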
    

    With the above changes, I was able to run a float32 audio file through the transcriber and get results.
