I have taken real-time audio data and stored it as a float32 array. How do I convert it to an Azure AudioInputStream?
import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        self.audio_array = audio_array
        self.position = 0

    def read(self, buffer, offset, count):
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, count)
        buffer[:to_read] = self.audio_array[self.position:self.position + to_read]
        self.position += to_read
        return to_read
I tried this, but the error 'NumpyAudioStream' object has no attribute '_handle' occurs.
I have a float32 array; how do I create an Azure AudioInputStream from it?
2 Answers
I was trying to send the audio data as base64 through a socket to a Flask socket app, and there convert the base64 back to float32, but the Speech SDK is not transcribing the audio; the Flask-side decoding is sketched after the snippet below. P.S. I am using the Speech SDK conversation_transcriber class for transcribing the audio.
[.js client code omitted; the snippet was truncated in the original post]
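The decoding step on the Flask side looks roughly like this, assuming the client sends raw little-endian float32 PCM encoded as base64 (decode_chunk is an illustrative name, not part of any SDK):

import base64
import numpy as np

def decode_chunk(b64_payload: str) -> np.ndarray:
    # Decode the base64 text back to raw bytes, then reinterpret
    # those bytes as float32 PCM samples.
    raw = base64.b64decode(b64_payload)
    return np.frombuffer(raw, dtype=np.float32)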
Based on the provided information, it seems the data needs formatting before it can be used as an input stream. The following changes should help fix the issue:
- In the __init__ method, the float32 audio array is converted to int16, because the Azure Speech SDK expects audio data in int16 format.
- In the read method, the buffer size is divided by 2 to account for the fact that int16 values take up 2 bytes each. This ensures that the correct number of samples is read from the audio array.
- Each slice is converted with the tobytes() method, which turns the int16 array slice into a byte array. This is necessary because the buffer expects byte data.

Below is the modified code for reference:
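(A minimal reconstruction based on the changes listed above, since the original snippet is not shown here. The super().__init__() call, which fixes the missing _handle error, and the single-argument read(buffer) signature follow the SDK's PullAudioInputStreamCallback interface; the assumption is that the float32 samples lie in [-1.0, 1.0].)

import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        # Calling the base constructor lets the SDK create its internal
        # handle; skipping it causes the "_handle" AttributeError.
        super().__init__()
        # The SDK expects 16-bit PCM, so scale the float32 samples
        # (assumed to be in [-1.0, 1.0]) and cast to int16.
        self.audio_array = (audio_array * 32767).astype(np.int16)
        self.position = 0

    def read(self, buffer: memoryview) -> int:
        # len(buffer) is a byte count; each int16 sample occupies 2 bytes.
        requested_samples = len(buffer) // 2
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, requested_samples)
        if to_read <= 0:
            return 0  # returning 0 signals end of stream
        chunk = self.audio_array[self.position:self.position + to_read].tobytes()
        buffer[:len(chunk)] = chunk
        self.position += to_read
        return len(chunk)

    def close(self):
        pass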
With the above changes, I was able to process a float32 audio data file and get the transcription results.
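If it helps, here is a sketch of wiring the callback into a transcriber; the 16 kHz mono format, the key and region placeholders, and the float32_samples variable are assumptions to adapt to your setup.

# Hypothetical wiring; adjust the format to match your capture settings.
fmt = speechsdk.audio.AudioStreamFormat(samples_per_second=16000,
                                        bits_per_sample=16, channels=1)
stream = speechsdk.audio.PullAudioInputStream(NumpyAudioStream(float32_samples), fmt)
audio_config = speechsdk.audio.AudioConfig(stream=stream)
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config)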