I have taken real-time audio data and stored it as a float32 array. How do I convert it to an Azure AudioInputStream?
import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        self.audio_array = audio_array
        self.position = 0

    def read(self, buffer, offset, count):
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, count)
        buffer[:to_read] = self.audio_array[self.position:self.position + to_read]
        self.position += to_read
        return to_read
I tried this, but the error 'NumpyAudioStream' object has no attribute '_handle' occurs.
I have a float32 array; how do I create an Azure AudioInputStream from it?
2 Answers
I was trying to send the audio data as base64 through a socket to a Flask socket app, and there convert the base64 back to float32, but the Speech SDK is not transcribing the audio; the Flask-side decoding is sketched after the snippet below. P.S. I am using the Speech SDK conversation_transcriber class for transcribing the audio.
[.js client code omitted; the snippet was truncated in the original post]
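The decoding step on the Flask side looks roughly like this, assuming the client sends raw little-endian float32 PCM encoded as base64 (decode_chunk is an illustrative name, not part of any SDK):

import base64
import numpy as np

def decode_chunk(b64_payload: str) -> np.ndarray:
    # Decode the base64 text back to raw bytes, then reinterpret
    # those bytes as float32 PCM samples.
    raw = base64.b64decode(b64_payload)
    return np.frombuffer(raw, dtype=np.float32)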
Based on the provided information, it seems the data needs formatting before it can be used as an input stream. The following changes should help fix the issue:
- In the __init__ method, the float32 audio array is converted to int16, because the Azure Speech SDK expects audio data in int16 format.
- In the read method, the buffer size is divided by 2 to account for the fact that int16 values take up 2 bytes each. This ensures that the correct number of samples is read from the audio array.
- Each slice is converted with the tobytes() method, which turns the int16 array slice into a byte array. This is necessary because the buffer expects byte data.

Below is the modified code for reference:
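(A minimal reconstruction based on the changes listed above, since the original snippet is not shown here. The super().__init__() call, which fixes the missing _handle error, and the single-argument read(buffer) signature follow the SDK's PullAudioInputStreamCallback interface; the assumption is that the float32 samples lie in [-1.0, 1.0].)

import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        # Calling the base constructor lets the SDK create its internal
        # handle; skipping it causes the "_handle" AttributeError.
        super().__init__()
        # The SDK expects 16-bit PCM, so scale the float32 samples
        # (assumed to be in [-1.0, 1.0]) and cast to int16.
        self.audio_array = (audio_array * 32767).astype(np.int16)
        self.position = 0

    def read(self, buffer: memoryview) -> int:
        # len(buffer) is a byte count; each int16 sample occupies 2 bytes.
        requested_samples = len(buffer) // 2
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, requested_samples)
        if to_read <= 0:
            return 0  # returning 0 signals end of stream
        chunk = self.audio_array[self.position:self.position + to_read].tobytes()
        buffer[:len(chunk)] = chunk
        self.position += to_read
        return len(chunk)

    def close(self):
        pass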
With the above changes, I was able to process a float32 audio data file and get the transcription results.
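If it helps, here is a sketch of wiring the callback into a transcriber; the 16 kHz mono format, the key and region placeholders, and the float32_samples variable are assumptions to adapt to your setup.

# Hypothetical wiring; adjust the format to match your capture settings.
fmt = speechsdk.audio.AudioStreamFormat(samples_per_second=16000,
                                        bits_per_sample=16, channels=1)
stream = speechsdk.audio.PullAudioInputStream(NumpyAudioStream(float32_samples), fmt)
audio_config = speechsdk.audio.AudioConfig(stream=stream)
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config)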