I’m trying to play audio in the browser that’s streamed from my backend, to simulate a real-time voice call, in a way that’s compatible with all the relevant browsers, devices, and OSs.
The audio format is MP3 with a 44.1 kHz sample rate at 192 kbps.
app.js

```javascript
import { Buffer } from 'buffer';

// Create the context before loading the worklet module that uses it.
const audioContext = new AudioContext({ sampleRate: 44100 });
await audioContext.audioWorklet.addModule('call-processor.js');
const audioWorkletNode = new AudioWorkletNode(audioContext, 'call-processor');
audioWorkletNode.connect(audioContext.destination);

// Add a streamed audio chunk to the AudioWorklet.
const addAudioChunk = async (base64) => {
  const buffer = Buffer.from(base64, 'base64');
  // Slice to the Buffer's own byte range: the buffer polyfill may hand
  // out a view into a larger shared ArrayBuffer pool.
  const arrayBuffer = buffer.buffer.slice(
    buffer.byteOffset,
    buffer.byteOffset + buffer.byteLength
  );
  try {
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
    const channelData = audioBuffer.getChannelData(0); // assuming mono audio
    audioWorkletNode.port.postMessage(channelData);
  } catch (e) {
    console.error(e);
  }
};
```
call-processor.js

```javascript
class CallProcessor extends AudioWorkletProcessor {
  buffer = new Float32Array(0);

  constructor() {
    super();
    this.port.onmessage = this.handleMessage.bind(this);
  }

  // Append each incoming chunk of samples to the internal buffer.
  handleMessage(event) {
    const chunk = event.data;
    const newBuffer = new Float32Array(this.buffer.length + chunk.length);
    newBuffer.set(this.buffer);
    newBuffer.set(chunk, this.buffer.length);
    this.buffer = newBuffer;
  }

  process(inputs, outputs) {
    const output = outputs[0];
    const channel = output[0];
    const requiredSize = channel.length;
    if (this.buffer.length < requiredSize) {
      // Not enough data, zero-fill the output
      channel.fill(0);
    } else {
      // Copy one render quantum to the output
      channel.set(this.buffer.subarray(0, requiredSize));
      // Remove processed data from the buffer
      this.buffer = this.buffer.subarray(requiredSize);
    }
    return true;
  }
}

registerProcessor('call-processor', CallProcessor);
```
I’m testing in the Chrome browser.
On PC, the first chunk in each stream response sounds perfect, while on iPhone it sounds weird and robotic.
In both cases, subsequent chunks sound a bit scrambled.
Could you please help me understand what I’m doing wrong?
I’ll note that I’m not set on AudioWorklet, the goal is to have a seamless audio stream that’s compatible with all the relevant browsers, devices, and OSs.
I’ve also tried 2 approaches with audio source, using standardized-audio-context (https://github.com/chrisguttandin/standardized-audio-context) for extensive compatibility:
a. Recursively waiting for the current audio buffer source to finish before playing the next one in the sequence.
b. Starting each audio buffer source as soon as its chunk is received, with the starting point being the sum of the durations of the preceding audio buffers.
Neither approach is seamless; both produce audible jumps between the chunks, which led me to switch strategy to AudioWorklet.
app.js

```javascript
import { Buffer } from 'buffer';
import { AudioContext } from 'standardized-audio-context';

const audioContext = new AudioContext();
let nextPlayTime = 0;

const addAudioChunk = async (base64) => {
  const uint8array = Buffer.from(base64, 'base64');
  const audioBuffer = await audioContext.decodeAudioData(uint8array.buffer);
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start(nextPlayTime);
  nextPlayTime += audioBuffer.duration;
};
```
2 Answers
You could at least manage the playback timing by scheduling each chunk against the current audio context time, which keeps playback seamless across chunks.
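A minimal sketch of that idea, reusing the question’s `nextPlayTime` variable; the clamp against `currentTime` is the key addition, so a chunk is never scheduled at a time that has already passed:

```javascript
// Sketch: schedule each decoded chunk against the AudioContext clock.
// Starting a source at a time in the past causes audible jumps, so we
// clamp the scheduled start time forward to "now" when needed.
let nextPlayTime = 0;

function scheduleChunk(audioContext, audioBuffer) {
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  // Never schedule in the past; fall forward to the current clock time.
  nextPlayTime = Math.max(nextPlayTime, audioContext.currentTime);
  source.start(nextPlayTime);
  nextPlayTime += audioBuffer.duration;
  return nextPlayTime;
}
```

This keeps back-to-back chunks gapless as long as decoding stays ahead of playback; if a chunk arrives late, playback resumes from the current time instead of drifting further behind.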
Both of your approaches (a and b) are not going to work.
Your missing link is the difference in clock speed between your server and your client. This is a fundamental problem in every live-streaming scenario.
Let’s assume your client’s clock runs a little fast:
The client plays the audio too fast and runs out of samples => you hear a gap.
Let’s assume your client’s clock runs a little slow:
The client plays the audio too slowly => eventually you have to drop an entire buffer and you hear a gap.
To solve this problem you have to create a client-side input buffer.
Initially your client has to wait until the buffer is 50% full, and only then start playback. If the buffer gets nearly empty => slow down the playback rate.
If the buffer approaches 100% full => increase the playback rate. Typically the playback rate varies very little, since you’re just compensating for the clock difference.
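That control loop can be sketched as two small helpers; the thresholds (25% / 90%) and the ±2% rate nudges below are illustrative assumptions, not prescribed values:

```javascript
// Sketch: map jitter-buffer fill level to a playback-rate correction.
// The rate only nudges around 1.0, since it merely compensates for
// clock drift between server and client.
function playbackRateFor(bufferedSamples, capacity) {
  const fill = bufferedSamples / capacity;
  if (fill < 0.25) return 0.98; // nearly empty: slow down slightly
  if (fill > 0.90) return 1.02; // nearly full: speed up slightly
  return 1.0;                   // comfortable zone: play at nominal rate
}

// Startup gate: wait until the buffer is half full before playing.
function shouldStartPlayback(bufferedSamples, capacity) {
  return bufferedSamples >= capacity * 0.5;
}
```

In an AudioWorklet you could apply the correction by resampling the buffered samples; with buffer sources you could set `source.playbackRate.value` instead.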
You can implement all of this yourself, or you can use a framework that does it for you, for example WebRTC: https://webrtc.org/getting-started/peer-connections