
I’m trying to play in the browser audio that’s streamed from my backend to simulate a real-time voice call, in a way that’s compatible with all the relevant browsers, devices, and OSs.

The audio format is mp3 with 44.1kHz sample rate at 192kbps.

app.js

import { Buffer } from 'buffer';

const audioContext = new AudioContext({ sampleRate: 44100 });

await audioContext.audioWorklet.addModule('call-processor.js');

const audioWorkletNode = new AudioWorkletNode(audioContext, 'call-processor');
audioWorkletNode.connect(audioContext.destination);

// add streamed audio chunk to the audioworklet
const addAudioChunk = async (base64) => {
  const buffer = Buffer.from(base64, 'base64');

  try {
    const audioBuffer = await audioContext.decodeAudioData(buffer.buffer);
    const channelData = audioBuffer.getChannelData(0); // Assuming mono audio

    audioWorkletNode.port.postMessage(channelData);
  } catch (e) {
    console.error(e);
  }
};

call-processor.js

class CallProcessor extends AudioWorkletProcessor {
  buffer = new Float32Array(0);

  constructor() {
    super();
    
    this.port.onmessage = this.handleMessage.bind(this);
  }

  handleMessage(event) {
    const chunk = event.data;

    const newBuffer = new Float32Array(this.buffer.length + chunk.length);

    newBuffer.set(this.buffer);
    newBuffer.set(chunk, this.buffer.length);

    this.buffer = newBuffer;
  }

  process(inputs, outputs) {
    const output = outputs[0];
    const channel = output[0];
    const requiredSize = channel.length;

    if (this.buffer.length < requiredSize) {
      // Not enough data, zero-fill the output
      channel.fill(0);
    } else {
      // Process the audio
      channel.set(this.buffer.subarray(0, requiredSize));
      // Remove processed data from the buffer
      this.buffer = this.buffer.subarray(requiredSize);
    }

    return true;
  }
}

registerProcessor('call-processor', CallProcessor);

I’m testing in the Chrome browser.

On PC, the first chunk in each stream response sounds perfect, while on iPhone it sounds weird and robotic.

In both cases, subsequent chunks sound a bit scrambled.

Could you please help me understand what I’m doing wrong?

I’ll note that I’m not set on AudioWorklet, the goal is to have a seamless audio stream that’s compatible with all the relevant browsers, devices, and OSs.

I’ve also tried 2 approaches with audio source, using standardized-audio-context (https://github.com/chrisguttandin/standardized-audio-context) for extensive compatibility:

a. Recursively waiting for the current audio buffer source to finish before playing the next one in the sequence (a sketch of this appears after the code below).

b. Starting each audio buffer source as soon as its chunk is received, with the starting point being the sum of the durations of the preceding audio buffers.

Neither approach is seamless; both result in audible jumps between the chunks, which is what led me to switch to the AudioWorklet approach.

app.js

import { Buffer } from 'buffer';
import { AudioContext } from 'standardized-audio-context';

const audioContext = new AudioContext();

let nextPlayTime = 0;

const addAudioChunk = async (base64) => {
  const uint8array = Buffer.from(base64, 'base64');
  const audioBuffer = await audioContext.decodeAudioData(uint8array.buffer);
  const source = audioContext.createBufferSource();

  source.buffer = audioBuffer;
  source.connect(audioContext.destination);

  source.start(nextPlayTime);

  nextPlayTime += audioBuffer.duration;
};
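
For reference, approach (a) was along these lines (a simplified sketch, not the exact code):

import { Buffer } from 'buffer';
import { AudioContext } from 'standardized-audio-context';

const audioContext = new AudioContext();

// Decoded chunks are queued; each source's onended callback starts the next one.
const queue = [];
let playing = false;

const playNext = () => {
  if (queue.length === 0) {
    playing = false;
    return;
  }

  playing = true;

  const source = audioContext.createBufferSource();
  source.buffer = queue.shift();
  source.connect(audioContext.destination);
  source.onended = playNext; // wait for the current chunk to end, then play the next
  source.start();
};

const addAudioChunk = async (base64) => {
  const uint8array = Buffer.from(base64, 'base64');
  const audioBuffer = await audioContext.decodeAudioData(uint8array.buffer);

  queue.push(audioBuffer);

  if (!playing) {
    playNext();
  }
};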

2 Answers


  1. You could at least manage the playback timing by scheduling each chunk based on the current audio context time, which ensures seamless playback across chunks.

    import { Buffer } from 'buffer';
    import { AudioContext } from 'standardized-audio-context';
    
    const audioContext = new AudioContext();
    let sourceNode = null;
    let nextStartTime = 0;
    
    const playAudioChunk = async (base64) => {
      const uint8array = Buffer.from(base64, 'base64');
      const audioBuffer = await audioContext.decodeAudioData(uint8array.buffer);
      
      if (!sourceNode) {
        sourceNode = audioContext.createBufferSource();
        sourceNode.buffer = audioBuffer;
        sourceNode.connect(audioContext.destination);
        sourceNode.start();
      } else {
        const currentTime = audioContext.currentTime;
        const offset = nextStartTime - currentTime;
        
        if (offset > 0) {
          setTimeout(() => {
            const newSource = audioContext.createBufferSource();
            newSource.buffer = audioBuffer;
            newSource.connect(audioContext.destination);
            newSource.start(nextStartTime);
            sourceNode = newSource;
          }, offset * 1000); // Convert seconds to milliseconds
        }
      }
      
      nextStartTime += audioBuffer.duration;
    };
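
    For completeness, here is one way you might feed playAudioChunk from whatever transport delivers the base64 chunks, for example a WebSocket (the endpoint and message shape below are just placeholders):

    // Hypothetical wiring: each WebSocket message is assumed to carry one
    // base64-encoded audio chunk.
    const socket = new WebSocket('wss://example.com/call-audio');

    socket.onmessage = (event) => {
      playAudioChunk(event.data).catch(console.error);
    };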
    
  2. Both of your approaches (a and b) are not going to work.

    The missing link is the difference in clock speed between your server and your client. This is a fundamental problem in every live streaming scenario.

    Let’s assume your client’s clock runs a little faster:
    The client plays the audio too fast and runs out of samples => you hear a gap.

    Let’s assume your client’s clock runs a little slower:
    The client plays the audio too slowly => eventually you have to drop an entire buffer and you hear a gap.

    To solve this problem you have to create a client-side input buffer.
    Initially the client has to wait until the buffer is 50% full and then start playback. If the buffer is getting nearly empty => slow down the playback rate.
    If the buffer is getting close to 100% full => increase the playback speed. Typically the playback rate varies very little, since you are only compensating for the clock differences.

    You can implement all of this yourself, or you can use a framework that does it for you, for example WebRTC: https://webrtc.org/getting-started/peer-connections
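
    If you want to implement it yourself on top of the AudioWorklet you already have, the idea is roughly the following sketch. The buffer target, the fill thresholds and the ±0.5% rate nudges are arbitrary example values, not tuned numbers:

    // Sketch of an AudioWorklet jitter buffer with adaptive playback rate.
    class AdaptiveCallProcessor extends AudioWorkletProcessor {
      constructor() {
        super();
        this.buffer = new Float32Array(0);
        this.readPos = 0;      // fractional read position into this.buffer
        this.rate = 1.0;       // playback rate, nudged to absorb clock drift
        this.started = false;  // wait for the buffer to fill before playing
        this.port.onmessage = (e) => this.append(e.data);
      }

      append(chunk) {
        const merged = new Float32Array(this.buffer.length + chunk.length);
        merged.set(this.buffer);
        merged.set(chunk, this.buffer.length);
        this.buffer = merged;
      }

      process(inputs, outputs) {
        const out = outputs[0][0];
        const TARGET = sampleRate;  // keep roughly one second of audio buffered
        const available = this.buffer.length - this.readPos;

        // Pre-buffer: stay silent until ~50% of the target is queued.
        if (!this.started) {
          if (available < TARGET / 2) {
            out.fill(0);
            return true;
          }
          this.started = true;
        }

        // Nudge the rate depending on the buffer fill; a fraction of a percent
        // is enough to compensate for server/client clock differences.
        if (available < TARGET * 0.25) this.rate = 0.995;      // nearly empty: slow down
        else if (available > TARGET * 0.9) this.rate = 1.005;  // nearly full: speed up
        else this.rate = 1.0;

        for (let i = 0; i < out.length; i++) {
          const idx = Math.floor(this.readPos);
          if (idx + 1 >= this.buffer.length) {
            out.fill(0, i);          // underrun: zero-fill and re-enter pre-buffering
            this.started = false;
            break;
          }
          const frac = this.readPos - idx;
          // Linear interpolation keeps non-integer rates click-free.
          out[i] = this.buffer[idx] * (1 - frac) + this.buffer[idx + 1] * frac;
          this.readPos += this.rate;
        }

        // Drop fully consumed samples so the buffer does not grow without bound.
        const whole = Math.floor(this.readPos);
        if (whole > 0) {
          this.buffer = this.buffer.subarray(whole);
          this.readPos -= whole;
        }

        return true;
      }
    }

    registerProcessor('adaptive-call-processor', AdaptiveCallProcessor);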
