
I'm using "SpeechSynthesisUtterance" to generate sound from text, and I wonder whether it is possible to connect an "AnalyserNode" to the output audio stream, so that I can create a voice visualization for the output of the text-to-speech Web API in JavaScript or TypeScript.
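For context, this is roughly the wiring I'm aiming for, assuming I could somehow obtain the synthesized speech as a MediaStream (getting such a stream is exactly the open question):

    // Sketch: visualize a MediaStream with an AnalyserNode.
    // How to get the TTS output as a MediaStream is the unsolved part.
    const visualize = (speechStream: MediaStream) => {
        const context = new AudioContext();
        const analyser = context.createAnalyser();
        analyser.fftSize = 256;

        // Route the speech stream into the analyser.
        context.createMediaStreamSource(speechStream).connect(analyser);

        // Sample the frequency data on every animation frame.
        const bins = new Uint8Array(analyser.frequencyBinCount);
        const draw = () => {
            analyser.getByteFrequencyData(bins);
            // ...render `bins` to a canvas here...
            requestAnimationFrame(draw);
        };
        draw();
    };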

2 Answers


  1. Chosen as BEST ANSWER

    Thanks @chrisguttandin, I've created something like this:

        utterance.onstart = (event) => {
            console.log(event.currentTarget);

            navigator.mediaDevices.enumerateDevices()
                // look for the default "audiooutput" device, where available
                // see https://bugzilla.mozilla.org/show_bug.cgi?id=934425, https://stackoverflow.com/q/33761770
                .then(devices => {
                    const audiooutput = devices.find(device => device.kind === "audiooutput" && device.deviceId === "default");

                    if (audiooutput) {
                        // constrain the audio track to that device
                        const constraints = {
                            audio: {
                                deviceId: {
                                    exact: audiooutput.deviceId
                                }
                            }
                        };

                        navigator.mediaDevices.getUserMedia(constraints)
                            .then((stream: MediaStream) => {
                                const equalizer = new Equalizer(stream); // my visualization class

                                console.log('audio tracks:', stream.getAudioTracks().length);
                            });
                    }
                });
        };

    But the stream is empty. Any advice? What exactly is the stream that getUserMedia() returns here? My goal is to capture the sound going out of the speakers; I don't want to use MediaDevices.getDisplayMedia().
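    For reference, inside the getUserMedia() callback above I check whether the captured stream carries any signal at all, roughly like this (a minimal sketch, independent of my Equalizer class):

        const context = new AudioContext();
        const analyser = context.createAnalyser();
        context.createMediaStreamSource(stream).connect(analyser);

        const data = new Uint8Array(analyser.fftSize);
        const check = () => {
            analyser.getByteTimeDomainData(data);
            // A silent stream stays at the 128 midpoint; any signal deviates.
            let peak = 0;
            for (const v of data) peak = Math.max(peak, Math.abs(v - 128));
            console.log('peak deviation:', peak);
            requestAnimationFrame(check);
        };
        check();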


  2. Unfortunately there is no easy way to do this.

    The only possible workaround right now would be to record the audio of the current tab with getDisplayMedia() but that requires a user interaction and you have to rely on the user to pick the correct tab.
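    A rough sketch of that workaround, assuming the user grants permission and picks the tab that is speaking (tab audio capture is only available in some browsers, e.g. Chromium-based ones; `button` here is whatever hypothetical element triggers the capture):

        // getDisplayMedia() must be called from a user gesture.
        button.addEventListener('click', async () => {
            const stream = await navigator.mediaDevices.getDisplayMedia({
                video: true, // a video track has to be requested even if unused
                audio: true  // the user must also tick "Share tab audio"
            });

            const context = new AudioContext();
            const analyser = context.createAnalyser();
            context.createMediaStreamSource(stream).connect(analyser);
        });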

    There was once an issue for the Web Audio API to enable this, but it has since been closed.

    There are still two open issues on the Web Speech API repo about this problem. They are about getting the synthesized speech as audio data or as a MediaStreamTrack. Maybe it's a good idea to add your use case to one of those issues; hopefully that will resume the discussion.
