I’m using Microsoft Azure’s Text-to-Speech API with a simple goal: play the synthesized speech in the browser when I click a button.
I’m using a Next.js API route to make the request to Azure, then call that route from a client-side button to play the audio. Instead, I get these console errors:
blob:http://localhost:3000/aab03e2a-14c1-48a7-9dae-4eac158325a5:1
GET blob:http://localhost:3000/aab03e2a-14c1-48a7-9dae-4eac158325a5
net::ERR_REQUEST_RANGE_NOT_SATISFIABLE
localhost/:1 Uncaught (in promise) DOMException: Failed to load because no supported source was found.
/pages/api/synthesizeSpeech.tsx
import { NextApiRequest, NextApiResponse } from "next";
import * as sdk from "microsoft-cognitiveservices-speech-sdk";

export default async (req: NextApiRequest, res: NextApiResponse) => {
  if (req.method !== "POST") {
    return res.status(405).end();
  }

  const speechConfig = sdk.SpeechConfig.fromSubscription(
    process.env.SPEECH_KEY,
    process.env.SPEECH_REGION
  );
  speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";

  // Create a pull stream
  const pullStream = sdk.AudioOutputStream.createPullStream();
  const audioConfig = sdk.AudioConfig.fromStreamOutput(pullStream);
  const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

  const text = req.body.text;

  synthesizer.speakTextAsync(
    text,
    (result) => {
      if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
        // Set the appropriate headers for audio data
        res.setHeader("Content-Type", "audio/wav");
        res.setHeader("Content-Disposition", "attachment; filename=speech.wav");

        // Read the audio data from the pull stream and write it to the response
        const audioBuffer = [];
        const bufferSize = 10240;
        const buffer = new ArrayBuffer(bufferSize);
        let bytesRead = 0;
        do {
          // @ts-ignore
          bytesRead = pullStream.read(buffer);
          for (let i = 0; i < bytesRead; i++) {
            // @ts-ignore
            audioBuffer.push(buffer[i]);
          }
        } while (bytesRead > 0);

        res.status(200).end(Buffer.from(audioBuffer));
      } else {
        res.status(500).json({
          error: `Speech synthesis canceled, ${result.errorDetails}\nDid you set the speech resource key and region values?`,
        });
      }
      synthesizer.close();
    },
    (err) => {
      res.status(500).json({ error: `Error - ${err}` });
      synthesizer.close();
    }
  );
};
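One likely culprit in the route above: in the JavaScript Speech SDK, `PullAudioOutputStream.read()` returns a `Promise<number>`, not a number, so the synchronous `do…while` loop never reads any bytes and the response body ends up empty, which would explain the `ERR_REQUEST_RANGE_NOT_SATISFIABLE` in the console. A sketch of an awaited read loop (the `PullStream` type here is a stand-in for the SDK's `PullAudioOutputStream` so the snippet is self-contained):

```typescript
// Stand-in for sdk.PullAudioOutputStream, whose read() resolves
// with the number of bytes written into the supplied ArrayBuffer.
type PullStream = { read(dataBuffer: ArrayBuffer): Promise<number> };

// Drain the pull stream into a single Buffer by awaiting each read.
async function collectAudio(pullStream: PullStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  const bufferSize = 10240;
  let bytesRead = 0;
  do {
    const buffer = new ArrayBuffer(bufferSize);
    bytesRead = await pullStream.read(buffer);
    if (bytesRead > 0) {
      // Keep only the bytes that were actually filled this iteration.
      chunks.push(Buffer.from(buffer.slice(0, bytesRead)));
    }
  } while (bytesRead > 0);
  return Buffer.concat(chunks);
}
```

In the route, the synchronous loop would then become `res.status(200).end(await collectAudio(pullStream));` inside an async callback.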
pages/demo.tsx
import { FaCircleArrowUp } from "react-icons/fa6";

const ButtonPanel = () => {
  const handleSynthesize = async (text: string) => {
    alert(text);
    try {
      const response = await fetch("/api/synthesizeSpeech", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ text }),
      });
      if (!response.ok) {
        throw new Error("Failed to synthesize speech");
      }
      const blob = await response.blob();
      const audioUrl = URL.createObjectURL(blob);
      const audio = new Audio(audioUrl);
      audio.play();
    } catch (error) {
      console.error(error);
    }
  };

  return (
    <footer className="m-4 mt-0 w-[calc(100vw-2rem)] rounded-b-lg border-t-2 border-gray-200 bg-white shadow-lg">
      <div className="flex items-center justify-center space-x-4 p-4">
        <button
          onClick={() => {
            handleSynthesize("hello this is a test test hello");
          }}
          className="btn-solid w-32 disabled:cursor-not-allowed disabled:bg-gray-100"
        >
          <FaCircleArrowUp size={28} />
        </button>
      </div>
    </footer>
  );
};

export default ButtonPanel;
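A side note on the client snippet: each click creates a blob URL with `URL.createObjectURL` that is never released, so repeated clicks leak object URLs. A small sketch of a playback helper that revokes the URL when the audio finishes (`playBlob` is a hypothetical helper name, not part of the question's code):

```typescript
// Play a fetched audio blob and free the object URL once playback ends.
// Uses browser-only APIs (Audio, URL.createObjectURL/revokeObjectURL).
function playBlob(blob: Blob): HTMLAudioElement {
  const audioUrl = URL.createObjectURL(blob);
  const audio = new Audio(audioUrl);
  audio.addEventListener("ended", () => URL.revokeObjectURL(audioUrl));
  audio.play();
  return audio;
}
```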
2 Answers
I was able to solve it with this code, although it doesn't support async audio streaming.
I made some changes to your code and was able to hear the audio output for the input text in the browser.
Code:
synthesizeSpeech.ts:
ButtonPanel.tsx:
index.tsx:
Output:
It runs successfully as below.
With the above output URL, I opened the app in the browser. Then I clicked on Synthesize and Play, and I could hear the audio output.