We are trying to build a web application. For simplicity, let's assume there is a frontend application (e.g. a React app) and a backend application (e.g. Flask). The frontend collects speech from the microphone and sends it to the backend in near real time (that is, with an acceptable delay). The backend generates text using some AI models and pushes that text back to the frontend, also in near real time. I thought we could use Socket.IO to transmit the audio in real time, but I couldn't come up with a well-defined architecture.
Do you know how to build such an application? All ideas are welcome. Thank you for your help.
2 Answers
For online speech recognition, I recommend using remote procedure calls (gRPC).
This example may be helpful.
I have an example application that records audio in the frontend (vanilla JS) and sends it to the backend (Flask), which in turn writes it to a .wav file.
The repository is https://github.com/miguelgrinberg/socketio-examples. There are a few Socket.IO demo apps in there. The one you want is in the audio directory.
A few years ago I gave a presentation at a conference where I discuss these demos. Here is the section about audio: https://youtu.be/Jwux1TPZUwg?si=SxX8XYvnLV_j0yTV&t=991
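To make the backend side of this flow concrete, here is a minimal sketch of the "receive chunks, write a .wav" step, using only the Python standard library. It is not the code from the repository above. The `AudioSession` class, its method names, and the `transcribe` / `on_text` callbacks are hypothetical names invented for illustration, and the sketch assumes the frontend sends raw 16-bit mono PCM at 16 kHz (browsers often produce compressed formats instead, which would need decoding first).

```python
import io
import wave
from typing import Callable, List, Optional


class AudioSession:
    """Hypothetical helper: buffers raw PCM chunks from one client."""

    def __init__(
        self,
        sample_rate: int = 16000,   # assumed sample rate (Hz)
        channels: int = 1,          # assumed mono
        sample_width: int = 2,      # assumed 16-bit samples (2 bytes)
        transcribe: Optional[Callable[[bytes], str]] = None,
        on_text: Optional[Callable[[str], None]] = None,
    ) -> None:
        self.sample_rate = sample_rate
        self.channels = channels
        self.sample_width = sample_width
        self.transcribe = transcribe  # placeholder for a speech-to-text model
        self.on_text = on_text        # placeholder for "push text to the client"
        self._chunks: List[bytes] = []

    def add_chunk(self, chunk: bytes) -> None:
        """Store one audio chunk and, if configured, push its transcript."""
        self._chunks.append(chunk)
        if self.transcribe is not None and self.on_text is not None:
            self.on_text(self.transcribe(chunk))

    def to_wav_bytes(self) -> bytes:
        """Wrap everything received so far in a WAV container."""
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setnchannels(self.channels)
            wav_file.setsampwidth(self.sample_width)
            wav_file.setframerate(self.sample_rate)
            wav_file.writeframes(b"".join(self._chunks))
        return buffer.getvalue()


if __name__ == "__main__":
    received: List[str] = []
    session = AudioSession(
        transcribe=lambda chunk: f"{len(chunk)} bytes received",  # stand-in model
        on_text=received.append,                                  # stand-in emit
    )
    session.add_chunk(b"\x00\x00" * 1600)  # 0.1 s of silence at 16 kHz
    with open("capture.wav", "wb") as out:
        out.write(session.to_wav_bytes())
    print(received)
```

In a Socket.IO server, one would presumably keep one such session per connected client, call `add_chunk` from the handler that receives audio messages, and have `on_text` emit the text back to that client. The exact wiring depends on the framework and is left out here.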