
We are trying to build a web application. For simplicity, let's assume there is a frontend application (e.g. a React app) and a backend application (e.g. Flask). The frontend captures speech from the microphone and sends it to the backend in real time (in practice, with an acceptable delay). The backend converts the speech to text using AI models and pushes the text back to the frontend in real time. I thought we could use Socket.IO to transmit the audio in real time, but I couldn't come up with a well-defined architecture.

Do you know how to build such an application? All ideas are welcome. Thank you for your help.

2 Answers


  1. For online speech recognition, I recommend using remote procedure calls (gRPC).
    This example may be helpful.

  2. I have an example application that records audio in the frontend (vanilla JS) and sends it to the backend (Flask), which in turn writes it to a .wav file.

    The repository is https://github.com/miguelgrinberg/socketio-examples. There are a few Socket.IO demo apps in there. The one you want is in the audio directory.

    A few years ago I gave a conference presentation in which I discuss these demos. Here is the section about audio: https://youtu.be/Jwux1TPZUwg?si=SxX8XYvnLV_j0yTV&t=991
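
For readers who want a feel for the backend half of this approach, here is a minimal sketch of the "receive chunks, write a .wav file" step. It is not taken from the repository above. It uses only the Python standard library, and the names `WavRecording`, `on_start`, `on_chunk`, and `on_stop` are hypothetical. It assumes the frontend sends raw 16-bit mono PCM at 16 kHz; a browser that sends a compressed format would need decoding first. In a Flask app, the three functions would be called from Socket.IO event handlers, keyed by the client's session id.

```python
import wave


class WavRecording:
    """Collects raw PCM chunks from one client and writes them to a WAV file."""

    def __init__(self, target, sample_rate=16000, channels=1, sample_width=2):
        # `target` may be a file path or a writable, seekable binary file object.
        # The audio format values are assumptions about what the frontend sends.
        self._wav = wave.open(target, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        self._wav.setframerate(sample_rate)
        self.bytes_received = 0

    def write_chunk(self, chunk: bytes) -> None:
        # Append one chunk of audio as it arrives from the client.
        self._wav.writeframes(chunk)
        self.bytes_received += len(chunk)

    def close(self) -> None:
        # Finalize the WAV header so the file is playable.
        self._wav.close()


# One active recording per connected client, keyed by a session id.
recordings = {}


def on_start(session_id, target):
    # Would be called when the client announces it is starting to record.
    recordings[session_id] = WavRecording(target)


def on_chunk(session_id, chunk):
    # Would be called for every binary audio message from the client.
    recordings[session_id].write_chunk(chunk)


def on_stop(session_id):
    # Would be called when the client stops recording or disconnects.
    recordings.pop(session_id).close()
```

For the speech-to-text use case in the question, `on_chunk` is also the natural place to hand each chunk to the recognition model and emit any resulting text back to the same client.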
