My final objective is to use TTS to get some Indic text converted into audio and pass that audio to a messaging system that accepts mp3 and ogg. Ogg is preferred.
I am on Ubuntu, and my flow for getting the audio string is as follows:
- Text in an Indic language is passed to an API.
- The API returns a JSON object with a key called audioContent.
audioString = response.json()['audio'][0]['audioContent']
- The decoded bytes are obtained with:
decode_string = base64.b64decode(audioString)
I am currently converting it to mp3 and as you can see I am writing the wave file first and then converting it into an mp3.
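import base64
from pydub import AudioSegment

# Decode the base64 payload and write it out as a WAV file first;
# the with block closes the file before pydub reads it back
decode_string = base64.b64decode(audioString)
with open("output.wav", "wb") as wav_file:
    wav_file.write(decode_string)
# Convert this to mp3 file
print('mp3file')
song = AudioSegment.from_wav("output.wav")
song.export("temp.mp3", format="mp3")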
Is there a way to convert audioString directly to ogg file without doing the io?
I’ve tried torchaudio and pyffmpeg to load audioString and do the conversion but it doesn’t seem to be working.
2 Answers
We may write the WAV data to FFmpeg stdin pipe, and read the encoded OGG data from FFmpeg stdout pipe.
My following answer describes how to do it with video, and we may apply the same solution to audio.
Piping architecture:
The implementation is equivalent to the following shell command:
cat input.wav | ffmpeg -y -f wav -i pipe: -acodec libopus -f ogg pipe: > test.ogg
According to Wikipedia, common audio codecs for OGG format are Vorbis, Opus, FLAC, and OggPCM (I selected Opus audio codec).
The example uses ffmpeg-python module, but it’s just a binding to FFmpeg sub-process (FFmpeg CLI must be installed, and must be in the execution path).
Execute the FFmpeg sub-process with stdin pipe as input and stdout pipe as output. The input format is set to wav, the output format is set to ogg, and the selected encoder is libopus.
Assuming the audio file is relatively large, we can’t write the entire WAV data at once, because doing so (without "draining" the stdout pipe) causes the program execution to halt.
We may have to write the WAV data (in chunks) in a separate thread, and read the encoded data in the main thread.
Here is a sample for the "writer" thread:
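A minimal sketch of such a writer, assuming the pipe is any object with write and close methods (for example the stdin of a sub-process). The names write_in_chunks and start_writer and the default chunk size of 1024 bytes are assumptions made for illustration.

```python
import threading


def write_in_chunks(pipe, data, chunk_size=1024):
    """Write data to pipe in chunk_size pieces, then close the pipe."""
    for start in range(0, len(data), chunk_size):
        # Slicing past the end is safe, so the last chunk is simply shorter
        pipe.write(data[start:start + chunk_size])
    pipe.close()  # signals end of input to whatever reads the pipe


def start_writer(pipe, data, chunk_size=1024):
    """Run write_in_chunks in a background thread and return the thread."""
    thread = threading.Thread(target=write_in_chunks, args=(pipe, data, chunk_size))
    thread.start()
    return thread
```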
The "writer thread" writes the WAV data in small chunks.
The last chunk is smaller (assume the length is not a multiple of the chunk size).
At the end, the stdin pipe is closed. Closing stdin finishes encoding the data, and closes the FFmpeg sub-process.
In the main thread, we start the writer thread and read the encoded OGG data from the stdout pipe (in chunks). For reading the remaining data, we may use ffmpeg_process.communicate().
Complete code sample:
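A minimal end-to-end sketch of the approach, under stated assumptions: it uses the standard-library subprocess module rather than ffmpeg-python, and it reads stdout until end-of-file instead of calling communicate(). The names feed_stdin, pipe_through and wav_bytes_to_ogg are hypothetical. The FFmpeg CLI with libopus support must be installed for wav_bytes_to_ogg to work.

```python
import subprocess
import threading

# Same flags as the shell command above: WAV from stdin, Opus in OGG to stdout
FFMPEG_CMD = ["ffmpeg", "-y", "-f", "wav", "-i", "pipe:",
              "-acodec", "libopus", "-f", "ogg", "pipe:"]


def feed_stdin(stdin, data, chunk_size):
    """Writer thread: send data in chunks, then close stdin to end the input."""
    try:
        for start in range(0, len(data), chunk_size):
            stdin.write(data[start:start + chunk_size])
        stdin.close()
    except BrokenPipeError:
        pass  # the child process exited before consuming all input


def pipe_through(cmd, data, chunk_size=1024):
    """Run cmd, feed data to its stdin from a thread, return all of its stdout."""
    process = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    writer = threading.Thread(target=feed_stdin,
                              args=(process.stdin, data, chunk_size))
    writer.start()
    output = bytearray()
    while True:
        # Draining stdout here keeps the child from blocking on a full pipe
        chunk = process.stdout.read(chunk_size)
        if not chunk:  # an empty read means the child closed stdout
            break
        output += chunk
    writer.join()
    process.stdout.close()
    process.wait()
    return bytes(output)


def wav_bytes_to_ogg(wav_bytes):
    """Encode in-memory WAV bytes to OGG/Opus bytes without temporary files."""
    return pipe_through(FFMPEG_CMD, wav_bytes)
```

With the variable from the question, the call would be wav_bytes_to_ogg(base64.b64decode(audioString)); the returned bytes can be passed on directly or written to a .ogg file.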
You can do this with TorchAudio in the following manner.
A couple of caveats:
- This relies on libsox (not available on Windows) or ffmpeg (available on Linux/macOS/Windows).
- torchaudio.save can encode OPUS format using libsox. However, the underlying implementation on libsox is buggy, so it is not recommended to use torchaudio.save for OPUS.
- Instead, use StreamWriter from torchaudio.io, which is available as of v0.13. (You need to install ffmpeg>=4.1,<5)