
I made a Telegram bot, and one of its jobs is to create samples from audio files. For most audio files sent to it, the sample is perfectly fine; something like this:

[screenshot: a voice message whose waveform is displayed]

However, for some audios, the sample looks a bit odd:

[screenshot: a voice message whose waveform is not displayed]

As you can see, the waveform for this file is not shown! (I can assure you that the audio is not silent.)

To create the sample, I use pydub (Thanks, James!). Here’s the part where I create the sample:

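from pydub import AudioSegment

# start and end are offsets in seconds; pydub slices in milliseconds
song = AudioSegment.from_mp3('song.mp3')
sliced = song[start*1000:end*1000]
sliced.export('song.ogg', format='ogg', parameters=["-acodec", "libopus"])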

I then send the sample with the bot.send_voice method, like this:

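# Open the file in a with-block so the handle is closed after sending
with open('song.ogg', 'rb') as voice_file:
    bot.send_voice(
        chat_id=update.message.chat.id,
        voice=voice_file,
        caption=settings.caption,
        parse_mode=ParseMode.MARKDOWN,
        timeout=1000
    )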

The documentation of Telegram Bot API says:

Use this method to send audio files, if you want Telegram clients to
display the file as a playable voice message. For this to work, your
audio must be in an .ogg file encoded with OPUS (other formats may be
sent as Audio or Document).

That’s why in this line of code:

sliced.export('song.ogg', format='ogg', parameters=["-acodec", "libopus"])

I used parameters=["-acodec", "libopus"].
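For what it's worth, one quick way to confirm that the export really produced an Ogg container carrying Opus is to look at the first bytes of the file. This is only a rough sketch: the helper names `looks_like_ogg_opus` and `file_looks_like_ogg_opus` are made up for illustration, and scanning just the first 64 bytes assumes the usual layout where the Opus identification header sits in the first Ogg page. It does not prove that Telegram will draw a waveform, only that the container and codec markers are what the documentation asks for.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristic check that `data` starts like an Ogg Opus stream."""
    # Every Ogg page begins with the capture pattern "OggS".
    if not data.startswith(b"OggS"):
        return False
    # An Ogg Opus stream carries an identification header whose payload
    # starts with "OpusHead". Assumption: it appears within the first
    # 64 bytes, which holds when the first page header has the usual size.
    return b"OpusHead" in data[:64]


def file_looks_like_ogg_opus(path: str) -> bool:
    """Read the start of a file and apply the heuristic above."""
    with open(path, "rb") as fh:
        return looks_like_ogg_opus(fh.read(64))
```

Calling `file_looks_like_ogg_opus('song.ogg')` after the export should return `True` if the `libopus` parameter took effect; a `False` would suggest the file was written with a different codec.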

Can anyone tell me what I’m doing wrong? Thanks in advance!

2 Answers

  1. Shot in the dark guess:

    Having just sampled those two Muse songs, I found that “Pressure” is a much louder rock song than “The Void”. I suspect the Telegram service treats the music as noise when it performs speech-to-text processing. Unlike speech, which has a wide dynamic range between spoken words, music tends to stay at roughly the same volume, so the relative volume of each sample is nearly the same and the waveform comes out as a flat line.

  2. Since it happens only with some of the songs, I believe the issue is linked to the original song format. Make sure that pydub got the file parameters right, e.g. number of channels, sample width, frame rate, etc. Sometimes the resulting sample format also changes, so you can get audio in the range [-1..1] (float) and sometimes [-32768..32767] (integer).
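
To make the parameter check above concrete, here is a minimal sketch that prints what pydub decoded for a file. It relies on the `channels`, `sample_width`, `frame_rate` and `max_dBFS` attributes of pydub's `AudioSegment`; the helper names `describe_segment` and `describe_file` and the stand-in values are invented for illustration, and nothing here is confirmed to be the cause of the missing waveform.

```python
from types import SimpleNamespace


def describe_segment(segment):
    """Return the format parameters pydub decoded for an AudioSegment.

    Only the attributes channels, sample_width, frame_rate and max_dBFS
    are read, so any object exposing them works.
    """
    return {
        "channels": segment.channels,                 # 1 = mono, 2 = stereo
        "bits_per_sample": segment.sample_width * 8,  # sample_width is in bytes
        "frame_rate_hz": segment.frame_rate,
        "peak_dbfs": segment.max_dBFS,                # 0.0 = peak at full scale
    }


def describe_file(path):
    """Decode an MP3 with pydub (requires ffmpeg) and describe it."""
    # Imported here so describe_segment stays usable without pydub installed.
    from pydub import AudioSegment
    return describe_segment(AudioSegment.from_mp3(path))


# Stand-in object with made-up values, used only to show the output shape.
fake = SimpleNamespace(channels=2, sample_width=2, frame_rate=44100, max_dBFS=-0.5)
print(describe_segment(fake))
```

Running `describe_file` on one song whose waveform shows up and on one whose waveform does not, then comparing the two dictionaries, would show whether the inputs actually differ in channels, sample width or frame rate.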
