I am following this tutorial on hardware-accelerated GPU encoding/decoding for PyTorch [https://pytorch.org/audio/main/hw_acceleration_tutorial.html] and am encountering an error with the following code:
import torch
import torchaudio

print(torch.__version__)  # 1.14.0.dev20221013+cu116
print(torchaudio.__version__)  # 0.13.0.dev20221013+cu116
print(torchaudio._extension._FFMPEG_INITIALIZED)  # True

from torchaudio.io import StreamReader

local_src = "vid.mp4"

cuda_conf = {
    "decoder": "h264_cuvid",  # Use CUDA HW decoder
    "hw_accel": "cuda:0",  # Then keep the memory on CUDA:0
}


def decode_vid(src, config):
    frames = []
    s = StreamReader(src)
    s.add_video_stream(5, **config)  # 5 frames per chunk; this call raises the error
    for i, (chunk,) in enumerate(s.stream()):
        frames.append(chunk[0])  # keep the first frame of each chunk
    return frames  # without this, decode_vid would return None


if __name__ == "__main__":
    vid = decode_vid(local_src, cuda_conf)
The error message (somewhat truncated) is:
  File "/home/james/PycharmProjects/AlphaPose/Spectronix/Early_Experiments/vid_gpu_decode.py", line 23, in decode_vid
    s.add_video_stream(5, **config)
  File "/home/james/anaconda3/envs/alphapose/lib/python3.7/site-packages/torchaudio/io/_stream_reader.py", line 624, in add_video_stream
    hw_accel,
RuntimeError: Unsupported codec: "h264_cuvid".
My GPU is an RTX 3090 Ti, which does support the h264_cuvid decoder, and I have been able to decode a video on the command line by running the following (taken from the tutorial linked above):
sudo ffmpeg -hide_banner -y -vsync 0 -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid -i "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4" -c:a copy -c:v h264_nvenc -b:v 5M test.mp4
So it seems torchaudio.io is not using FFmpeg properly. Any insight into how to fix this problem would be much appreciated. I’m using Ubuntu 22.04.
2 Answers
If you are OK stepping away from torchaudio (its limitation must be purely due to how the wrapper function works), you can try my ffmpegio package, which does a similar job. You can read all video frames at once (capturing every frame until FFmpeg exits) or read a chunk at a time while FFmpeg is still running.
One thing I’m not sure about in your code: as far as I know, there isn’t a way for an outside program to tap into the CUDA memory space that is mapped to FFmpeg, and ffmpegio is not capable of doing this. If you encounter any issues, feel free to post on GitHub.
RuntimeError: Unsupported codec: "h264_cuvid".
The error happens here, before StreamReader gets to the point where it executes NVDEC-specific code, so this is a generic issue with FFmpeg compatibility.
This suggests that the libavcodec found at runtime is not configured with h264_cuvid. A possible explanation is that there are multiple installations of FFmpeg on your system: torchaudio picks up one without NVDEC support, while the ffmpeg command loads one with NVDEC support. Perhaps you can check your system for multiple FFmpeg installations and remove the ones without NVDEC support?
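One way to carry out that check is sketched below. It uses only the Python standard library, lists every ffmpeg executable on PATH, and asks each one whether it reports the decoder via the real `ffmpeg -decoders` option. The helper names (find_executables, lists_decoder, report) are made up for this example. Note the assumption: torchaudio loads the libav* shared libraries rather than the ffmpeg binary, so this is only a proxy for spotting a second installation (for example, one inside the conda environment).

```python
import os
import subprocess


def find_executables(name, search_path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    found = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            real = os.path.realpath(candidate)  # collapse symlinked duplicates
            if real not in found:
                found.append(real)
    return found


def lists_decoder(decoders_output, decoder):
    """Check whether the text printed by `ffmpeg -decoders` names `decoder`."""
    for line in decoders_output.splitlines():
        fields = line.split()
        # Data rows look like: " V..... h264_cuvid  Nvidia CUVID H264 decoder"
        if len(fields) >= 2 and fields[1] == decoder:
            return True
    return False


def report(decoder="h264_cuvid"):
    """Print, for each ffmpeg binary on PATH, whether it offers `decoder`."""
    for exe in find_executables("ffmpeg"):
        result = subprocess.run(
            [exe, "-hide_banner", "-decoders"],
            capture_output=True,
            text=True,
        )
        status = "has" if lists_decoder(result.stdout, decoder) else "lacks"
        print(f"{exe}: {status} {decoder}")
```

Calling report() from the same environment that runs the failing script (here, the alphapose conda environment) shows whether more than one ffmpeg is visible there and which of them lack h264_cuvid. If an ffmpeg inside the environment lacks the decoder, the libavcodec next to it is a likely candidate for what torchaudio loads.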