
I am following this tutorial on hardware-accelerated GPU encoding/decoding for PyTorch [https://pytorch.org/audio/main/hw_acceleration_tutorial.html], and I am encountering an error with the following code:

import torch
import torchaudio

print(torch.__version__) # 1.14.0.dev20221013+cu116
print(torchaudio.__version__) # 0.13.0.dev20221013+cu116
print(torchaudio._extension._FFMPEG_INITIALIZED) # True

from torchaudio.io import StreamReader
local_src = "vid.mp4"
cuda_conf = {
    "decoder": "h264_cuvid",  # Use CUDA HW decoder
    "hw_accel": "cuda:0",  # Then keep the memory on CUDA:0
}

def decode_vid(src, config):
    frames = []
    s = StreamReader(src)
    s.add_video_stream(5, **config)  # 5 frames per chunk
    for (chunk,) in s.stream():
        frames.append(chunk[0])  # keep the first frame of each chunk
    return frames

if __name__ == "__main__":
    vid = decode_vid(local_src, cuda_conf)

The error message (somewhat truncated) is:

File "/home/james/PycharmProjects/AlphaPose/Spectronix/Early_Experiments/vid_gpu_decode.py", line 23, in decode_vid
    s.add_video_stream(5, **config)
File "/home/james/anaconda3/envs/alphapose/lib/python3.7/site-packages/torchaudio/io/_stream_reader.py", line 624, in add_video_stream
    hw_accel,
RuntimeError: Unsupported codec: "h264_cuvid".

My GPU is an RTX 3090 Ti, which does support the h264_cuvid decoder, and I have been able to decode a video on the command line by running the following (taken from the tutorial linked above):

sudo ffmpeg -hide_banner -y -vsync 0 -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid -i "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4" -c:a copy -c:v h264_nvenc -b:v 5M test.mp4

So it seems torchaudio.io is not using FFmpeg properly. Any insight into how to fix this problem would be much appreciated. I’m using Ubuntu 22.04.


Answers


  1. If you are OK with stepping away from torchaudio (its limitation is probably just down to how the wrapper function works), you can try my ffmpegio package to do something similar.

    pip install ffmpegio
    

    You can read all the video frames at once (capture every frame until FFmpeg exits) or read a chunk at a time while FFmpeg is still running.

    
    import ffmpegio
    
    local_src = "vid.mp4"
    
    kwargs = {
        'vsync': 0,
        'hwaccel_in': 'cuvid',  # pick one or
        # 'c:v_in': 'h264_cuvid',  # the other
    }
    
    # read all the frames as RGB
    fs, F = ffmpegio.video.read(local_src, **kwargs)
    # fs: framerate in frames/sec
    # F: nframes x height x width x ncomp  numpy array
    
    # read n frames
    n = 10
    fs, F = ffmpegio.video.read(local_src, vframes=n, **kwargs)
    # F: n x height x width x ncomp  numpy array
    
    # work on n frames at a time ('rv' = read video)
    with ffmpegio.open(local_src, 'rv', blocksize=n, **kwargs) as f:
        for F in f:
            # F: n x height x width x ncomp  numpy array
            print(F.shape)  # replace with your own processing
    

    The one thing I’m not sure about in your code is

        "hw_accel": "cuda:0",  # Then keep the memory on CUDA:0
    

    As far as I know, there isn’t a way for an outside program to tap into the CUDA memory space that is mapped to a separate FFmpeg process, so ffmpegio is not capable of doing this.

    If you encounter any issues, feel free to post them on GitHub.

  2. RuntimeError: Unsupported codec: "h264_cuvid".

    The error is raised while the stream is being added, before the StreamReader gets to the point where it executes NVDEC-specific code, so this is a generic FFmpeg compatibility issue.

    This suggests that the libavcodec found at runtime is not configured with h264_cuvid.

    A possible explanation is that there are multiple installations of FFmpeg on your system and torchaudio is picking up one without NVDEC support, whereas the ffmpeg command you invoke loads one with NVDEC support.

    Perhaps you can check your system and see if there are multiple FFmpeg installations and remove the ones without NVDEC support?
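A rough way to run that check from Python is sketched below. It is only an illustration under some assumptions: the helper names (`find_ffmpeg_binaries`, `has_decoder`, `loaded_libraries`, `report`) are made up for this example, the `/proc/self/maps` part works on Linux only, and each `ffmpeg` binary is assumed to report the same build as the libraries installed alongside it. Keep in mind that torchaudio loads the libav* shared libraries rather than the `ffmpeg` executable, so the library actually mapped into the Python process is the one that matters.

```python
import os
import subprocess


def find_ffmpeg_binaries(path_env=None):
    """Return every executable named `ffmpeg` on PATH, in lookup order."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found = []
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, "ffmpeg")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            real = os.path.realpath(candidate)  # collapse symlinked duplicates
            if real not in found:
                found.append(real)
    return found


def has_decoder(decoders_output, name):
    """Check the text printed by `ffmpeg -decoders` for a decoder name."""
    # Entries look like: " V..... h264_cuvid   Nvidia CUVID H264 decoder (codec h264)"
    for line in decoders_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def loaded_libraries(maps_text, needle="libavcodec"):
    """Extract shared-object paths containing `needle` from /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (pathname is optional)
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


def report(decoder="h264_cuvid"):
    """Print which ffmpeg binaries list `decoder` and which libavcodec is loaded."""
    for binary in find_ffmpeg_binaries():
        result = subprocess.run(
            [binary, "-hide_banner", "-decoders"],
            capture_output=True,
            text=True,
        )
        print(binary, decoder, has_decoder(result.stdout, decoder))
    # Linux only: shared libraries mapped into the current Python process.
    with open("/proc/self/maps") as maps:
        print("loaded:", loaded_libraries(maps.read()))
```

To use it, run `from torchaudio.io import StreamReader` first in the same interpreter and then call `report()`. If the libavcodec path printed at the end belongs to a different installation than the `ffmpeg` binary that lists h264_cuvid, that would be consistent with the explanation above.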
