Background
I’ve been meaning to implement on-demand transcoding of certain video formats such as ".mkv", ".wmv", ".mov", etc., so I can serve them from a media management server built with ASP.NET Core 6.0, C#, and ffmpeg.
My Approach
The approach I’ve decided on is to serve a dynamically generated .m3u8 playlist, built from a segment duration of my choice (e.g. 10 s) and the known video duration. Here’s how I’ve done it; note that the resolution parameter is currently ignored:
public string GenerateVideoOnDemandPlaylist(double duration, int segment)
{
    double interval = (double)segment;
    var content = new StringBuilder();

    content.AppendLine("#EXTM3U");
    content.AppendLine("#EXT-X-VERSION:6");
    content.AppendLine(String.Format("#EXT-X-TARGETDURATION:{0}", segment));
    content.AppendLine("#EXT-X-MEDIA-SEQUENCE:0");
    content.AppendLine("#EXT-X-PLAYLIST-TYPE:VOD");
    content.AppendLine("#EXT-X-INDEPENDENT-SEGMENTS");

    for (double index = 0; (index * interval) < duration; index++)
    {
        // The last segment may be shorter than the nominal segment duration.
        double remaining = duration - (index * interval);
        content.AppendLine(String.Format("#EXTINF:{0:0.000000},", (remaining > interval) ? interval : remaining));
        content.AppendLine(String.Format("{0:00000}.ts", index));
    }

    content.AppendLine("#EXT-X-ENDLIST");
    return content.ToString();
}
[HttpGet]
[Route("stream/{id}/{resolution}.m3u8")]
public IActionResult Stream(string id, string resolution)
{
    double duration = RetrieveVideoLengthInSeconds();
    return Content(GenerateVideoOnDemandPlaylist(duration, 10), "application/x-mpegURL", Encoding.UTF8);
}
Here’s an example of what the generated .m3u8 file looks like:
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-INDEPENDENT-SEGMENTS
#EXTINF:10.000000,
00000.ts
#EXTINF:3.386667,
00001.ts
#EXT-X-ENDLIST
The player then requests 00000.ts, 00001.ts, and so on, and the next step is to generate those segments on demand:
public byte[] GenerateVideoOnDemandSegment(int index, int duration, string path)
{
    int timeout = 30000;
    byte[] output = Array.Empty<byte>();
    string executable = "/opt/homebrew/bin/ffmpeg";

    // Work in a throw-away temporary directory so concurrent requests don't collide.
    DirectoryInfo temp = Directory.CreateDirectory(System.IO.Path.Combine(System.IO.Path.GetTempPath(), System.IO.Path.GetRandomFileName()));
    string format = System.IO.Path.Combine(temp.FullName, "output-%05d.ts");

    using (Process ffmpeg = new())
    {
        ffmpeg.StartInfo.FileName = executable;
        ffmpeg.StartInfo.Arguments = String.Format("-ss {0} ", index * duration);
        ffmpeg.StartInfo.Arguments += String.Format("-y -t {0} ", duration);
        ffmpeg.StartInfo.Arguments += String.Format("-i \"{0}\" ", path);
        ffmpeg.StartInfo.Arguments += "-c:v libx264 -c:a aac ";
        ffmpeg.StartInfo.Arguments += String.Format("-segment_time {0} -reset_timestamps 1 -break_non_keyframes 1 -map 0 ", duration);
        ffmpeg.StartInfo.Arguments += String.Format("-initial_offset {0} ", index * duration);
        ffmpeg.StartInfo.Arguments += String.Format("-f segment -segment_format mpegts {0}", format);
        ffmpeg.StartInfo.CreateNoWindow = true;
        ffmpeg.StartInfo.UseShellExecute = false;
        ffmpeg.StartInfo.RedirectStandardError = false;
        ffmpeg.StartInfo.RedirectStandardOutput = false;
        ffmpeg.Start();

        if (ffmpeg.WaitForExit(timeout))
        {
            string filename = System.IO.Path.Combine(temp.FullName, "output-00000.ts");
            if (!File.Exists(filename))
            {
                throw new FileNotFoundException("Unable to find the generated segment: " + filename);
            }
            output = File.ReadAllBytes(filename);
        }
        else
        {
            // It's been too long. Kill it!
            ffmpeg.Kill();
        }
    }

    // Remove the temporary directory and all its contents.
    temp.Delete(true);
    return output;
}
[HttpGet]
[Route("stream/{id}/{index}.ts")]
public IActionResult Segment(string id, int index)
{
    string path = RetrieveVideoPath(id);
    // MPEG-TS segments are served as video/MP2T; application/x-mpegURL is only for the playlist.
    return File(GenerateVideoOnDemandSegment(index, 10, path), "video/MP2T", true);
}
So, as you can see, this is the command I use to generate each segment, with -ss and -initial_offset incremented by 10 for each one:
ffmpeg -ss 0 -y -t 10 -i "video.mov" -c:v libx264 -c:a aac -segment_time 10 -reset_timestamps 1 -break_non_keyframes 1 -map 0 -initial_offset 0 -f segment -segment_format mpegts /var/folders/8h/3xdhhky96b5bk2w2br6bt8n00000gn/T/4ynrwu0q.z24/output-%05d.ts
The Problem
Things work on a functional level; however, the transitions between segments are slightly glitchy, and the audio in particular has very short interruptions at each 10-second mark. How can I ensure the segments are seamless? What can I improve in this process?
Answers
Since you’re using the segment muxer, your input duration (-t) should be the total duration of all the segments you need.
For segments 00002.ts to 00004.ts,
ffmpeg -ss 20 -t 30 -copyts -i "video.mov" -map 0 -c:v libx264 -c:a aac -f segment -segment_time 10 -reset_timestamps 0 -break_non_keyframes 1 -segment_format mpegts output-%05d.ts -y
(you will run just one command for each contiguous set of output segments)
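To show how that maps onto the code from the question, here is a minimal sketch of an argument builder for one contiguous run of segments. The method name, the output pattern, and the -segment_start_number flag (added so the output file numbers line up with the playlist indices) are my own assumptions rather than part of the answer:

// Sketch only: build the suggested ffmpeg arguments for one contiguous run of
// segments, e.g. startIndex = 2 and count = 3 covers 00002.ts through 00004.ts.
public string BuildContiguousSegmentArguments(int startIndex, int count, int segmentDuration, string path, string outputPattern)
{
    int start = startIndex * segmentDuration;   // -ss 20 in the example above
    int total = count * segmentDuration;        // -t 30 in the example above
    return String.Format(
        "-ss {0} -t {1} -copyts -i \"{2}\" -map 0 -c:v libx264 -c:a aac " +
        "-f segment -segment_time {3} -segment_start_number {4} " +
        "-reset_timestamps 0 -break_non_keyframes 1 -segment_format mpegts \"{5}\" -y",
        start, total, path, segmentDuration, startIndex, outputPattern);
}

With startIndex = 2, count = 3, and a 10-second segment duration, this corresponds to the command shown above.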
Note: I’d defer to @Gyan on this… he’s the go-to FFmpeg person, but I wanted to give some practical notes and alternatives.
I don’t think you’ll be able to seek into the source video and end up with accurate, independent segment transcoding in practice, especially if you’re not in control of the files you’re transcoding from, or when they come in a variety of formats.
Here are a couple of on-demand transcoding methods I’ve used instead:
Option A: Transcode only from the beginning, no seeking
This is a good option if your output will be reused for multiple clients. Transcode once, cache the output, and you’re good to go. You can take care of multiple bitrates for ABR in the same FFmpeg process.
The downside, of course, is that if your client wants video further ahead in the timeline, it’s going to take some time for the transcoder to reach the segment the client wants.
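As a rough illustration of Option A, here is a sketch that kicks off a single full transcode with ffmpeg’s hls muxer and writes the playlist and segments into a cache directory. The executable path comes from the question; the helper name, cache layout, and exact flags are assumptions:

// Sketch only: one long-running transcode per video, cached for every client.
public void StartFullTranscode(string inputPath, string cacheDir, int segmentDuration)
{
    Directory.CreateDirectory(cacheDir);
    var ffmpeg = new Process();
    ffmpeg.StartInfo.FileName = "/opt/homebrew/bin/ffmpeg";
    ffmpeg.StartInfo.Arguments = String.Format(
        "-y -i \"{0}\" -map 0 -c:v libx264 -c:a aac " +
        "-f hls -hls_time {1} -hls_playlist_type vod -hls_list_size 0 " +
        "-hls_segment_filename \"{2}\" \"{3}\"",
        inputPath, segmentDuration,
        System.IO.Path.Combine(cacheDir, "%05d.ts"),
        System.IO.Path.Combine(cacheDir, "index.m3u8"));
    ffmpeg.StartInfo.UseShellExecute = false;
    ffmpeg.StartInfo.CreateNoWindow = true;
    ffmpeg.Start();   // segments and the playlist appear in cacheDir as encoding progresses
}

The segment endpoint can then serve files straight from the cache directory, waiting briefly if the requested segment hasn’t been written yet.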
Option B: Seek the source, for the one client only
In this case, you seek into the source where the client wants and stream back the result, but you don’t assume any prior segments are going to line up. Therefore, unless you have several clients making the exact same requests, there’s little point in caching the output.
Also, if it’s helpful, you don’t need to use HLS for this. Since you’re streaming back to a single client and you know what bitrate you want, you can simply send the STDOUT from FFmpeg right to the client for playback. No need to write out segments.
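A sketch of Option B under those assumptions: seek for this one client and pipe ffmpeg’s stdout straight back as fragmented MP4. The route, the -movflags choice, and the reuse of RetrieveVideoPath are illustrative, not a drop-in implementation:

// Sketch only: per-client seek-and-stream, no segment files written to disk.
[HttpGet]
[Route("stream/{id}/from/{seconds}")]
public IActionResult StreamFrom(string id, int seconds)
{
    string path = RetrieveVideoPath(id);
    var ffmpeg = new Process();
    ffmpeg.StartInfo.FileName = "/opt/homebrew/bin/ffmpeg";
    ffmpeg.StartInfo.Arguments = String.Format(
        "-ss {0} -i \"{1}\" -map 0 -c:v libx264 -c:a aac " +
        "-movflags frag_keyframe+empty_moov -f mp4 pipe:1",
        seconds, path);
    ffmpeg.StartInfo.UseShellExecute = false;
    ffmpeg.StartInfo.RedirectStandardOutput = true;
    ffmpeg.Start();
    // FileStreamResult copies the pipe to the HTTP response as ffmpeg produces output.
    return new FileStreamResult(ffmpeg.StandardOutput.BaseStream, "video/mp4");
}

A real implementation would also want to kill the ffmpeg process when the client disconnects.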
Options A + B
The reality is that you’ll probably need to use both of these methods. If a request comes in for transcoding, you can start streaming from the requested point while enqueuing a background job to transcode the whole file.
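Putting the two together might look roughly like this; GetCacheDirectory, _transcodeQueue, and the reuse of StreamFrom from the Option B sketch are hypothetical:

// Sketch only: Option B for this client right now, Option A queued in the background.
public IActionResult StreamWithCaching(string id, int seconds)
{
    string cacheDir = GetCacheDirectory(id);    // hypothetical helper
    if (!Directory.Exists(cacheDir))
    {
        _transcodeQueue.Enqueue(id);            // kick off the full, cacheable transcode (Option A)
    }
    return StreamFrom(id, seconds);             // per-client seek-and-stream for this request (Option B)
}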