
I need to extract the video ID and the start time from any kind of YouTube URL that users can enter. I have a working solution, but it is not quite right.

Questions:

  • Could someone help me fix the preg_match pattern so that it handles the URLs that are commented out in the tests?
  • Are there any other kinds of YouTube URL?

I have checked this Stack Overflow page to build my own YouTube URL parser.

This preg_match can extract the video ID and the start time, but it cannot handle the many different YouTube URL formats:

preg_match("/[a-zA-Z\/\/:\.]*youtu(?:be\.com\/watch\?v=|\.be\/)([a-zA-Z0-9\-_]+)(?:[&?\/]t=)?(\d*)(?:[a-zA-Z0-9\/*\-_?&;%=.]*)/i", $url, $matches);

This preg_match handles many different YouTube URLs (maybe all kinds?), but it doesn’t extract the start time:

preg_match("/^(?:http(?:s)?:\/\/)?(?:www\.)?(?:m\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)\/))([^\?&\"'>]+)/", $url, $matches);

I have changed it and it works for me, but I know that my change is not right because I don’t parse the end of the URL properly:

preg_match("/^(?:http(?:s)?:\/\/)?(?:www\.)?(?:m\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)\/))([^\?&\"'>]+)(?:[&?\/]t=)?(\d*)/", $url, $matches);
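
One way to avoid parsing the end of the URL in the same pattern is to match the video ID and the start time with two separate expressions. The following is only a sketch under that assumption: `matchYoutubeUrl` is a hypothetical helper name, the ID pattern is the one above with `#` and whitespace added to the excluded characters, and the time pattern assumes the start time is written as `t=<digits>` somewhere in the query string or fragment.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (name and return shape are assumptions).
 *
 * @return array{0: ?string, 1: ?int} [video ID, start time in seconds]
 */
function matchYoutubeUrl(string $url): array
{
    // Same structure as the pattern above, but "~" is used as the delimiter
    // so that "/" needs no escaping, and "#" and whitespace end the ID.
    $idPattern = '~^(?:https?://)?(?:www\.)?(?:m\.)?'
        . '(?:youtu\.be/|youtube\.com/(?:(?:watch)?\?(?:.*&)?vi?=|(?:embed|v|vi|user|shorts)/))'
        . '([^?&#"\'>\s]+)~';

    $videoId = preg_match($idPattern, $url, $m) === 1 ? $m[1] : null;

    // Search for the start time independently of where the ID ended.
    $time = preg_match('~[?&#]t=(\d+)~', $url, $t) === 1 ? (int) $t[1] : null;

    return [$videoId, $time];
}
```

Because the time is searched for on its own, parameters placed between the ID and `t=` (for example `&feature=share&t=30`) no longer prevent it from being found.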

The code

<?php
declare(strict_types=1);

namespace AppBundle\Value;

class YoutubeVideoData
{
    private function __construct(public ?string $videoId = null, public ?int $time = null)
    {
    }

    public static function fromUrl(string $url): self
    {
        // `#action=share` is not supported
        preg_match("/^(?:http(?:s)?:\/\/)?(?:www\.)?(?:m\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)\/))([^\?&\"'>]+)(?:[&?\/]t=)?(\d*)/", $url, $matches);

        $videoId = null;
        if (isset($matches[1])) {
            $videoId = $matches[1];
        }

        $time = null;
        if (isset($matches[2]) && $matches[2] !== "") {
            $time = (int) $matches[2];
        }

        return new self($videoId, $time);
    }

}

The tests:

<?php

namespace Justimmo\Tests\Value;

use AppBundle\Value\YoutubeVideoData;
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

/**
 * @covers \AppBundle\Value\YoutubeVideoData
 */
class YoutubeVideoDataTest extends TestCase
{
    #[DataProvider('urlProvider')]
    public function testUrls(string $url, ?string $expectedVideoId, ?int $expectedTime)
    {
        $videoData = YoutubeVideoData::fromUrl($url);

        $this->assertSame($expectedVideoId, $videoData->videoId);
        $this->assertSame($expectedTime, $videoData->time);
    }

    public static function urlProvider(): iterable
    {
        // without https://www
        yield 'youtube_link_1' => ['youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_2' => ['youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_3' => ['youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_4' => ['youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_5' => ['youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_6' => ['youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_7' => ['youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_8' => ['youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_9' => ['youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_10' => ['youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_11' => ['m.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // without https://
        yield 'youtube_link_12' => ['www.youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_13' => ['www.youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_14' => ['www.youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_15' => ['www.youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_16' => ['www.youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_17' => ['www.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_18' => ['www.youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_19' => ['www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_20' => ['www.youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_21' => ['www.youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // http
        yield 'youtube_link_22' => ['http://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_23' => ['http://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_24' => ['http://youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_25' => ['http://www.youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_26' => ['http://www.youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_27' => ['http://www.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_28' => ['http://www.youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_29' => ['http://www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_30' => ['http://youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_31' => ['http://www.youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_32' => ['http://m.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // https
        yield 'youtube_link_33' => ['https://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_34' => ['https://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_35' => ['https://youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_36' => ['https://www.youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_37' => ['https://www.youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_38' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_39' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        yield 'youtube_link_40' => ['https://www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_41' => ['https://youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_42' => ['https://www.youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
        yield 'youtube_link_43' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];

        // with start time
        yield 'youtube_link_44' => ['https://youtube.com/v/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_45' => ['https://youtube.com/v/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_46' => ['https://youtube.com/vi/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_47' => ['https://www.youtube.com/?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_48' => ['https://www.youtube.com/?vi=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_49' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_50' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];

        yield 'youtube_link_51' => ['https://www.youtu.be/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_52' => ['https://youtube.com/embed/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_53' => ['https://www.youtube.com/shorts/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
        yield 'youtube_link_54' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];

        // with feature
        yield 'youtube_link_55' => ['https://www.youtube.com/watch?dev=inprogress&v=7HCZvhRAk-M&feature=related', '7HCZvhRAk-M', null];

        yield 'youtube_link_56' => ['https://youtube.com/v/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_57' => ['https://youtube.com/v/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_58' => ['https://youtube.com/vi/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_59' => ['https://www.youtube.com/?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_60' => ['https://www.youtube.com/?vi=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_61' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_62' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        yield 'youtube_link_63' => ['https://www.youtu.be/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_64' => ['https://youtube.com/embed/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_65' => ['https://www.youtube.com/shorts/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
        yield 'youtube_link_66' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];

        // not implemented yet - with #action=share
//        yield 'youtube_link_67' => ['https://youtube.com/v/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
//        yield 'youtube_link_68' => ['https://youtube.com/v/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//        yield 'youtube_link_69' => ['https://youtube.com/vi/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
//        yield 'youtube_link_70' => ['https://www.youtube.com/?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//        yield 'youtube_link_71' => ['https://www.youtube.com/?vi=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
//        yield 'youtube_link_72' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//        yield 'youtube_link_73' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
//        yield 'youtube_link_74' => ['https://www.youtu.be/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//        yield 'youtube_link_75' => ['https://youtube.com/embed/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//        yield 'youtube_link_76' => ['https://www.youtube.com/shorts/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//        yield 'youtube_link_77' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
    }

}

2 Answers


  1. I don’t really like long and complex regular expressions. They are difficult to understand and too much can go wrong. Why not a slightly different approach?

    Suppose we take a YouTube URL like this:

    https://www.youtube.com/watch?v=Og40mpl8VNc
    

    The video id is Og40mpl8VNc. This is a base64 encoded number. It’s the only thing in the URL that is base64 encoded. All the other parts, like https, www, youtube, com, etc, are not valid base64 encoded strings. Perhaps we can use this?

    A simple way to check if something is a valid base64 string is to decode and re-encode it. That shouldn’t change the string. Only Og40mpl8VNc can be decoded and re-encoded without changing it.

    We can apply this check to all parts of all of your YouTube video URLs, using your urlProvider() method:

    $youtubeId = [];
    foreach (urlProvider() as $data) {
        foreach (array_reverse(preg_split('/[\/?=#&]/', $data[0])) as $part) {
           if (trim(base64_encode(base64_decode($part, true)) , '=') == $part) {
              $youtubeId[$data[0]] = $part; 
              break;
           }
        }
    }
    

    Live demo: https://3v4l.org/KKuRl

    I split the URL on these five characters: /?=#&, to get all the parts. Then reverse those, because the video id is often at the end, and then walk all the parts looking for a valid base64 string.

    Note that I also remove any base64 = padding.

    Now admittedly, I haven’t thoroughly tested this. It’s just an idea. There is obviously a very tiny risk of false positives, but I hope this risk is negligible.
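
The same split-and-scan idea can also be sketched with a character-class check instead of the base64 round trip. This is a swapped-in technique, not the code from the live demo. It assumes that a video ID is exactly 11 characters from the URL-safe alphabet `A-Z`, `a-z`, `0-9`, `-` and `_`. That holds for every ID in the question's tests, including `7HCZvhRAk-M`, whose `-` is not part of the standard base64 alphabet, but it is an observed convention rather than a documented guarantee. `findVideoIdCandidate` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the last URL part that looks like a video ID.
function findVideoIdCandidate(string $url): ?string
{
    // Split on the same five characters as above: / ? = # &
    $parts = preg_split('/[\/?=#&]/', trim($url)) ?: [];

    // Walk the parts from the end, because the ID is usually near the end.
    foreach (array_reverse($parts) as $part) {
        // Assumption: IDs are 11 characters of the URL-safe alphabet.
        if (preg_match('/^[A-Za-z0-9_-]{11}$/', $part) === 1) {
            return $part;
        }
    }

    return null;
}
```

Any other 11-character part made of these characters would also be accepted, so the risk of false positives remains.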

    Now I didn’t do the time, because I think you can easily get at that since it is always a parameter. I also didn’t check whether the URL, as a whole, is valid, but I don’t think that’s really what you want to know.
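
For the start time, a minimal sketch could read the `t` query parameter with `parse_url` and `parse_str`. The helper name `findStartTime` is hypothetical, and accepting an optional trailing `s` (as in `t=30s`) is an assumption about how such links may be written. Other time formats are not handled.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the start time in seconds, or null if absent.
function findStartTime(string $url): ?int
{
    // parse_url() returns null when there is no query and false on malformed input.
    $query = parse_url(trim($url), PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    $params = [];
    parse_str($query, $params);

    $t = $params['t'] ?? null;
    // Accept plain seconds ("30") or seconds with an "s" suffix ("30s").
    if (!is_string($t) || preg_match('/^(\d+)s?$/', $t, $m) !== 1) {
        return null;
    }

    return (int) $m[1];
}
```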

  2. Instead of using a regex, you could make use of PHP’s parse_url and parse_str functions.

    $urlParts = parse_url($url) ?: [];
    $queryParams = [];
    // The query component is absent for path-style URLs such as youtu.be/<id>.
    parse_str($urlParts['query'] ?? '', $queryParams);
    
    // Prefer ?vi=, then ?v=, then fall back to the last path segment.
    $videoId =
        $queryParams['vi'] ??
        $queryParams['v'] ??
        basename($urlParts['path'] ?? '');
    
    $time =
        isset($queryParams['t']) ? (int) $queryParams['t'] :
        null;
    

    I haven’t tried it against all your test cases, but it seems to be a more robust approach.
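
Wrapped into a function, this approach might look like the sketch below. The name `parseYoutubeUrl` and the array return shape are assumptions made for illustration. It does not check the host or the format of the ID, so a URL such as `https://www.youtube.com/watch` would yield `watch` as the ID.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (name and return shape are assumptions).
 *
 * @return array{videoId: ?string, time: ?int}
 */
function parseYoutubeUrl(string $url): array
{
    $parts = parse_url(trim($url));
    if ($parts === false) {
        return ['videoId' => null, 'time' => null];
    }

    // The fragment (e.g. "#action=share") is split off by parse_url(),
    // so it never ends up in the query or the path.
    $params = [];
    parse_str($parts['query'] ?? '', $params);

    // Prefer ?vi=, then ?v=, then fall back to the last path segment.
    $videoId = $params['vi'] ?? $params['v'] ?? basename($parts['path'] ?? '');
    if (!is_string($videoId) || $videoId === '') {
        $videoId = null;
    }

    // Only plain integer seconds are accepted here.
    $t = $params['t'] ?? null;
    $time = (is_string($t) && ctype_digit($t)) ? (int) $t : null;

    return ['videoId' => $videoId, 'time' => $time];
}
```

Since `parse_url` separates the fragment, the `#action=share` URLs that are commented out in the tests need no special handling in this sketch.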

    You could even combine this with KIKO Software’s answer (https://stackoverflow.com/a/79358821/20418616) if you want to have more confidence in the resulting $videoId value.
