I needed to extract the video Id
and the start time
from any kind of youtube url that the users can input. I have a working solution but it is not right.
Questions:
- Could someone help me to fix the preg_match pattern to handle the urls commented in the tests?
- Is there any other kind of youtube url?
I have checked this stackoverflow page to build my own youtube url parser.
This preg_match can extract the video Id
and the start time
but cannot handle the many different youtube url formats:
preg_match("/[a-zA-Z//:.]*youtu(?:be.com/watch?v=|.be/)([a-zA-Z0-9-_]+)(?:[&?/]t=)?(d*)(?:[a-zA-Z0-9/*-_?&;%=.]*)/i", $url, $matches);
This preg_match handles many different youtube urls (maybe all kind of?) but doesn’t extract the start time:
preg_match("/^(?:http(?:s)?://)?(?:www.)?(?:m.)?(?:youtu.be/|youtube.com/(?:(?:watch)??(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)/))([^?&"'>]+)/", $url, $matches);
I have changed it and it works for me, but I know that my change is not right because I don’t parse the end of the url properly:
preg_match("/^(?:http(?:s)?://)?(?:www.)?(?:m.)?(?:youtu.be/|youtube.com/(?:(?:watch)??(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)/))([^?&"'>]+)(?:[&?/]t=)?(d*)/", $url, $matches);
The code
<?php
declare(strict_types=1);
namespace AppBundleValue;
class YoutubeVideoData
{
private function __construct(public ?string $videoId = null, public ?int $time = null)
{
}
public static function fromUrl(string $url): self
{
// `#action=share` is not supported
preg_match("/^(?:http(?:s)?://)?(?:www.)?(?:m.)?(?:youtu.be/|youtube.com/(?:(?:watch)??(?:.*&)?v(?:i)?=|(?:embed|v|vi|user|shorts)/))([^?&"'>]+)(?:[&?/]t=)?(d*)/", $url, $matches);
$videoId = null;
if (isset($matches[1])) {
$videoId = $matches[1];
}
$time = null;
if (isset($matches[2]) && $matches[2] !== "") {
$time = (int) $matches[2];
}
return new self($videoId, $time);
}
}
The tests:
<?php
namespace JustimmoTestsValue;
use AppBundleValueYoutubeVideoData;
use PHPUnitFrameworkAttributesDataProvider;
use PHPUnitFrameworkTestCase;
/**
* @covers AppBundleValueYoutubeVideoData::class
*/
class YoutubeVideoDataTest extends TestCase
{
#[DataProvider('urlProvider')]
public function testUrls(string $url, ?string $expectedVideoId, ?int $expectedTime)
{
$videoData = YoutubeVideoData::fromUrl($url);
$this->assertSame($expectedVideoId, $videoData->videoId);
$this->assertSame($expectedTime, $videoData->time);
}
public static function urlProvider(): iterable
{
// without https://www
yield 'youtube_link_1' => ['youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_2' => ['youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_3' => ['youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_4' => ['youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_5' => ['youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_6' => ['youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_7' => ['youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_8' => ['youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_9' => ['youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_10' => ['youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_11' => ['m.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
// without https://
yield 'youtube_link_12' => ['www.youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_13' => ['www.youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_14' => ['www.youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_15' => ['www.youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_16' => ['www.youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_17' => ['www.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_18' => ['www.youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_19' => ['www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_20' => ['www.youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_21' => ['www.youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
// http
yield 'youtube_link_22' => ['http://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_23' => ['http://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_24' => ['http://youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_25' => ['http://www.youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_26' => ['http://www.youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_27' => ['http://www.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_28' => ['http://www.youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_29' => ['http://www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_30' => ['http://youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_31' => ['http://www.youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_32' => ['http://m.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
// https
yield 'youtube_link_33' => ['https://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_34' => ['https://youtube.com/v/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_35' => ['https://youtube.com/vi/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_36' => ['https://www.youtube.com/?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_37' => ['https://www.youtube.com/?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_38' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_39' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_40' => ['https://www.youtu.be/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_41' => ['https://youtube.com/embed/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_42' => ['https://www.youtube.com/shorts/dE5jPNvLvOk', 'dE5jPNvLvOk', null];
yield 'youtube_link_43' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk', 'dE5jPNvLvOk', null];
// with start time
yield 'youtube_link_44' => ['https://youtube.com/v/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_45' => ['https://youtube.com/v/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_46' => ['https://youtube.com/vi/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_47' => ['https://www.youtube.com/?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_48' => ['https://www.youtube.com/?vi=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_49' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_50' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_51' => ['https://www.youtu.be/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_52' => ['https://youtube.com/embed/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_53' => ['https://www.youtube.com/shorts/dE5jPNvLvOk?t=30', 'dE5jPNvLvOk', 30];
yield 'youtube_link_54' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk&t=30', 'dE5jPNvLvOk', 30];
// with feature
yield 'youtube_link_55' => ['https://www.youtube.com/watch?dev=inprogress&v=7HCZvhRAk-M&feature=related', '7HCZvhRAk-M', null];
yield 'youtube_link_56' => ['https://youtube.com/v/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_57' => ['https://youtube.com/v/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_58' => ['https://youtube.com/vi/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_59' => ['https://www.youtube.com/?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_60' => ['https://www.youtube.com/?vi=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_61' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_62' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_63' => ['https://www.youtu.be/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_64' => ['https://youtube.com/embed/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_65' => ['https://www.youtube.com/shorts/dE5jPNvLvOk?feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
yield 'youtube_link_66' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk&feature=youtube_gdata_player', 'dE5jPNvLvOk', null];
// do not implemented yet - with #action=share
// yield 'youtube_link_67' => ['https://youtube.com/v/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
// yield 'youtube_link_68' => ['https://youtube.com/v/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
// yield 'youtube_link_69' => ['https://youtube.com/vi/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
// yield 'youtube_link_70' => ['https://www.youtube.com/?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
// yield 'youtube_link_71' => ['https://www.youtube.com/?vi=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
// yield 'youtube_link_72' => ['https://www.youtube.com/watch?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
// yield 'youtube_link_73' => ['https://www.youtube.com/watch?vi=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
//
// yield 'youtube_link_74' => ['https://www.youtu.be/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
// yield 'youtube_link_75' => ['https://youtube.com/embed/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
// yield 'youtube_link_76' => ['https://www.youtube.com/shorts/dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
// yield 'youtube_link_77' => ['https://m.youtube.com/watch?v=dE5jPNvLvOk#action=share ', 'dE5jPNvLvOk', null];
}
}
2
Answers
I don’t really like long and complex regular expressions. They are difficult to understand and too much can go wrong. Why not a slightly different approach?
Suppose we take an youTube URL like this:
The video id is
Og40mpl8VNc
. This is a base64 encoded number. It’s the only thing in the URL that is base64 encoded. All the other parts, likehttps
,www
,youtube
,com
, etc, are not valid base64 encoded strings. Perhaps we can use this?A simple way to check if something is a valid base64 string is to decode and re-encode it. That shouldn’t change the string. Only
Og40mpl8VNc
can be decoded and re-encoded without changing it.We can apply this check to all parts of all the your Youtube video URLs, using your
urlProvider()
method:Live demo: https://3v4l.org/KKuRl
I split the URL on these five characters:
/?=#&
, to get all the parts. Then reverse those, because the video id is often at the end, and then walk all the parts looking for a valid base64 string.Note that I also remove any base64
=
padding.Now admittedly, I haven’t thoroughly tested this. It’s just an idea. There is obviously a very tiny risk of false positives, but I hope this risk is negligible.
Now I didn’t do the time, because I think you can easily get at that since it is always a parameter. I also didn’t check whether the URL, as a whole, is valid, but I don’t think that’s really what you want to know.
Instead of using a regex, you could make use of PHP’s
parse_url
andparse_str
methods.I haven’t tried it against all your test cases, but it seems to be a more robust approach.
You could even combine this with KIKO Software’s answer (https://stackoverflow.com/a/79358821/20418616) if you want to have more confidence in the resulting
$videoId
value.