MediaWiki (the free software behind Wikipedia) stores timestamps in its database fields in a distinctive binary(14) format, which is described further in its timestamp documentation:
The format of timestamps used in MediaWiki URLs and in some of the
MediaWiki database fields is yyyymmddhhmmss. For example, the
timestamp for 2023-01-20 17:12:22 (UTC) is 20230120171222. The
timezone for these timestamps is UTC.
I have also seen a similar timestamp format in other places, such as URLs for the Internet Archive. I regularly need to compare these timestamps against timestamps stored in the standard Unix format (seconds since the Unix epoch). I believe this is a common format, so it surprises me that I can’t find a ready-made solution for converting from the MediaWiki format to a Unix timestamp.
What I’m most interested in is the best way to do this conversion. That is:
- Relatively short/simple to understand code.
- Most efficient algorithm.
- Detects errors in the input format.
MediaWiki apparently includes a conversion function named "wfTimestamp"; however, I haven’t been able to locate the function or its source code online, and I understand it has a large number of features beyond the simple conversion. One option might be to strip the other parts out of that function, but I still don’t know whether it is the optimal solution or whether there’s a better way.
There are lots of questions about the more general conversion to timestamps, but I’m hoping for something specific to this format. I’ve thought of several ways to solve it, such as a regular expression, mktime after splitting the string, strtotime, etc., but I’m not sure which will be fastest for this particular format if the conversion has to be done many times. Since this format exists in at least two places, I assume an optimal solution for this specific conversion could be useful to others as well. Thanks.
3 Answers
You can use the DateTime::createFromFormat function with the format specified explicitly. I’m not sure you can find a more optimised way, because even if you parse the string manually, you have to account for leap years and, in time zones with daylight saving time, for days that are not exactly 24 hours long. PHP does that for you.
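A minimal sketch of that approach, assuming the input is always meant as UTC. The helper name mwToUnix is hypothetical (it is not part of MediaWiki or PHP), and the round-trip comparison is just one way to get the error detection the question asks for:

```php
<?php
// Hypothetical helper: converts a yyyymmddhhmmss string (UTC) to a Unix
// timestamp, or returns null if the string is not a valid timestamp.
function mwToUnix(string $ts): ?int
{
    // '!' resets any field the format does not mention; the explicit UTC
    // zone keeps the result independent of the default time zone.
    $dt = DateTime::createFromFormat('!YmdHis', $ts, new DateTimeZone('UTC'));
    if ($dt === false) {
        return null; // wrong length, non-digits, trailing data, ...
    }
    // PHP lets out-of-range values roll over (e.g. 30 February becomes
    // 2 March), so formatting the result back and comparing catches them.
    if ($dt->format('YmdHis') !== $ts) {
        return null;
    }
    return $dt->getTimestamp();
}

var_dump(mwToUnix('20230120171222')); // int(1674234742)
var_dump(mwToUnix('20230230000000')); // NULL, there is no 30 February
```

Whether this is faster than a hand-written parser would need to be measured on the target PHP version; the sketch only aims at being short and strict.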
I think this is probably what you’re looking for:
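A minimal sketch, assuming strtotime is the function meant here (it is the one the manual link in this answer documents). Appending " UTC" to the string is an assumption made so that the result does not depend on the server’s default time zone:

```php
<?php
$mw = '20230120171222';

// strtotime returns false when it cannot parse the string.
$unix = strtotime($mw . ' UTC');

if ($unix === false) {
    echo "Could not parse timestamp\n";
} else {
    // Print the value back in UTC to check the conversion by eye.
    echo $unix, ' => ', gmdate('Y-m-d H:i:s', $unix), " UTC\n";
}
```

Note that strtotime is lenient by design, so on its own it is not a strict validator of the 14-digit format.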
The Unix timestamp that this function returns does not contain information about time zones. In order to do calculations with date/time information, you should use the more capable DateTimeImmutable.
Please see here: https://www.php.net/manual/en/function.strtotime.php
In order to interpret the string "20230120171222" as UTC time, either the time zone must be specified in the string passed to strtotime, or the default time zone must be set to UTC.
See this example for comparison.
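The comparison can be sketched roughly as follows. The choice of America/New_York as the "wrong" default time zone is only an illustrative assumption; any non-UTC zone shows the same effect:

```php
<?php
$mw = '20230120171222';

// With a non-UTC default, the digits are read as local wall-clock time.
date_default_timezone_set('America/New_York');
$asLocal = strtotime($mw);

// Option 1: name the time zone in the string itself.
$withSuffix = strtotime($mw . ' UTC');

// Option 2: make UTC the default time zone before parsing.
date_default_timezone_set('UTC');
$withDefault = strtotime($mw);

var_dump($withSuffix === $withDefault); // both UTC readings agree
var_dump($asLocal - $withSuffix);       // seconds by which the local reading is off
```

The two UTC variants give the same Unix timestamp, while the local reading differs by the zone’s offset on that date.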