I have a datetime column in a table I wanted to use as a key, very frequently, in a website I am working with.
As it’s going to be transmitted back and forth very frequently I wanted to reduce as much as I can the data transfer overhead without hurting performance and also keeping the data storage (in cookies, in the end) reliable.
To this end, I’m thinking on to_base64(unix_timestamp(date_column))
.
Then I can convert the date column to a base64 based on its timestamp value. For example, if I have a datetime of 2024-11-02 07:25:16
, its unix_timestamp()
would be 1730528716
and its to_base64()
, MTczMDUyODcxNg==
.
The problem now is that I don’t think it’s fair to go over all the trouble to get an int(11)
value (at least until around mid-jan 2038) just to pack it as a string of 0-9’s while I could be converting its actual integer representation.
From MySQL itself I can get the value I want if I "hack" the column with to_base64(unhex(hex(unix_timestamp())))
:
> select date_column, unix_timestamp(date_column), to_base64(unix_timestamp(date_column)), to_base64(unhex(hex(unix_timestamp(date_column))));
+---------------------+-----------------------------+----------------------------------------+----------------------------------------------------+
| date_column | unix_timestamp(date_column) | to_base64(unix_timestamp(date_column)) | to_base64(unhex(hex(unix_timestamp(date_column)))) |
+---------------------+-----------------------------+----------------------------------------+----------------------------------------------------+
| 2024-11-02 07:25:16 | 1730528716 | MTczMDUyODcxNg== | ZyXFzA== |
+---------------------+-----------------------------+----------------------------------------+----------------------------------------------------+
1 row in set (0.000 sec)
So in one hand, I’m saving a few bytes every client-server communication in both http server and mysql and actual client browser and http server and also base64 computation. But with the hack above I super doubt I’ll be really saving anything in the end… while I could just be returning the integer value as a raw integer or doing a (hopefully) quick cast.
Some other attempts I made:
to_base64(convert(unix_timestamp(date_column) using binary))
to_base64(cast(unix_timestamp(date_column) as binary))
to_base64(cast(unix_timestamp(date_column) as int))
MTczMDUyODcxNg== (wrong)
I have noticed this behavior also in generic values, like:
select 10, to_base64(10), to_base64('10'), to_base64(unhex(hex(10)))
10 | MTA= | MTA= | Cg==
(in which example I’m not saving a character FWIW)
So, is there any inexpensive way I could get that base64 "shorter" value out of the unix_timestamp()
result?
As this is going to be used in a critical part of the application, I’m shooting the fewer characters transmission and computation will be worth the trouble as long as the "casting" from the number sequence into actual numeric representation is cheap.
I may consider storing the date as a unix timestamp (not datetime
but an integer) if that’s best for this case, in which I could also benefit from 64-bit integer storage.
Update
Thanks to the great explanation and discussion with Bill Karwin, I delved a bit more on this and, according to the documentation page for to_base64()
, "If the argument is not a string, it is converted to a string before conversion takes place.".
I also noticed basides the unhex(hex())
hack, binary columns won’t be converted. The common factor here seems to be, I need to remove the integer’s ability to be converted to string. cast, convert,
Actual cast(value as binary)
and convert(value using binary)
didn’t help. So to this point the "alternative solution" I found would be to store the value in a binary column and select it — in which case I believe the unhex(hex()
hack still wins.
There must be a cheap way to get rid of the ability to convert an integer to its string representation just like a binary type behaves.
2
Answers
First an explanation of why these operations cause the payload to get larger, and how your "hack" helps.
to_base64()
treats its argument as a string, so your timestamp value 1730528716 implicitly becomes the 10-character string ‘1730528716’.Converting an integer to a string always expands the input, because each decimal digit represents about 3 bits of information, but each character in an ASCII string uses 8 bits.
Base64 converts any string of three 8-bit characters into four 6-bit digits. But the digit characters still take 8 bits each in ASCII. Thus base64 always expands its string input by 33%.
This is why the base64 encoding of a 10-character string is 14 bytes.
The "hack" you did was to
unhex(hex())
, which converts the timestamp value into four binary bytes instead of 10 characters. Binary bytes appears to be a "string" for purposes of input toto_base64()
.This means it can skip the conversion to string, and apply base64 to four bytes instead of ten bytes.
You ask if you can do the
unhex(hex())
faster. How fast do you need it to be exactly? Those are already inexpensive conversions. On my laptop, I execute that operation 1 million times and it still takes less than 0.07 seconds:Another idea for reducing the payload is to store less than a full timestamp. Instead of storing the timestamp in granularity of seconds, you could reduce it to granularity of hours. But this may not be suitable for your uses of the timestamp.
Also the timestamp measures seconds since the epoch 1970-01-01. If you need timestamps in seconds, but you’re not interested in timestamps that far back, you could only count seconds since, for example, 2010. That reduces the timestamp to 9 digits instead of 10 digits
Do you know of any computers or software from the year 2010 that are still running intact? That is as long ago as 2038 is in the future. Whatever you do today will be long gone before 2038; spend your time worrying about other things now.
Consider the slowdown of the extra function calls versus network transmission savings. (Not enough info for us to get into details here.)
A
SELECT
delivers only strings. In the case ofBLOB
, the data must be expanded to allow for transmission protocols.I have a rule of thumb: "If an optimization does not save at least 10%, don’t bother; instead, look for something more productive to optimize." In the grand scheme of your API, it sounds like the savings you are trying to get are much less than 10%.
For obscuring, see
AES_ENCRYPT
andDES_ENCRYPT
, but don’t expect space saving since the encryption is a multiple of 16 bytes: