I have a table which holds a large amount of variable data; the `key` field in particular has, at the time of writing, 14 variants. The structure is as below:
CREATE TABLE `device_data` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`device_id` bigint(20) unsigned NOT NULL,
`serialized` tinyint(1) NOT NULL DEFAULT 0,
`system` tinyint(1) NOT NULL DEFAULT 0,
`key` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL,
`value` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_device_data_key` (`device_id`,`key`),
CONSTRAINT `device_data_device_id_foreign` FOREIGN KEY (`device_id`) REFERENCES `devices` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The table holds about 80,000 records, which means there is a large amount of repetition in the key
field. The problem with this is that if the key value is key_name_here
and it’s repeated 30,000 times, that’s 13 bytes for every row * 30,000, or about 390 KB.
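For what it’s worth, the repetition is easy to see with a quick aggregate (nothing assumed beyond the table above):

SELECT `key`, COUNT(*) AS occurrences
FROM device_data
GROUP BY `key`
ORDER BY occurrences DESC;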
I understand that I could create an enum column, but the system itself is very dynamic and subject to change. It’s not feasible to create a new enum value for the key
every time.
So my question here: would it make sense to add an index to the key
field, in the hope this would reduce overall data consumption and benefit performance, or would this simply be detrimental to performance?
It’s worth adding that in 99% of cases we’d already be filtering down based on the device_id
column, which makes me suspect this would just be entirely detrimental.
Thanks in advance,
Chris.
2 Answers
Storing the index costs some storage space, but that alone doesn’t hurt performance much. It only takes about 1MB of storage. Unless you’re targeting this database to be stored on a mobile device or IoT device, that’s pretty insignificant storage-wise.
You already have an index that includes `key`: the one created for the UNIQUE KEY constraint. That index would not help a query that searches for `key` alone, but you said that in 99% of cases you search for `device_id` as well. In that case, the compound index helps reduce the examined rows by both columns. So you don’t need to index `key` separately.
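For example, a lookup of the kind described (the values here are made up) can be checked with EXPLAIN; it should report unique_device_data_key in the key column and examine a single row, since the index is unique over (device_id, `key`):

EXPLAIN SELECT `value`
FROM device_data
WHERE device_id = 42
  AND `key` = 'key_name_here';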
The problem might arise if you occasionally run a query searching for `key` without a condition on `device_id`. Then it would do a table scan, reading all 80,000 rows and evaluating them row by row against your search condition. If you’re willing to bear the performance drag in those cases, then you can avoid creating an additional index on `key`.

But it’s not only a question of how often you run that query. It might be that when you do run it, even though it’s infrequent, you need it optimized better than a table scan.
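Should you reach that point, the extra index is a one-line change to add later (the index name here is illustrative):

ALTER TABLE device_data ADD INDEX idx_device_data_key (`key`);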
Or it could be more important not to have an extra index, not because of the storage, but because every insert/update/delete against this table would then have to do three index writes instead of the current two. Whether to optimize for rapid writes over rapid reads depends on your application’s needs.
This is something we can’t answer for you; it’s your judgment call.
(I agree with Bill, and…)
Will you have trillions of devices? If not, use a smaller INT type.
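If you go that route, the change would look roughly like this; it assumes devices.id is the usual AUTO_INCREMENT surrogate key and that nothing else references it, and the foreign key has to be dropped and re-added around the type change (INT UNSIGNED tops out at about 4.3 billion):

ALTER TABLE device_data DROP FOREIGN KEY device_data_device_id_foreign;
ALTER TABLE devices MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE device_data MODIFY device_id INT UNSIGNED NOT NULL;
ALTER TABLE device_data ADD CONSTRAINT device_data_device_id_foreign
  FOREIGN KEY (device_id) REFERENCES devices (id) ON DELETE CASCADE;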
When I see these two statements together:

PRIMARY KEY (`id`),
UNIQUE KEY `unique_device_data_key` (`device_id`,`key`),

it is often better to have just

PRIMARY KEY (`device_id`,`key`)

because it saves space and avoids a secondary lookup in many cases.
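A sketch of that restructuring, untested and assuming the id column isn’t referenced anywhere else:

ALTER TABLE device_data
  DROP COLUMN id,
  DROP INDEX unique_device_data_key,
  ADD PRIMARY KEY (device_id, `key`);

The new primary key still has device_id as its leftmost column, so it continues to satisfy the index the foreign key needs.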
Only 30K repetitions is not that bad. The JOIN to a lookup table (at least in this situation) is likely to be worse for performance.
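(For context, the normalization being argued against would look something like this, with hypothetical names:

CREATE TABLE data_keys (
  id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(60) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY (name)
);

-- device_data.key becomes key_id TINYINT UNSIGNED, and every read pays for a JOIN:
SELECT dd.value
FROM device_data AS dd
JOIN data_keys AS dk ON dk.id = dd.key_id
WHERE dd.device_id = 42
  AND dk.name = 'key_name_here';
)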
Yes, an ENUM with about 40 options would take only 1 byte. And adding a new ‘option’ is a very cheap ALTER TABLE. But I don’t want to get into the pros and cons of ENUMs.

If you have queries that do WHERE `key` = '...' without AND device_id = ..., then INDEX(`key`) would be useful.

If you need to search `value`, then consider a FULLTEXT index on just that column (see the sketch below).

For further advice, I need to see the queries.
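As a sketch of the FULLTEXT suggestion (the index name is illustrative; InnoDB has supported FULLTEXT since MySQL 5.6):

ALTER TABLE device_data ADD FULLTEXT INDEX ft_device_data_value (`value`);

SELECT device_id, `key`
FROM device_data
WHERE MATCH (`value`) AGAINST ('some search term');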