skip to Main Content

I have a table which has a large amount of variable data with the key field in particular that has, at the time of writing 14 variants. The structure is as below:

CREATE TABLE `device_data` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `device_id` bigint(20) unsigned NOT NULL,
  `serialized` tinyint(1) NOT NULL DEFAULT 0,
  `system` tinyint(1) NOT NULL DEFAULT 0,
  `key` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_device_data_key` (`device_id`,`key`),
  CONSTRAINT `device_data_device_id_foreign` FOREIGN KEY (`device_id`) REFERENCES `devices` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

The table holds about 80,000 records which demonstrates that there is a large amount of repetition for the value of the key field. The problem with this is that if the key value is key_name_here and it’s repeated 30,000 times, that’s 13 bytes for every row * 30,000.

I understand that I could create an enum column but the system itself is very dynamic and subject to change. It’s not faesable to create a new enum value for the key every time.

So my question here, would it make sense to add an index to the key field, in the hope this would reduce overall data consumption and benefit performance or would this simply be detremental to performance?

It’s worth adding that in 99% of cases we’d already be filtering down based on the device_id column, which makes me suspect this would just be entirely detrimental.

Thanks in advance,
Chris.

2

Answers


  1. Storing the index costs some storage space, but that alone doesn’t hurt performance much. It only takes about 1MB of storage. Unless you’re targeting this database to be stored on a mobile device or IoT device, that’s pretty insignificant storage-wise.

    You already have an index that includes key, the one created for the UNIQUE KEY constraint. This index would not help a query that searched for key alone, but you said in 99% of cases, you search for device_id as well. In that case, the compound index would help reduce the examined rows by both columns. So you don’t need to index key separately.

    The problem might happen if you occasionally run a query searching for key without a condition for device_id. Then it would do a table-scan and search all 80,000 rows, evaluating them row-by-row against your search condition.

    If you’re willing to bear the performance drag in these cases, then you can avoid creating an additional index on key.

    But it’s not only important how often you run that queries. It might be that when you run that query, even though it’s infrequent, you need it to be optimized better than a table-scan.

    Or it could be more important to not have an extra index not for the storage, but because insert/update/delete against this table would have to do three index writes instead of the current two index writes. If you need to optimize for rapid write queries over rapid read queries, that’s up to your application needs.

    This is something we can’t answer for you, it’s your judgment call.

    Login or Signup to reply.
  2. (I agree with Bill, and…)

    device_id` bigint(20) unsigned NOT NULL,
    

    Will you have trillions of devices? If not, use a smaller INT type.

    When I see these two statements:

    PRIMARY KEY (`id`),
    UNIQUE KEY `unique_device_data_key` (device_id, key),
    

    It is often better to have just

    PRIMARY KEY `unique_device_data_key` (device_id, key),
    

    because it saves space and avoids a secondary lookup in many cases.

    key value is key_name_here and it’s repeated 30,000 times

    Only 30K times is not that bad. The JOIN (at least in this situation) is likely to be worse for performance.

    Yes, an ENUM with about 40 options would take only 1 byte. And adding a new ‘option’ is a very cheap ALTER TABLE. But I don’t want to get into the pros and cons of Enums.

    If you have queries that do WHERE Key = '...' without AND device_id = ..., then INDEX(key) would be useful.

    If you need to search value, then consider a FULLTEXT index on just that column.

    For further advice, I need to see the queries.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search