skip to Main Content

With the following table:

CREATE TABLE `Example` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `properties` json DEFAULT NULL,
  `hash` binary(20) GENERATED ALWAYS AS (unhex(sha(`properties`))) STORED,
  PRIMARY KEY (`id`),
  KEY `hash` (`hash`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

the column hash is derived from the column properties. In the nomenclature in the docs, {properties} -> {hash}

Also from the docs concerning handling of GROUP BY, comes the following:

SQL:1999 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns: If such a relationship exists between name and custid, the query is legal. This would be the case, for example, were custid a primary key of customers

However, despite this, the following query returns an error (no data is required in the table to reproduce this):

SELECT `properties` from `Example` GROUP BY `hash`;

The error is

Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column ‘dispatch.Example.properties’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

The error says the column is NOT functionally dependent. This is probably because the query analyzer doesn’t assume that the value returned by the SHA function is deterministic. Maybe? Is the possibility of a hash collision killing this whole idea?

The hash is considerably less useful if I still have to apply an aggregate function to the properties column to populate the result. Is there a way I can assert to MySQL that there is functional dependence?

Failing that, what’s the best way to say "just give me the property from any matching row" that doesn’t involve comparing properties records (which is the point of the hash)? The best I’ve come up with is FIRST in a window function, but that feels janky.

2

Answers


  1. Chosen as BEST ANSWER

    It was the direction of the dependency that I had wrong; with the above example

    Select `hash` FROM `Example` GROUP BY `properties`
    

    works correctly, since hash depends on properties.

    Since in my case hash exists solely to be more efficient to index and group by, the above is not an option.

    Not mentioned in the docs for aggregating functions is the function ANY_VALUE that unpredictably returns the value from one of the grouped rows. Since I know the two columns are equivalent, I don't care which row it picks for properties.

    The resulting working query is:

    SELECT ANY_VALUE(`properties`) from `Example` GROUP BY `hash`;
    

  2. It has nothing to do with using a derived column.

    Functional dependency only works when grouping by the primary key, not any other unique or non-unique column.

    https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html

    SQL:1999 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns: If such a relationship exists between name and custid, the query is legal. This would be the case, for example, were custid a primary key of customers.

    You can test your own example table to prove this. Group by the id (the table’s primary key), and you get no error:

    mysql> SELECT `properties` from `Example` GROUP BY `id`;
    Empty set (0.00 sec)
    

    Also if you add columns that are not derived columns, you still can’t rely on functional dependency, whether they are unique or non-unique.

    mysql> alter table example 
      add column u int, 
      add unique key (u), 
      add column n int, 
      add key (n);
    Query OK, 0 rows affected (0.02 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    
    mysql> SELECT `properties` from `Example` GROUP BY `u`;
    ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.Example.properties' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
    
    mysql> SELECT `properties` from `Example` GROUP BY `n`;
    ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.Example.properties' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search