With the following table:
CREATE TABLE `Example` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`properties` json DEFAULT NULL,
`hash` binary(20) GENERATED ALWAYS AS (unhex(sha(`properties`))) STORED,
PRIMARY KEY (`id`),
KEY `hash` (`hash`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
the column hash
is derived from the column properties
. In the nomenclature in the docs, {properties} -> {hash}
Also from the docs concerning handling of GROUP BY, comes the following:
SQL:1999 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns: If such a relationship exists between name and custid, the query is legal. This would be the case, for example, were custid a primary key of customers
However, despite this, the following query returns an error (no data is required in the table to reproduce this):
SELECT `properties` from `Example` GROUP BY `hash`;
The error is
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column ‘dispatch.Example.properties’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
The error says the column is NOT functionally dependent. This is probably because the query analyzer doesn’t assume that the value returned by the SHA
function is deterministic. Maybe? Is the possibility of a hash collision killing this whole idea?
The hash is considerably less useful if I still have to apply an aggregate function to the properties
column to populate the result. Is there a way I can assert to MySQL that there is functional dependence?
Failing that, what’s the best way to say "just give me the property from any matching row" that doesn’t involve comparing properties
records (which is the point of the hash)? The best I’ve come up with is FIRST
in a window function, but that feels janky.
2
Answers
It was the direction of the dependency that I had wrong; with the above example
works correctly, since
hash
depends onproperties
.Since in my case
hash
exists solely to be more efficient to index and group by, the above is not an option.Not mentioned in the docs for aggregating functions is the function
ANY_VALUE
that unpredictably returns the value from one of the grouped rows. Since I know the two columns are equivalent, I don't care which row it picks forproperties
.The resulting working query is:
It has nothing to do with using a derived column.
Functional dependency only works when grouping by the primary key, not any other unique or non-unique column.
https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html
You can test your own example table to prove this. Group by the
id
(the table’s primary key), and you get no error:Also if you add columns that are not derived columns, you still can’t rely on functional dependency, whether they are unique or non-unique.