I am still somewhat new to SQL, so please bear with me.
Fundamentally what I want to do is the following, I have two tables:
mysql> SELECT * FROM product_table;
+----+---------------------------+--------------------------------------+
| id | product_name | category_array |
+----+---------------------------+--------------------------------------+
| 1 | Hike boots - Breathable M | ["12341", "21342", "31243"] |
| 2 | Tent throwable - XL | ["1239999", "21342", "1239999"] |
| 3 | Running boots pegasus - S | [ "12341", "1239999", "31243"] |
| 4 | Light Xtra bright - Wide | ["12399999", "21342", "41234"] |
| 5 | Medical kit | ["12399999", "12388888", "12377777"] |
+----+---------------------------+--------------------------------------+
mysql> SELECT * FROM category_table;
+----+---------------+-------------+
| id | category_name | category_id |
+----+---------------+-------------+
| 1 | Clothing | 12341 |
| 2 | Camping | 21342 |
| 3 | Sport | 31243 |
| 4 | Accessories | 41234 |
+----+---------------+-------------+
I want to combine these into a single table, like this :
mysql> SELECT * FROM product_table_combined;
+----+---------------------------+--------------------------------------+----------+
| id | product_name | category_array | category |
+----+---------------------------+--------------------------------------+----------+
| 1 | Hike boots - Breathable M | ["12341", "21342", "31243"] | Clothing |
| 2 | Tent throwable - XL | ["1239999", "21342", "1239999"] | Camping |
| 3 | Running boots pegasus - S | [ "12341", "1239999", "31243"] | Clothing |
| 4 | Light Xtra bright - Wide | ["12399999", "21342", "41234"] | Camping |
| 5 | Medical kit | ["12399999", "12388888", "12377777"] | NULL |
+----+---------------------------+--------------------------------------+----------+
Thus when the join happens I only the row to be added for the category that appears first in category_array or a NULL if it is not present in category_array. (I also want a version that appears last, but I hope that the SQL queries will be similar)
I have tried many things to try and make this happen, the closest that I have gotten is creating a temporary table, which has an additional column index_found specifying how soon in category_array it was found.
Using this command:
INSERT INTO temp_product_table_combined(product_name, category_array, category, index_found)
SELECT product.product_name, product.category_array, ANY_VALUE(category.category_name), MIN(REGEXP_INSTR(product.category_array, category.category_id))
FROM product_table AS product
LEFT JOIN category_table AS category ON product.category_array LIKE CONCAT('%', category.category_id, '%')
GROUP BY
product.product_name, product.category_array
Which gives almost the correct behaviour except that it gets a random value for the column of category.
mysql> SELECT * FROM temp_product_table_combined;
+----+---------------------------+--------------------------------------+-------------+-------------+
| id | product_name | category_array | category | index_found |
+----+---------------------------+--------------------------------------+-------------+-------------+
| 1 | Hike boots - Breathable M | ["12341", "21342", "31243"] | Sport | 3 |
| 2 | Tent throwable - XL | ["1239999", "21342", "1239999"] | Camping | 14 |
| 3 | Running boots pegasus - S | [ "12341", "1239999", "31243"] | Sport | 4 |
| 4 | Light Xtra bright - Wide | ["12399999", "21342", "41234"] | Accessories | 15 |
| 5 | Medical kit | ["12399999", "12388888", "12377777"] | NULL | NULL |
+----+---------------------------+--------------------------------------+-------------+-------------+
The problem lies with the necessity of ANY_VALUE. If I leave it out (which if allowed, should I believe give the correct table). I get the error message:
Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column ‘playground.category.category_name’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by.
I have tried doing a global disable of only_full_group_by mode and removing the ANY_VALUE, but it still gives this behaviour.
2
Answers
If you only want to match the first category in the array, use a
LIKE
pattern that matches the["
at the beginning:You could also use a JSON function to extract the first category from the array:
There is a way to resolve this issue, if your
category_table
columns,category_id's
length is fixed length. According to your input,category_table → category_id
length =5
.According to your query:
already found the index of
category_table.category_id
inproduct.category_array
column.So, I used this data to found the sub-string of length 5 (
category_table.category_id
) inproduct.category_array
column.and final MySQL query is:
and your output is:
But, I did not like this `group by` query. There is an alternative and simple solution of `group by scenario` as follows:
All query are explain one-by-one here:
db<>fiddle
I hope this will give you a road map to resolve the issue…