skip to Main Content

I am still somewhat new to SQL, so please bear with me.

Fundamentally what I want to do is the following, I have two tables:

mysql> SELECT * FROM product_table;
+----+---------------------------+--------------------------------------+
| id | product_name              | category_array                       |
+----+---------------------------+--------------------------------------+
|  1 | Hike boots - Breathable M | ["12341", "21342", "31243"]          |
|  2 | Tent throwable - XL       | ["1239999", "21342", "1239999"]      |
|  3 | Running boots pegasus - S | [ "12341",  "1239999", "31243"]      |
|  4 | Light Xtra bright - Wide  | ["12399999", "21342", "41234"]       |
|  5 | Medical kit               | ["12399999", "12388888", "12377777"] |
+----+---------------------------+--------------------------------------+
mysql> SELECT * FROM category_table;
+----+---------------+-------------+
| id | category_name | category_id |
+----+---------------+-------------+
|  1 | Clothing      | 12341       |
|  2 | Camping       | 21342       |
|  3 | Sport         | 31243       |
|  4 | Accessories   | 41234       |
+----+---------------+-------------+

I want to combine these into a single table, like this :

mysql> SELECT * FROM product_table_combined;
+----+---------------------------+--------------------------------------+----------+
| id | product_name              | category_array                       | category |
+----+---------------------------+--------------------------------------+----------+
|  1 | Hike boots - Breathable M | ["12341", "21342", "31243"]          | Clothing |
|  2 | Tent throwable - XL       | ["1239999", "21342", "1239999"]      | Camping  |
|  3 | Running boots pegasus - S | [ "12341",  "1239999", "31243"]      | Clothing |
|  4 | Light Xtra bright - Wide  | ["12399999", "21342", "41234"]       | Camping  |
|  5 | Medical kit               | ["12399999", "12388888", "12377777"] | NULL     |
+----+---------------------------+--------------------------------------+----------+

Thus when the join happens I only the row to be added for the category that appears first in category_array or a NULL if it is not present in category_array. (I also want a version that appears last, but I hope that the SQL queries will be similar)


I have tried many things to try and make this happen, the closest that I have gotten is creating a temporary table, which has an additional column index_found specifying how soon in category_array it was found.

Using this command:

INSERT INTO temp_product_table_combined(product_name, category_array, category, index_found)

SELECT product.product_name, product.category_array, ANY_VALUE(category.category_name), MIN(REGEXP_INSTR(product.category_array, category.category_id)) 

FROM product_table AS product 
LEFT JOIN category_table AS category ON product.category_array LIKE CONCAT('%', category.category_id, '%')
GROUP BY 
    product.product_name, product.category_array

Which gives almost the correct behaviour except that it gets a random value for the column of category.

mysql> SELECT * FROM temp_product_table_combined;
+----+---------------------------+--------------------------------------+-------------+-------------+
| id | product_name              | category_array                       | category    | index_found |
+----+---------------------------+--------------------------------------+-------------+-------------+
|  1 | Hike boots - Breathable M | ["12341", "21342", "31243"]          | Sport       |           3 |
|  2 | Tent throwable - XL       | ["1239999", "21342", "1239999"]      | Camping     |          14 |
|  3 | Running boots pegasus - S | [ "12341",  "1239999", "31243"]      | Sport       |           4 |
|  4 | Light Xtra bright - Wide  | ["12399999", "21342", "41234"]       | Accessories |          15 |
|  5 | Medical kit               | ["12399999", "12388888", "12377777"] | NULL        |        NULL |
+----+---------------------------+--------------------------------------+-------------+-------------+

The problem lies with the necessity of ANY_VALUE. If I leave it out (which if allowed, should I believe give the correct table). I get the error message:

Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column ‘playground.category.category_name’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by.

I have tried doing a global disable of only_full_group_by mode and removing the ANY_VALUE, but it still gives this behaviour.

2

Answers


  1. If you only want to match the first category in the array, use a LIKE pattern that matches the [" at the beginning:

    ON product.category_array LIKE CONCAT('[":', category.category_id, '"%')
    

    You could also use a JSON function to extract the first category from the array:

    ON category.category_id = product.category_array->>"$[0]"
    
    Login or Signup to reply.
  2. There is a way to resolve this issue, if your category_table columns, category_id's length is fixed length. According to your input, category_table → category_id length = 5.

    According to your query:

    MIN(REGEXP_INSTR(product.category_array, category.category_id)) 
    

    already found the index of category_table.category_id in product.category_array column.

    So, I used this data to found the sub-string of length 5 (category_table.category_id) in product.category_array column.

    and final MySQL query is:

    WITH T AS (
        SELECT DISTINCT
            p.product_name, 
            p.category_array,
            MIN(REGEXP_INSTR(p.category_array, c.category_id)) as min_index,
            -- '.{5}' length of category_table.category_id (consider fixed)
            -- CASE 
            --     WHEN(MIN(REGEXP_INSTR(p.category_array, c.category_id)) IS NOT NULL)
            --          THEN REGEXP_SUBSTR(p.category_array, '.{5}', MIN(REGEXP_INSTR(p.category_array, c.category_id)))
            -- END as subStrVal,
            REGEXP_SUBSTR(p.category_array, '.{5}', 
                MIN(REGEXP_INSTR(p.category_array, c.category_id))
            ) as p_category
    
        FROM product_table p
        LEFT JOIN category_table c 
        ON p.category_array LIKE CONCAT('%', c.category_id, '%')
        GROUP BY 
            p.product_name, p.category_array
    )
    SELECT pt.*, ct.category_name 
    FROM product_table pt
    LEFT JOIN T
    ON pt.product_name = T.product_name
    LEFT JOIN category_table ct
    ON T.p_category = ct.category_id
    

    and your output is:
    enter image description here



    But, I did not like this `group by` query. There is an alternative and simple solution of `group by scenario` as follows:

    with TT as (
        select 
            concat('(', GROUP_CONCAT(distinct category_id SEPARATOR '|'), ')') as category_ids
        from category_table
    ),
    T1 as (
        select p.*, 
        REGEXP_INSTR(p.category_array, TT.category_ids, 1) as min_match_index,
        CASE 
            WHEN(REGEXP_INSTR(p.category_array, TT.category_ids, 1)>0) THEN
                REGEXP_SUBSTR(p.category_array, '.{5}', 
                    REGEXP_INSTR(p.category_array, TT.category_ids, 1) )
        END as p_category
        from product_table p, TT
    )
    select T1.id, 
      T1.product_name, 
      T1.category_array, 
      ct.category_name 
    from T1
    LEFT JOIN category_table ct
    ON T1.p_category = ct.category_id
    

    All query are explain one-by-one here:
    db<>fiddle

    I hope this will give you a road map to resolve the issue…

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search