skip to Main Content

Below is the query:
I got this query over internet while searching to sort comma separated values in a single column

SELECT id, title, GROUP_CONCAT(aNumber ORDER BY aNumber)
FROM
(
    SELECT id, title, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(numbers, ',', tens.acnt * 10 + units.acnt + 1), ',', -1) AS UNSIGNED) AS aNumber
    FROM some_table
    CROSS JOIN
    (SELECT 0 AS acnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
    CROSS JOIN
    (SELECT 0 AS acnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
    WHERE LENGTH(numbers) - LENGTH(REPLACE(numbers, ',', '')) >= tens.acnt * 10 + units.acnt
) sub0
GROUP BY id, title;

2

Answers


  1. That is simple.

    This part(and next one)

    (SELECT 0 AS acnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
    

    Just do a temporary table which contains digits 0 to 9 with column name acnt

    So query is:

    SELECT id, title, GROUP_CONCAT(aNumber ORDER BY aNumber)
    FROM
    (
        SELECT id, title, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(numbers, ',', tens.acnt * 10 + units.acnt + 1), ',', -1) AS UNSIGNED) AS aNumber
        FROM some_table
        CROSS JOIN
        (temp_table_digits) units
        CROSS JOIN
        (temp_table_digits) tens
        WHERE LENGTH(numbers) - LENGTH(REPLACE(numbers, ',', '')) >= tens.acnt * 10 + units.acnt
    ) sub0
    GROUP BY id, title;
    

    After cross join inside you got sets like tens+digit – two digit numbers.

    This part :

    SUBSTRING_INDEX(numbers, ‘,’, tens.acnt * 10 + units.acnt + 1

    Will get index of N’th number from number column, comma delimited. And n is 0 to 99 from the internal tens+digits tables. After that it takes the -1 index, which is right last.
    So internal loop get different id,title, number[n] rows from n=0 to n=99. VERY not effective.

    After that it sorted by number[n] and you got id, tittle, sorted.
    Once you have id, title, sorted you are joining it using group_concat back into csv format.

    But this query have two issues

    1. It is not effective

    2. it relies on the fact that group contact will not shuffle rows, which MAY be not true for some realizations.

    Login or Signup to reply.
  2. The query "explodes" the comma-separated list into individual rows, then recombines the rows into a comma-separated list again, but the items in the list are sorted.

    These subqueries generate 100 rows, by making a cross product of a set of 0..9 values against another set of 0..9 values.

    (SELECT 0 AS acnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
    CROSS JOIN
    (SELECT 0 AS acnt UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
    

    This expression calculates the number of items in the comma-separated list. The length of the string compared to the length of the string with its commas removed is the number of commas, which is one fewer than the number of items in the list.

    LENGTH(numbers) - LENGTH(REPLACE(numbers, ',', ''))
    

    Then that expression is used to filter the cross-product of digits, so you get only as many as are in the comma-separated list, not all 99. I guess this is designed for lists that will never exceed 99 items.

    WHERE LENGTH(numbers) - LENGTH(REPLACE(numbers, ',', '')) >= tens.acnt * 10 + units.acnt
    

    This takes the _N_th item from the comma-separated list. Unfortunately MySQL doesn’t have a convenient function for that, the closest is has is SUBSTRING_INDEX() which can take the first N items or the trailing N items. Nesting these calls can take the last item from the first N items, resulting in a single item with is _N_th in the original list.

    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(numbers, ',', tens.acnt * 10 + units.acnt + 1), ',', -1) AS UNSIGNED) AS aNumber
    

    Once exploded into rows, then this repacks the rows into a comma-separated list on one row, but with a guarantee that they are sorted.

    GROUP_CONCAT(aNumber ORDER BY aNumber)
    

    This is frankly a lot of trouble to write such a complex query, to compensate for the decision to store values in a comma-separated list in the database, when the values were intended to be used as discrete values.

    I mean, it’s okay to store a string with numbers and commas, as long as you treat that string verbatim, storing it and fetching it as is. But if you want to use SQL predicates to search or sort individual values in that string, then you have committed yourself to a lot of expensive workarounds like this query.

    It’s smarter in a relational database to avoid storing comma-separated lists of values as a string. Instead, store each value on its own row. Then you search, sort, filter, or whatever you want, using much simpler queries that are easier to understand.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search