skip to Main Content

Data: user_choices table

id userid choices
1 3 1,2,3
2 55 5
3 783 2,6,7
4 45 1,4
5 5 1,2,3,4,5,6,7

How can I query this table to explode the choices column filtered only to show values 1 or 2 or 3 or 4 or 7 and group it by userid as follows?

userid choice
3 1
3 2
3 3
783 2
783 7
45 1
45 4
5 1
5 2
5 3
5 4
5 7

I have tried the following, but the results include 5 and 6, which should be omitted.

select replace(substring(substring_index(uc.choices, ',', 1), CHAR_LENGTH(SUBSTRING_INDEX(uc.choices, ',', -1)), + 1), ',', '') choice,,
                uc.userid
from user_choices uc
         join (select 1 digit union all select 2 union all select 3 union all select 4 union all select 7) n on length(replace(uc.choices, ',', '')) < length(uc.choices) - n.digit
where choices regexp '[12347]'
order by uc.userid

Note: I know the regexp isn’t perfect here (matches on 22, for example). The table here won’t ever receive this, so I don’t think stricter regex is required.

OUTPUT:

choice userid
1 3
1 5
1 5
1 5
1 5
2 783

This attempt is based on this answer to another question, but I can’t quite get it working for my scenario.

Your help is much appreciated!

2

Answers


  1. An alternative to the numbers table technique is to use JSON.

    The trick is that, given the format of your CSV strings (numbers separated by commas), we can easily turn them to valid JSON values by just surrounding them with square brackets; eg [1,2,3] is a valid JSON array. Then, we can use handy JSON function json_table() to unnest the arrays to rows. The last step is filtering.

    So:

    select t.userid, j.choice
    from mytable t
    cross join json_table( concat('[', t.choices, ']'), '$[*]' columns (choice int path '$') ) j
    where j.choice in (1, 2, 3, 4, 7)
    order by t.userid, j.choice
    
    userid choice
    3 1
    3 2
    3 3
    5 1
    5 2
    5 3
    5 4
    5 7
    45 1
    45 4
    783 2
    783 7

    fiddle

    Login or Signup to reply.
  2. I’m not clear on exactly how your sql is failing, but you seem to be not getting the approach quite right.

    First, you should only join digits that match the number of choices in each row (so 1, 2, and 3 for ‘1,6,7’); to do this you count the number of commas, add one (since there is one more choice than commas), and compare to the digit:

    on char_length(choices)-char_length(replace(choices,',',''))+1 >= digit
    

    You seem to be trying something like that but not quite correctly.

    Second, to extract the nth choice, you want to use substring_index to get the first n choices and use it again to get the last of those:

    substring_index(substring_index(choices,',',digit),',',-1)
    

    I didn’t follow what your replace/substring was even trying to do but the double substring_index is much simpler.

    So:

    select
        substring_index(substring_index(uc.choices,',',digit),',',-1) choice,
        uc.userid
    from user_choices uc
    join (select 1 digit union all select 2 union all select 3 union all select 4 union all select 7) n
        on char_length(uc.choices)-char_length(replace(uc.choices,',',''))+1 >= digit
    where uc.choices regexp '[12347]'
    having choice in (1,2,3,4,7)
    order by uc.userid
    

    If you do have choices where the regexp isn’t good enough, you can use e.g.:

    where find_in_set(1,uc.choices) or find_in_set(2,uc.choices) or find_in_set(3,uc.choices) or find_in_set(4,uc.choices) or find_in_set(7,uc.choices)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search