Data: user_choices
table
id | userid | choices |
---|---|---|
1 | 3 | 1,2,3 |
2 | 55 | 5 |
3 | 783 | 2,6,7 |
4 | 45 | 1,4 |
5 | 5 | 1,2,3,4,5,6,7 |
How can I query this table to explode the choices
column filtered only to show values 1 or 2 or 3 or 4 or 7 and group it by userid
as follows?
userid | choice |
---|---|
3 | 1 |
3 | 2 |
3 | 3 |
783 | 2 |
783 | 7 |
45 | 1 |
45 | 4 |
5 | 1 |
5 | 2 |
5 | 3 |
5 | 4 |
5 | 7 |
I have tried the following, but the results include 5
and 6
, which should be omitted.
select replace(substring(substring_index(uc.choices, ',', 1), CHAR_LENGTH(SUBSTRING_INDEX(uc.choices, ',', -1)), + 1), ',', '') choice,,
uc.userid
from user_choices uc
join (select 1 digit union all select 2 union all select 3 union all select 4 union all select 7) n on length(replace(uc.choices, ',', '')) < length(uc.choices) - n.digit
where choices regexp '[12347]'
order by uc.userid
Note: I know the regexp isn’t perfect here (matches on 22
, for example). The table here won’t ever receive this, so I don’t think stricter regex is required.
OUTPUT:
choice | userid |
---|---|
1 | 3 |
1 | 5 |
1 | 5 |
1 | 5 |
1 | 5 |
2 | 783 |
This attempt is based on this answer to another question, but I can’t quite get it working for my scenario.
Your help is much appreciated!
2
Answers
An alternative to the numbers table technique is to use JSON.
The trick is that, given the format of your CSV strings (numbers separated by commas), we can easily turn them to valid JSON values by just surrounding them with square brackets; eg
[1,2,3]
is a valid JSON array. Then, we can use handy JSON functionjson_table()
to unnest the arrays to rows. The last step is filtering.So:
fiddle
I’m not clear on exactly how your sql is failing, but you seem to be not getting the approach quite right.
First, you should only join digits that match the number of choices in each row (so 1, 2, and 3 for ‘1,6,7’); to do this you count the number of commas, add one (since there is one more choice than commas), and compare to the digit:
You seem to be trying something like that but not quite correctly.
Second, to extract the nth choice, you want to use substring_index to get the first n choices and use it again to get the last of those:
I didn’t follow what your replace/substring was even trying to do but the double substring_index is much simpler.
So:
If you do have choices where the regexp isn’t good enough, you can use e.g.: