I’m trying to categorize a certain entity, let’s say, a User, with the most frequent value associated in a table to this entity, let’s say, an Emotion.
Thus,a User will be categorized as "Sad" when he logs mostly the value "Sad", or "Happy" when he logs mostly the value "Happy".
An example of the origin table:
user_id | emotion | registered_on |
---|---|---|
"PAM" | "SAD" | 2021-04-05 |
"PAM" | "HAPPY" | 2021-04-06 |
"PAM" | "HAPPY" | 2021-04-07 |
"TIM" | "SAD" | 2021-04-06 |
"TIM" | "SAD" | 2021-05-01 |
"TIM" | "HAPPY" | 2021-06-05 |
the result I’m looking for:
user_id | year | emotion |
---|---|---|
"TIM" | 2021 | "SAD" |
"PAM" | 2021 | "HAPPY" |
I’m stuck with Mysql 5.7 so I cannot use window functions.
this is the select I came up with, was planning on using it as a subquery to choose from,
but now I’m stuck as to how to select only the rows with the higher value in how_many_times for each user.
select
user_id,
YEAR(MAX(registered_on)) as year,
emotion,
COUNT(user_id) as how_many_times
from users_emotions
group by user_id, emotion
order by user_id, how_many_times desc
can anyone help?
This is the fiddle: https://www.db-fiddle.com/f/bQnSQZpAorT48Rv2DRMjxS/2
Edit:
So I was almost there, the answer marked as duplicate helped me, not the one marked as the right one, but the one by Vishal Kumar:
(eliminated year for clarity)
select ordered.user_id, SUBSTRING_INDEX(GROUP_CONCAT(ordered.emotion order by ordered.how_many_times desc), ',',1) as emotions from
(select
user_id,
COUNT(user_id) as how_many_times,
emotion
from users_emotions
group by user_id, emotion) as ordered
group by ordered.user_id
;
I understand it’s not perfect, because in case of a tie, it just picks one, but it does what I need!
5
Answers
The query would differ much from MySQL 5.x to MySQL 8, you onoly would use CTE’s
With only two emotions this is quite easy
Query #1
View on DB Fiddle
with 3 it gets more complicated
Query #1
View on DB Fiddle
You could groupe the data by columns
user_id, Year, and emotion
and then selecting the row with the highestcount
for eachuser_id, Year, and emotion
combination using theROW_NUMBER()
window function. The resulting output shows theuser_id, Year, and emotion
for each row with the highest count:Query:
View on DB Fiddle
Well, if it’s MysSql 5.7, then you can not use window functions and common table expressions. So, you have to use derived tables (subqueries) few times:
Updated Sql fiddle
if you have a small number of pre-defined
emotions
(categories), then nbk’s method will perform much better than this.This will work for any number of possible
emotions
but uses a slightly arbitrarye1.emotion > e2.emotion
to break any ties:Here’s a db<>fiddle
Try the following aggregation with a correlated subquery in the having clause:
demo