skip to Main Content

I’m trying to categorize a certain entity, let’s say, a User, with the most frequent value associated in a table to this entity, let’s say, an Emotion.

Thus,a User will be categorized as "Sad" when he logs mostly the value "Sad", or "Happy" when he logs mostly the value "Happy".

An example of the origin table:

user_id emotion registered_on
"PAM" "SAD" 2021-04-05
"PAM" "HAPPY" 2021-04-06
"PAM" "HAPPY" 2021-04-07
"TIM" "SAD" 2021-04-06
"TIM" "SAD" 2021-05-01
"TIM" "HAPPY" 2021-06-05

the result I’m looking for:

user_id year emotion
"TIM" 2021 "SAD"
"PAM" 2021 "HAPPY"

I’m stuck with Mysql 5.7 so I cannot use window functions.

this is the select I came up with, was planning on using it as a subquery to choose from,
but now I’m stuck as to how to select only the rows with the higher value in how_many_times for each user.

select
    user_id,
    YEAR(MAX(registered_on)) as year,
    emotion,
    COUNT(user_id) as how_many_times
from users_emotions
group by user_id, emotion
order by user_id, how_many_times desc

can anyone help?

This is the fiddle: https://www.db-fiddle.com/f/bQnSQZpAorT48Rv2DRMjxS/2

Edit:

So I was almost there, the answer marked as duplicate helped me, not the one marked as the right one, but the one by Vishal Kumar:

(eliminated year for clarity)

select ordered.user_id, SUBSTRING_INDEX(GROUP_CONCAT(ordered.emotion order by ordered.how_many_times desc), ',',1) as emotions from 
(select
    user_id,
    COUNT(user_id) as how_many_times,
    emotion
from users_emotions
group by user_id, emotion) as ordered
group by ordered.user_id
;

I understand it’s not perfect, because in case of a tie, it just picks one, but it does what I need!

https://www.db-fiddle.com/f/qyj341zapSob6pvAkcw7Fd/0

5

Answers


  1. The query would differ much from MySQL 5.x to MySQL 8, you onoly would use CTE’s

    With only two emotions this is quite easy

    Query #1

    SELECT
    
        user_id,year_emotion,
        CASE WHEN how_many_happy = how_many_unhappy THEN 'middle' 
        WHEN how_many_happy > how_many_unhappy THEN 'HAPPY'
        ELSE 'SAD' END emotions
    FROM    
    (select
        user_id,
        YEAR(registered_on) year_emotion,
        SUM( emotion =  "HAPPY") as how_many_happy,    
        SUM( emotion =  "SAD") as how_many_unhappy
    from users_emotions
    group by user_id, YEAR(registered_on)) t1
    ORDER  BY user_id
    ;
    
    user_id year_emotion emotions
    1 2021 SAD
    2 2021 middle
    3 2021 SAD

    View on DB Fiddle

    with 3 it gets more complicated


    Query #1

    SELECT
    
        user_id,year_emotion,
        CASE WHEN how_many_happy = how_many_unhappy 
              AND  how_many_happy = how_many_confused THEN 'middle' 
        WHEN how_many_happy >= how_many_unhappy THEN 
           CASE  WHEN how_many_happy > how_many_confused then   'HAPPY'
                 ELSE 'COnfused' END 
        ELSE 
        CASE WHEN how_many_unhappy > how_many_confused then   'SAD'
        ELSE 'COMNFUSED'
         END 
         END emotions
    FROM    
    (select
        user_id,
        YEAR(registered_on) year_emotion,
        SUM( emotion =  "HAPPY") as how_many_happy,    
        SUM( emotion =  "SAD") as how_many_unhappy
     ,SUM( emotion =  "CONFUSED") as how_many_confused
    from users_emotions
    group by user_id, YEAR(registered_on)) t1
    ORDER  BY user_id
    ;
    
    user_id year_emotion emotions
    1 2021 SAD
    2 2021 HAPPY
    3 2021 COMNFUSED

    View on DB Fiddle

    Login or Signup to reply.
  2. You could groupe the data by columns user_id, Year, and emotion and then selecting the row with the highest count for each user_id, Year, and emotion combination using the ROW_NUMBER() window function. The resulting output shows the user_id, Year, and emotion for each row with the highest count:

    Query:

    SELECT
      user_id,
      Year,
      emotion
    FROM
      (
        SELECT
          user_id,
          DATE_FORMAT(registered_on, '%Y') AS Year,
          emotion,
          COUNT(*) AS count,
          ROW_NUMBER() OVER(
            PARTITION BY user_id
            ORDER BY
              COUNT(*) DESC
          ) rn
        FROM
          users_emotions
        GROUP BY
          user_id,
          Year,
          emotion
      ) t
    WHERE
      rn = 1
    

    View on DB Fiddle

    Login or Signup to reply.
  3. Well, if it’s MysSql 5.7, then you can not use window functions and common table expressions. So, you have to use derived tables (subqueries) few times:

    select t1.*
    from (
     select 
        user_id, 
        emotion,
        YEAR(registered_on) as regyear,
        COUNT(*) as cnt
     from users_emotions
     group by 1, 2, 3
    ) as t1
    where t1.cnt =  (
      select 
      MAX(cnt) as mcnt
      from (
        select 
          user_id, 
          emotion,
          YEAR(registered_on) as regyear,
          COUNT(*) as cnt
        from users_emotions
        group by 1, 2, 3) as sq
    );
    

    Updated Sql fiddle

    Login or Signup to reply.
  4. if you have a small number of pre-defined emotions (categories), then nbk’s method will perform much better than this.

    This will work for any number of possible emotions but uses a slightly arbitrary e1.emotion > e2.emotion to break any ties:

    SELECT e1.user_id, e1.reg_year AS year, e1.emotion
    FROM (
        SELECT
            user_id, 
            emotion,
            YEAR(registered_on) as reg_year,
            COUNT(*) as cnt
        FROM users_emotions
        GROUP BY 1, 2, 3
    ) e1
    LEFT JOIN (
        SELECT
            user_id, 
            emotion,
            YEAR(registered_on) as reg_year,
            COUNT(*) as cnt
        FROM users_emotions
        GROUP BY 1, 2, 3
    ) e2
        ON e1.user_id = e2.user_id
        AND e1.reg_year = e2.reg_year
        AND (e1.cnt < e2.cnt OR
             e1.cnt = e2.cnt AND e1.emotion > e2.emotion)
    WHERE e2.user_id IS NULL
    

    Here’s a db<>fiddle

    Login or Signup to reply.
  5. Try the following aggregation with a correlated subquery in the having clause:

    select user_id, 
           emotion, 
           year(max(registered_on)) year_
    from users_emotions ue
    group by user_id, emotion
    having count(*) = 
      (
        select max(cnt)
        from
        (
          select user_id, emotion, count(*) cnt
          from users_emotions
          group by user_id, emotion
        ) mx
        where mx.user_id = ue.user_id
      )
    order by user_id
    

    demo

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search