MySQL: how to find the count of distinct column values based on another distinct column?

mahfujasif
October 29, 2022
191 views
3 votes
4 Answers

I have a table like this

userId   story  novel
1        a      b
1        a      b
1        a      c
1        b      c
1        b      c
2        x      x
2        x      y
2        y      y
3        m      n
4        NULL   NULL

How do I find the most frequent story and the most frequent novel, with their counts, for each user?

What I am looking for is the story and the novel with the highest count for each user. If a user has no story, then story_count should be 0.

Desired output looks like this

userId   story  story_count  novel  novel_count
1        a      3            c      3
2        x      2            y      2
3        m      1            n      1
4        NULL   0            NULL   0

This is my faulty current attempt

SELECT userId, story, COUNT(story) as story_count, novel, COUNT(novel) as novel_count  
FROM logs WHERE user = (SELECT DISTINCT(user)) GROUP BY story, novel;

Tags: mysql sql

Answers

I hope this works:


select story.userid, story, story_count, novel, novel_count
from
    (
        -- count(story) ignores NULLs, so a user with no story gets 0
        select userid, story, count(story) story_count,
        row_number() over (partition by userid order by count(story) desc) row_num
        from logs
        group by userid, story
    ) story
left join
    (
        select userid, novel, count(novel) novel_count,
        row_number() over (partition by userid order by count(novel) desc) row_num
        from logs
        group by userid, novel
    ) novel on novel.userid = story.userid and novel.row_num = 1
-- keep only the top-ranked story per user
where story.row_num = 1

Based on Tim's answer, here is an improved solution:

create table `logs`(
 userId int,
 story varchar(10),
 novel varchar(10)
);

insert into `logs`(userId,story,novel) values
(1,'a','b'),
(1 ,'a','c'),
(1 ,'b','c'),
(2 ,'x','x'),
(2 ,'x','y'),
(2 ,'y','y');

SELECT t1.userId,t1.story,t1.story_count,t2.novel,t2.novel_count
FROM
(
SELECT userId, story, COUNT(*) AS story_count,
       RANK() OVER (PARTITION BY userId ORDER BY COUNT(*) DESC) rn
FROM logs
GROUP BY userId, story
) as t1
join
(
SELECT userId, novel, COUNT(*) AS novel_count,
       RANK() OVER (PARTITION BY userId ORDER BY COUNT(*) DESC) rn
FROM logs
GROUP BY userId, novel
)as t2
ON t1.userId = t2.userId and t1.rn = t2.rn
WHERE t1.rn =1

DB Fiddle Demo

You should really use a window function if possible, because it keeps the query much shorter and simpler. If that is not an option and you need to do it without window functions, you can build one subquery for the story data and one for the novel data according to your conditions, and then join them. Something like this:

SELECT storydata.userId, storydata.story, storydata.counter AS story_count,
noveldata.novel, noveldata.counter AS novel_count  
FROM
(SELECT DISTINCT l.userId, l.story, sub.counter
FROM logs l
JOIN
(SELECT userId, COALESCE(MAX(story_count),0) AS counter
FROM
(SELECT userId, story,
COUNT(story) as story_count  
FROM logs 
GROUP BY userId, story) sub
GROUP BY userId) sub
ON l.userId = sub.userId
GROUP BY l.userId, l.story
HAVING COUNT(l.userId) = sub.counter OR l.story IS NULL) AS storydata
JOIN
(SELECT noveldata.userId, noveldata.novel, noveldata.counter AS counter
FROM
(SELECT DISTINCT l.userId, l.novel, sub.counter
FROM logs l
JOIN
(SELECT userId, COALESCE(MAX(novel_count),0) AS counter
FROM
(SELECT userId, novel,
COUNT(novel) as novel_count  
FROM logs 
GROUP BY userId, novel) sub
GROUP BY userId) sub
ON l.userId = sub.userId
GROUP BY l.userId, l.novel
HAVING COUNT(l.userId) = sub.counter OR l.novel IS NULL) AS noveldata) AS noveldata
ON storydata.userId = noveldata.userId;

As you can see, this gets very complicated, although it does produce the correct result. See here:
db<>fiddle

Therefore, I once again highly recommend using a MySQL version that provides window functions (8.0 or later).
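
For comparison, here is a minimal sketch of what a window-function version might look like on MySQL 8.0 or later. It assumes the logs table from the question. The CTE names story_ranked and novel_ranked and the alias rn are made up for illustration, and breaking ties alphabetically is an assumption, since the question does not say how ties should be handled. COUNT(story) is used instead of COUNT(*) so that a user with only NULL values gets a count of 0.

```sql
-- Sketch only: assumes MySQL 8.0+ and the `logs` table from the question.
-- story_ranked, novel_ranked and rn are illustrative names.
WITH story_ranked AS (
    SELECT userId, story, COUNT(story) AS story_count,
           -- rank each user's stories by frequency; ties broken by name (assumption)
           ROW_NUMBER() OVER (PARTITION BY userId
                              ORDER BY COUNT(story) DESC, story) AS rn
    FROM logs
    GROUP BY userId, story
),
novel_ranked AS (
    SELECT userId, novel, COUNT(novel) AS novel_count,
           ROW_NUMBER() OVER (PARTITION BY userId
                              ORDER BY COUNT(novel) DESC, novel) AS rn
    FROM logs
    GROUP BY userId, novel
)
SELECT s.userId, s.story, s.story_count, n.novel, n.novel_count
FROM story_ranked AS s
JOIN novel_ranked AS n
  ON n.userId = s.userId AND n.rn = 1   -- top novel per user
WHERE s.rn = 1                          -- top story per user
ORDER BY s.userId;
```

Replacing ROW_NUMBER() with RANK() and dropping the tie-breaker column would return every tied story and novel combination instead of exactly one row per user.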

Try the following for MySQL 5.7:

set @u=0;
set @tot=0;
set @rn=1;
select userid, story, story_count, novel, novel_count
from
(
  select userid, story, story_count, novel, novel_count,  
         if(@u<>userid or (@rn=1 and @tot=tot_cnt), @rn:=1, @rn:= @rn+1) rnk,
         @u:=userid, @tot:=tot_cnt
  from
  (
    select T.userid, T.story, T.story_count, D.novel, D.novel_count, 
           T.story_count+D.novel_count tot_cnt
    from
    (
      select userid, story, count(story) story_count
      from logs group by userid, story
    ) T
    join
    (
      select userid, novel, count(novel) novel_count
      from logs group by userid, novel
    ) D
    on T.userid=D.userid
  ) joined_counts
  order by userid, tot_cnt desc
) ranked_counts
where rnk = 1
order by userid

This query returns every row tied for the highest story and novel counts for each user. If you want only one row per user when several are tied, replace the if expression with this: if(@u<>userid, @rn:=1, @rn:= @rn+1) rnk.

See a demo.
