I have the following table <state_table> that tracks entries per minute of an id and its state:
minute id type
------ -- ----
1 A solid
1 A solid
1 A solid
1 A liquid
1 B solid
1 B solid
1 B liquid
.... 1000+ rows ...
1 ZZX liquid
1 ZZZ liquid
2 A solid
2 A solid
2 A liquid
With the following query, I can get the top 1000 pairs based on occurrence:
With TempIds AS (
SELECT
state_table.minute as minute,
state_table.id as id,
COUNT(*)
FROM
state_table
GROUP BY 1,2
) SELECT
TempId.minute,
TempId.id,
TempId.count
FROM
TempIds
ORDER BY 3 DESC
LIMIT 1000
;
e.g.
minute id count
------ -- ----
2 B 1002
3 A 990
1 C 800
3 B 798
How can I modify my query to get the type of the id?
For example, there are 1002 <minute=2,id=B> rows. Is there a way to get there are 402 solids and 600 liquids?
minute id count type
------ -- ---- -----
2 B 402 solid
2 B 600 liquid
3 A 330 solid
3 A 660 liquid
The only way I can think of is a fairly complex nested query:
With TempTop AS (
With TempIds AS (
SELECT
state_table.minute as minute,
state_table.id as id,
COUNT(*)
FROM
state_table
GROUP BY 1,2
) SELECT
TempId.minute as minute,
TempId.id as id,
TempId.count
FROM
TempIds
ORDER BY 3 DESC
LIMIT 1000
)
) SELECT
state_table.minute,
state_table.id,
state_table.type,
COUNT(*)
FROM
state_table, TempTop
WHERE
state_table.minute = TempTop.minute
AND state_table.id = TempTop.id
;
Is there a simpler way to make this query?
Goal:
- For the Top 1000 most frequent pairs, get the breakdown of the type.
2
Answers
Your query is
and in this process you lose the types, because you want the 1000 top minute/id pairs, so you cannot simply
group by minute, id, type
instead.If this is only about the types ‘solid’ and ‘liquid’, you can apply conditional aggregation to get the separate counts along:
Summing up the boolean expressions works in MySQL, because true equals 1 and false equals 0 there.
The problem with the above queries are ties, by the way. If there are two pairs with the same 1000th count, you pick one arbitrarily instead of showing only 999 or 1001 pairs then in order to treat both tying pairs the same. So I’d probably re-write the queries using
DENSE_RANK
in order to properly handle ties.For the more complicated case where the types are unknown at the point of writing the query, you need rows instead of columns, just as already shown in your request. In that case you really need to
group by minute, id, type
first. The easiest way to get the total counts is withSUM OVER
then. Then rank the pairs as mentioned withDENSE_RANK
and keep the top 1000.This can get you more than 1000 minute/id pairs in case of ties. You can reduce this with
RANK
instead ofDENSE_RANK
. If these two approaches still don’t get the number you want, you may have to count minute/id pairs separately in a subquery instead:Best way I can think of is, you still have to use 2 CTEs but not nested ones.
You have 3 tasks to achieve,
CTE will count the records.
second one will use row_number() window function to assign row numbers based on decreasing order or count.
last select will do a join between state table and second cte with a where clause specifying row_nunber <= 1000 and grouping based on ID , State and Minute.