I have the following query:
SELECT DISTINCT FIRST_VALUE(business_id)
OVER (PARTITION BY b.sub_category_id
ORDER BY AVG(stars) desc, COUNT(*) DESC) business_id,
sub_category_id
FROM purchase_experience pe
JOIN businesses b ON b.id = pe.business_id
AND b.status = 'active'
AND b.sub_category_id IN (1010 ,1007 ,1034 ,1036)
WHERE pe.stars <> 0
GROUP BY business_id
LIMIT 4
It returns something like this:
business_id | sub_category_id
1744 | 1007
13215 | 1010
9231 | 1034
9103 | 1036
And here is the result of EXPLAIN
:
The sad part is, it takes about 7secs to be executed. Any idea how can I make it optimize?
2
Answers
Remove
DISTINCT
, it seems to be trying to do the same thing asFIRST_VALUE
.The query seems to be a "groupwise_max" problem. See the tag I added for better ways to do that. Or look here: Groupwise-Max
irst write the query to find the stats for the whole table, then figure out how to apply the groupwise-max.
Also, with
LIMIT 4
, which 4 do you want? You needORDER BY
to specify such.These may help:
This is a typical Top-1-per-group problem.
You can use
ROW_NUMBER
instead, this will be more efficient thanDISTINCT
.Note that you seem to be missing
sub_category_id
from yourGROUP BY
.