SQL optimization problem: which of the two solutions below is more efficient?
I have the table from the image. I need to group the data by CPF and date and find out whether each CPF had at least one login_ok = true on a specific date. Both solutions below satisfy my need, but the goal is to find the best query.
A CPF can have multiple rows with login_ok = true and login_ok = false on a specific date. I just need to know whether there was at least one login_ok = true.
I already have two solutions; I would like to discuss how to write another, more efficient one.
2 Answers
Maybe this would work for your problem:
DISTINCT would also work, and I doubt it would cause any performance problem in your case. Usually it evaluates expressions (like date(data)) before checking for uniqueness.
By using a subquery, in this case, you can select upfront which CPFs to include and then extract the date. Finally, you would group a much smaller number of rows, since those were previously selected.
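The two ideas above could look roughly like the sketch below. It is only an illustration: the table name `logins` and the alias `login_date` are assumptions (the real table is only shown in the question's image), while the columns cpf, data and login_ok come from the question.

```sql
-- Assumption: the table is called "logins" (hypothetical name) and has
-- the columns cpf, data (a timestamp) and login_ok (a boolean).

-- Subquery variant: keep only the successful logins first,
-- then group the (smaller) remaining set by CPF and date.
SELECT cpf, login_date
FROM (
    SELECT cpf, date(data) AS login_date
    FROM logins
    WHERE login_ok
) AS ok_logins
GROUP BY cpf, login_date;

-- DISTINCT variant: same result, one row per (cpf, date) pair
-- that has at least one login_ok = true.
SELECT DISTINCT cpf, date(data) AS login_date
FROM logins
WHERE login_ok;
```

Note that both variants return only the (cpf, date) pairs that had a successful login; pairs with nothing but login_ok = false simply do not appear in the result.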
PostgreSQL has the function BOOL_OR to check whether the expression is true for at least one row. It is likely to be optimised for this kind of task.
An index on (cpf, date(data)) or even on (cpf, date(data), login_ok) could help speed up the query.
On a side note: you may also want to order your results with ORDER BY. Don't rely on GROUP BY doing this. The order of the rows resulting from a query is arbitrary without an ORDER BY clause.
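Putting these suggestions together, a minimal sketch might look like this. The table name `logins`, the index names and the aliases `login_date` and `had_login_ok` are assumptions made for the example; only cpf, data and login_ok come from the question. The expression indexes also assume that data is a timestamp without time zone, since PostgreSQL only accepts immutable expressions in an index.

```sql
-- Assumption: table "logins" (hypothetical name) with columns
-- cpf, data (timestamp without time zone) and login_ok (boolean).

-- One row per (cpf, date); had_login_ok is true when at least one
-- row of the group has login_ok = true.
SELECT cpf,
       date(data)        AS login_date,
       bool_or(login_ok) AS had_login_ok
FROM logins
GROUP BY cpf, date(data)
ORDER BY cpf, login_date;  -- explicit ordering, not left to GROUP BY

-- Candidate indexes mentioned above; the index names are made up.
CREATE INDEX logins_cpf_date_idx
    ON logins (cpf, date(data));

CREATE INDEX logins_cpf_date_ok_idx
    ON logins (cpf, date(data), login_ok);
```

Unlike a query that filters on login_ok first, this one also returns the (cpf, date) pairs that had no successful login, with had_login_ok = false. Whether either index actually helps depends on the data, so it is worth comparing the plans with EXPLAIN ANALYZE before keeping them.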