Is there a performance difference between doing a self-join vs. case expression when filtering a large table? For example, I have a list of names and professions, and I want to create a column that is only populated if a person is a doctor.
I did this in two ways:
- CASE WHEN
select name
, case
when profession = 'doctor'
then profession
else null
end as is_doctor
from professions
- LEFT JOIN … WHERE
select name
, is_doctor
from professions
left join (
select name
, profession
from professions
where profession = 'doctor'
) d on professions.name = d.name
I know that WHERE tends to be less costly than CASE WHEN, but the LEFT JOIN could make it slower. The first is more readable but I’m wondering if the second is more efficient.
3
Answers
After some testing, the former case is nearly twice as fast and reads less data than the latter when ran against my database. From what I could understand from the EXPLAIN, the join was too expensive as it had to repartition the table.
I don’t know from where you got the idea that
CASE
is slow. The first query, having no join, is bound to perform at least as good as the second.…is likely easier to maintain. So, in terms of developer time, it’s way faster.
Assuming this is a trivial example of your actual, complicated SQL query — and shaving off a few milliseconds here and there really does matter — you’ll want to use your complicated SQL against your large data set(s) to perform your own testing.