skip to Main Content

Is there a performance difference between doing a self-join vs. case expression when filtering a large table? For example, I have a list of names and professions, and I want to create a column that is only populated if a person is a doctor.

I did this in two ways:

  1. CASE WHEN
select name
, case 
    when profession = 'doctor' 
      then profession 
    else null 
  end as is_doctor
from professions
  1. LEFT JOIN … WHERE
select name
, is_doctor
from professions
  left join (
    select name
    , profession 
    from professions 
    where profession = 'doctor'
  ) d on professions.name = d.name

I know that WHERE tends to be less costly than CASE WHEN, but the LEFT JOIN could make it slower. The first is more readable but I’m wondering if the second is more efficient.

3

Answers


  1. Chosen as BEST ANSWER

    After some testing, the former case is nearly twice as fast and reads less data than the latter when ran against my database. From what I could understand from the EXPLAIN, the join was too expensive as it had to repartition the table.


  2. I don’t know from where you got the idea that CASE is slow. The first query, having no join, is bound to perform at least as good as the second.

    Login or Signup to reply.
  3. select name
    , case 
        when profession = 'doctor' 
          then profession 
      end as is_doctor
    from professions
    

    …is likely easier to maintain. So, in terms of developer time, it’s way faster.

    Assuming this is a trivial example of your actual, complicated SQL query — and shaving off a few milliseconds here and there really does matter — you’ll want to use your complicated SQL against your large data set(s) to perform your own testing.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search