skip to Main Content

I wrote a SQL query to fetch all movies and, if present, include the name ‘Brad Pitt’ for movies where he’s a cast member. The query is as follows:

select m.title,  p_cast.person_name 
From movie m left join (
      movie_cast m_cast join person p_cast on m_cast.person_id = p_cast.person_id and 
      p_cast.person_name = 'Brad Pitt'
 ) on m.movie_id = m_cast.movie_id 

I expected the execution plan in MySQL Workbench to show that the database left-joins the movie table with the result of the join between movie_cast and person. However, the plan suggests a different sequence: it appears to join movie with movie_cast first, and only then does it join with person where the filter for ‘Brad Pitt’ is applied. Here is the plan.
Query Execution plan.

If this interpretation of the execution plan is accurate, I’m trying to understand where the left join is actually being implemented. If the join sequence is as described, I expected that rows including cast members other than Brad Pitt would be retained, which is not the case.

Does the execution plan really represent what’s happening internally (and if so, how to make sense of the execution flow?), or is there a potential error in how it’s being displayed or interpreted?

2

Answers


  1. Simplified version of your query. No parenthesis required. Do the LEFT JOIN conditions at each level from movie to cast to person

    select 
          m.title, 
          p_cast.person_name 
       From 
          movie m 
             left join movie_cast m_cast 
                on m.movie_id = m_cast.movie_id 
                   join person p_cast 
                      on m_cast.person_id = p_cast.person_id 
                     and p_cast.person_name = 'Brad Pitt'
    

    Now, this result might give you more than you expect as the left-join to the cast will get 1 record for Every cast member, but only if the left-join to the person qualified for ‘Brad Pitt’ will pull the actual name. So you would get a list the same moving showing and bunch of null values for the person UNTIL Brad Pitt.

    Now, lets add something to see your intent vs what your query is. So lets see if THIS makes more sense… Is what you are asking really… Get me a list of all movies that Brad Pitt is part of the cast. Then, from that, get return all cast members of said movies. If THIS is the case, then I would start with querying all movies associated with Brad Pitt, THEN join again to the cast for others.

    select
          m.title,
          castName.Person_Name
       from
          person p
             join movie_cast mcBrad
                p.person_id = mcBrad.person_id
                join movie_cast AllCast
                   on mcBrad.movie_id = AllCast.movie_id
                   JOIN person castName
                      on AllCast.person_id = castName.person_id
                   JOIN movie m
                      on AllCast.movie_id = m.movie_id
       where
          p.person_name = 'Brad Pitt'
    

    So here, we are STARTING with the person filter of Brad Pitt joining to the cast where he is associated. This will return only one record per movie associated with Brad Pitt. Now that we have a movie, join AGAIN to the movie cast, but ONLY based on the movie qualified by Brad and getting ALL cast members, but for each individual movie. Now we have the cast members we can re-join back to the person table for each cast member in the movie. A second join to the movie table to pull the movie title.

    Hopefully a different perspective might be more of what you were intending?

    Login or Signup to reply.
  2. The graphical representation of the execution plan is not unreasonable, but I can understand why it might be confusing. If you look at the output of EXPLAIN ANALYZE you will see something like this:

    -> Nested loop left join
        -> Table scan on m
        -> Nested loop inner join
            -> Covering index lookup on m_cast using PRIMARY (movie_id=m.movie_id)
            -> Filter: (p_cast.person_name = 'Brad Pitt')
                -> Single-row index lookup on p_cast using PRIMARY (person_id=m_cast.person_id)
    

    Here we can see the left join between Table scan on m and the result of Nested loop inner join. The Nested loop inner join is performed in a loop for each row returned by Table scan on m with m.movie_id as an input to the inner join.

    More importantly, this highlights the lack of an index on p_cast.person_name and it is returning all rows from m_cast for the given movie_id, then retrieving the row from p_cast and filtering on person_name.

    If you add an index on the person_name you will probably see something like this:

    -> Nested loop left join
        -> Table scan on m
        -> Nested loop inner join
            -> Covering index lookup on p_cast using idx_name (person_name='Brad Pitt')
            -> Single-row covering index lookup on m_cast using PRIMARY (movie_id=m.movie_id, person_id=p_cast.person_id)
    

    Here we can see that both movie_id and person_id are used to lookup the row in m_cast.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search