I wrote a SQL query to fetch all movies and, if present, include the name ‘Brad Pitt’ for movies where he’s a cast member. The query is as follows:
select m.title, p_cast.person_name
From movie m left join (
movie_cast m_cast join person p_cast on m_cast.person_id = p_cast.person_id and
p_cast.person_name = 'Brad Pitt'
) on m.movie_id = m_cast.movie_id
I expected the execution plan in MySQL Workbench to show that the database left-joins the movie table with the result of the join between movie_cast and person. However, the plan suggests a different sequence: it appears to join movie with movie_cast first, and only then does it join with person where the filter for ‘Brad Pitt’ is applied. Here is the plan.
.
If this interpretation of the execution plan is accurate, I’m trying to understand where the left join is actually being implemented. If the join sequence is as described, I expected that rows including cast members other than Brad Pitt would be retained, which is not the case.
Does the execution plan really represent what’s happening internally (and if so, how to make sense of the execution flow?), or is there a potential error in how it’s being displayed or interpreted?
2
Answers
Simplified version of your query. No parenthesis required. Do the LEFT JOIN conditions at each level from movie to cast to person
Now, this result might give you more than you expect as the left-join to the cast will get 1 record for Every cast member, but only if the left-join to the person qualified for ‘Brad Pitt’ will pull the actual name. So you would get a list the same moving showing and bunch of null values for the person UNTIL Brad Pitt.
Now, lets add something to see your intent vs what your query is. So lets see if THIS makes more sense… Is what you are asking really… Get me a list of all movies that Brad Pitt is part of the cast. Then, from that, get return all cast members of said movies. If THIS is the case, then I would start with querying all movies associated with Brad Pitt, THEN join again to the cast for others.
So here, we are STARTING with the person filter of Brad Pitt joining to the cast where he is associated. This will return only one record per movie associated with Brad Pitt. Now that we have a movie, join AGAIN to the movie cast, but ONLY based on the movie qualified by Brad and getting ALL cast members, but for each individual movie. Now we have the cast members we can re-join back to the person table for each cast member in the movie. A second join to the movie table to pull the movie title.
Hopefully a different perspective might be more of what you were intending?
The graphical representation of the execution plan is not unreasonable, but I can understand why it might be confusing. If you look at the output of
EXPLAIN ANALYZE
you will see something like this:Here we can see the left join between
Table scan on m
and the result ofNested loop inner join
. TheNested loop inner join
is performed in a loop for each row returned byTable scan on m
withm.movie_id
as an input to the inner join.More importantly, this highlights the lack of an index on
p_cast.person_name
and it is returning all rows fromm_cast
for the givenmovie_id
, then retrieving the row fromp_cast
and filtering onperson_name
.If you add an index on the
person_name
you will probably see something like this:Here we can see that both
movie_id
andperson_id
are used to lookup the row inm_cast
.