I’m looking for efficient strategies to optimize a MySQL query that generates Cartesian products in scenarios involving large data sets. The specific use case involves pairing cast and crew members for each movie in a database, which can result in a very large number of combinations due to the size of the data involved.
The current query is as follows:
SELECT
m.title,
pc.person_name AS cast_member,
pr.person_name AS crew_member
FROM
movie m
JOIN
movie_cast mc ON m.movie_id = mc.movie_id
JOIN
person pc ON mc.person_id = pc.person_id
JOIN
movie_crew mcc ON m.movie_id = mcc.movie_id
JOIN
person pr ON mcc.person_id = pr.person_id;
Considering the following data:
movies(movie_id, title): (1,'m1'), (2,'m2')
person(person_id, name): (1, 'p1'), (2, 'p2'), (3, 'p3'), (4,'p4')
movie_cast(movie_id, peson_id): (1,1),(1,2),(2,1),(2,3)
movie_crew(movie_id, peson_id): (1,1),(2,2), (2,4)
the result should be:
movie, cast, crew
m1 p1 p1
m1 p2 p1
m2 p1 p2
m2 p1 p4
m2 p3 p2
m2 p3 p4
I’ve considered using derived tables with the LATERAL clause to make this more efficient, but it led to inaccurate results. Are there other optimization techniques or approaches in MySQL that could improve the performance of this query?
Please note that this query is used purely for research purposes.
2
Answers
Main idea: if person is cast member and crew member related to the movie necessary join to one row for this movie (subquery
pcc
– person-cast-crew).Person is on booth(cast and crew) or one(cast/crew) side of query result.
See example
Fiddle here
OR
Demo
Test data