skip to Main Content

I’m looking for efficient strategies to optimize a MySQL query that generates Cartesian products in scenarios involving large data sets. The specific use case involves pairing cast and crew members for each movie in a database, which can result in a very large number of combinations due to the size of the data involved.

The current query is as follows:

SELECT 
    m.title,
    pc.person_name AS cast_member,
    pr.person_name AS crew_member
FROM 
    movie m
JOIN 
    movie_cast mc ON m.movie_id = mc.movie_id
JOIN 
    person pc ON mc.person_id = pc.person_id
JOIN 
    movie_crew mcc ON m.movie_id = mcc.movie_id
JOIN 
    person pr ON mcc.person_id = pr.person_id;

Considering the following data:

  movies(movie_id, title): (1,'m1'), (2,'m2')
  person(person_id, name): (1, 'p1'), (2, 'p2'), (3, 'p3'), (4,'p4')
  movie_cast(movie_id, peson_id): (1,1),(1,2),(2,1),(2,3)
  movie_crew(movie_id, peson_id): (1,1),(2,2), (2,4)

the result should be:

movie, cast, crew
m1      p1     p1
m1      p2     p1
m2      p1     p2
m2      p1     p4
m2      p3     p2
m2      p3     p4 

I’ve considered using derived tables with the LATERAL clause to make this more efficient, but it led to inaccurate results. Are there other optimization techniques or approaches in MySQL that could improve the performance of this query?

Please note that this query is used purely for research purposes.

2

Answers


  1. Main idea: if person is cast member and crew member related to the movie necessary join to one row for this movie (subquery pcc– person-cast-crew).
    Person is on booth(cast and crew) or one(cast/crew) side of query result.

    movie_id cast_id crew_id
    1 10 10
    1 11 null
    2 11 11
    2 12 null
    1 null 12
    2 null 13

    See example

    select m.movie_id,title
      ,m1.person_name cast_member
      ,m2.person_name crew_member
    from movie m
    left join 
      (
      select movie_id
        ,min(case when pt='cast' then person_id end) cast_id
        ,min(case when pt='crew' then person_id end) crew_id
      from(
         select movie_id,person_id, 'cast' as pt from movie_cast
         union all
         select movie_id,person_id, 'crew' as pt from movie_crew
      )cast_and_crew
      group by movie_id,person_id
    )pcc on pcc.movie_id=m.movie_id
    left join person m1 on m1.person_id=pcc.cast_id
    left join person m2 on m2.person_id=pcc.crew_id
    order by m.movie_id,coalesce(m1.person_id,m2.person_id)
    

    Fiddle here

    OR

    select m.movie_id,title
      ,case when pcc.cast_id is not null then p.person_name end cast_member
      ,case when pcc.crew_id is not null then p.person_name end crew_member
    from movie m
    left join 
      (
      select movie_id
        ,min(case when pt='cast' then person_id end) cast_id
        ,min(case when pt='crew' then person_id end) crew_id
      from(
        select movie_id,person_id, 'cast' as pt from movie_cast
        union all
        select movie_id,person_id, 'crew' as pt from movie_crew
       )cast_and_crew
      group by movie_id,person_id
    )pcc on pcc.movie_id=m.movie_id
    left join person p on p.person_id=coalesce(pcc.cast_id,pcc.crew_id)
    order by m.movie_id,p.person_name
    

    Demo

    Test data

    create table movie (movie_id int,title varchar(50));
    create table movie_cast(movie_id int,person_id int);
    create table movie_crew(movie_id int,person_id int);
    create table person(person_id int,person_name varchar(50));
    insert into movie values
     (1,'Movie 1')
    ,(2,'Movie 2')
    ;
    insert into person values
     (10,'Carlson')
    ,(11,'Peterson')
    ,(12,'Henderson')
    ,(13,'Magnuson')
    ;
    insert into movie_cast values
     (1,10)
    ,(1,11)
    ,(2,11)
    ,(2,12)
    ;
    insert into movie_crew values
     (1,10)
    ,(1,12)
    ,(2,11)
    ,(2,13)
    ;
    
    Login or Signup to reply.
  2. CREATE TABLE movie_cast (
        -- no id
        movie_id ...,
        person_id ...,
        PRIMARY KEY(movie_id, person_id),
        INDEX(person_id, movie_id)
    ) ENGINE = InnoDB;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search