skip to Main Content

I have a SQL query which works but is slow. I wondered if there is a more efficient way to express it using joins.

Scenario:

Tables: productions, scripts and a join table, productions_scripts

productions
id (int)
production (text)
scripts
id (int)
script (text)
canonical (bool)
productions_scripts
id (int)
production_id (int)
script_id (int)

Requirement:

Return a listing of productions and their associated scripts where the production has more than one script of type ‘canonical’. Only show results where the script is canonical.

Current query:

SELECT productions.id AS production_id, productions.production,
       scripts.id AS script_id, scripts.script
FROM scripts, productions, productions_scripts
WHERE productions.id IN (SELECT productions_scripts.production_id
                         FROM productions_scripts, scripts
                         WHERE scripts.id = productions_scripts.script_id
                           AND scripts.canonical = 1
                         GROUP BY production_id
                         HAVING COUNT(production_id) > 1
                         )
AND productions.id = productions_scripts.production_id
AND scripts.id = productions_scripts.script_id
AND scripts.canonical = 1
ORDER BY production_id;

Issues:
The query works but takes a long time to run. (~45 secs.)

The main difficulty I encountered in writing the query was getting a count of productions with multiple canonical scripts, while having the requirement to output a row for every matching production-script combination rather than just unique values for production_id.

It seems that I have to use GROUP BY production_id to get the count. But this also results in the output of unique values for production_id. Hence the subquery.

EDIT.
The query I posted was the wrong version – it returned productions with non-canonical scripts. I have updated it with the AND scripts.canonical = 1 clause in the final section.

3

Answers


  1. You can use a window function instead of the subquery.

    SELECT
      p.production_id,
      p.production,
      p.script_id,
      p.script
    FROM (
        SELECT
          p.id AS production_id,
          p.production,
          s.id AS script_id,
          s.script,
          COUNT(*) OVER (PARTITION BY p.id) AS countCanonical
        FROM scripts s
        JOIN productions_scripts ps
          ON s.id = ps.script_id
        JOIN productions p
          ON p.id = ps.production_id
        WHERE s.canonical = 1
    ) p
    WHERE p.countCanonical > 1
    ORDER BY production_id;
    

    Note that you should use explicit join syntax, rather than comma=joins.

    Login or Signup to reply.
  2. Having appropriate indexes will definitely help and as basic as these tables are, I would just show and you can confirm

    Table               Index
    production          (id, production)
    script              (id, canonical, script)
    productions_scripts ( production_id, script_id )
    

    Next, your sub-query was on target, but as other person noted, use explicit joins vs comma list of tables.

    Lastly, as this is from MySQL, I would through in the keyword "STRAIGHT_JOIN" which tells MySql to do the query in the order I have listed, dont think for me. I dont know your data table sizes (records), but from a system I worked with years ago with 20+ million records linked to 25+ lookup tables for corresponding descriptions, that one keyword took a query from crashing the server to having it return result in less than 2 hours.

    Having said that, I would revise to:

    select STRAIGHT_JOIN
            p.id production_id,
            p.production,
            s.id script_id,
            s.script,
            s.canonical
        from
            (SELECT 
                    ps.production_id
                FROM 
                    productions_scripts ps
                        JOIN scripts s
                            on ps.script_id = s.id
                            AND s.canonical = 1
                group by
                    ps.production_id
                having
                    count(*) > 1 ) multi
                JOIN production_scripts ps2
                    on multi.production_id = ps2.production_id
                    JOIN production p
                        on ps2.production_id = p.id
                    JOIN scripts s
                        on ps2.script_id = s.id
        order by
            multi.production_id
    

    The inner query relies on itself joined to the scripts for canonical only having the count of any record matching qualified > 1. Then, immediately take that production ID qualified to re-join to the rest of the tables to pull the respective details.

    Login or Signup to reply.
  3. Here is a way to do it using multiple INNER JOIN clauses :

    SELECT p.id AS production_id, p.production, s.id AS script_id, s.script
    FROM productions p 
    INNER JOIN productions_scripts ps ON p.id = ps.production_id
    INNER JOIN scripts s ON s.id = ps.script_id
    INNER JOIN (
      SELECT ps.production_id
      FROM productions_scripts ps
      INNER JOIN scripts s ON s.id = ps.script_id
      WHERE s.canonical = 1
      GROUP BY ps.production_id
      HAVING COUNT(ps.production_id) > 1
    ) AS t ON t.production_id = p.id
    WHERE s.canonical = 1
    ORDER BY p.id;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search