I have a SQL query which works but is slow. I wondered if there is a more efficient way to express it using joins.
Scenario:
Tables: productions
, scripts
and a join table, productions_scripts
productions |
---|
id (int) |
production (text) |
scripts |
---|
id (int) |
script (text) |
canonical (bool) |
productions_scripts |
---|
id (int) |
production_id (int) |
script_id (int) |
Requirement:
Return a listing of productions and their associated scripts where the production has more than one script of type ‘canonical’. Only show results where the script is canonical.
Current query:
SELECT productions.id AS production_id, productions.production,
scripts.id AS script_id, scripts.script
FROM scripts, productions, productions_scripts
WHERE productions.id IN (SELECT productions_scripts.production_id
FROM productions_scripts, scripts
WHERE scripts.id = productions_scripts.script_id
AND scripts.canonical = 1
GROUP BY production_id
HAVING COUNT(production_id) > 1
)
AND productions.id = productions_scripts.production_id
AND scripts.id = productions_scripts.script_id
AND scripts.canonical = 1
ORDER BY production_id;
Issues:
The query works but takes a long time to run. (~45 secs.)
The main difficulty I encountered in writing the query was getting a count of productions with multiple canonical scripts, while having the requirement to output a row for every matching production-script combination rather than just unique values for production_id
.
It seems that I have to use GROUP BY production_id
to get the count. But this also results in the output of unique values for production_id
. Hence the subquery.
EDIT.
The query I posted was the wrong version – it returned productions with non-canonical scripts. I have updated it with the AND scripts.canonical = 1
clause in the final section.
3
Answers
You can use a window function instead of the subquery.
Note that you should use explicit join syntax, rather than comma=joins.
Having appropriate indexes will definitely help and as basic as these tables are, I would just show and you can confirm
Next, your sub-query was on target, but as other person noted, use explicit joins vs comma list of tables.
Lastly, as this is from MySQL, I would through in the keyword "STRAIGHT_JOIN" which tells MySql to do the query in the order I have listed, dont think for me. I dont know your data table sizes (records), but from a system I worked with years ago with 20+ million records linked to 25+ lookup tables for corresponding descriptions, that one keyword took a query from crashing the server to having it return result in less than 2 hours.
Having said that, I would revise to:
The inner query relies on itself joined to the scripts for canonical only having the count of any record matching qualified > 1. Then, immediately take that production ID qualified to re-join to the rest of the tables to pull the respective details.
Here is a way to do it using multiple
INNER JOIN
clauses :