⛰️ Hi comunity,
I’m using a normal MySQL query to join 2 tables.
My query in a simpler way looks like these (for these example i just put *, otherwise i have targeted elements)
SELECT table1.*, table2.*
FROM table1
INNER JOIN table2 ON table1.table1_id = table2.id
WHERE table1.user_id = 5
ORDER BY date DESC
These code is working fine for me.. but speaking with my CTO he explained me that JOIN is actually joining 2 tables with all the data of all users_ids and then is selecting my asked data in "WHERE table1.user_id = 5".
He asked me to improve the code because when our app reach 50.000 users the database will get bigger so these query will get a long time to load.
I tried to move "WHERE table1.user_id = 5" above the INNER JOIN to select first the info i need and then to join it. but didn’t work.
on these article explaining WHERE mysqltutorial.org/mysql-where/ they show a query structure different to mine (never saw for me).. I tried out but didn’t neither.
- Do you have any sugestions to improve my query?
- Do you have any articles to read ?
I’m really new to MySQL language 😅
Thanks
2
Answers
Your CTO is not correct on this.
The way rows are read in MySQL is kind of independent from the order of SQL clauses or conditions in them. The optimizer is smart enough to evaluate all the conditions, whether they occur in the
FROM
clause or theWHERE
clause, and apply the conditions at the best time. Moving a condition from one clause to the other does not (and should not) make a difference.By the way, the same is true of every other brand of SQL product I’ve used.
The most common way joins really work in MySQL is called a nested-loop join. The optimizer figures out which table to read first. This might not be the one you referenced first in your
FROM
clause. The inner join you used is commutative, soA JOIN B
gives the same result asB JOIN A
, and the optimizer knows this. It prefers to use the table with the fewest rows first. It loops over them, and for each row it looks up the matching rows in the other joined table.But it does know how to apply the condition in your
WHERE
clause before the nested-loop join, so it reduces the set of rows to those that match the condition. I.e. only rows whereuser_id=5
, which is I assume a small subset of the rows intable1
.Both the lookup of rows matching the
user_id=5
condition and the lookup of corresponding rows intable2
should be helped by indexes. Just like when you look up names in a telephone book, the fact that the index is pre-sorted helps you to find the matching entry very quickly, instead of having to read the book cover to cover.The lookup to
table2
is searching for corresponding values by itsid
column. I assume this is the primary key. If it is, then it’s indexed by default.So then the way to optimize this is to make sure there’s an index on
table1.user_id
, so the initial filtering can be done efficiently instead of having to scan every row in the table.If you have not already done so, do this:
If the table is large, it will take a little while (I can’t predict exactly how long), and may block queries while it’s building the index. So be aware that if you do such changes in production, you should be careful to pick a time when you can do it without interrupting service too badly. I recommend you try any new command in a development environment first until you’re confident with it.
You may also like my presentation How to Design Indexes, Really, or the video, or the chapter on indexing in my book: SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
I assume that table2 has
PRIMARY KEY(id)
— which serves as an index.Which table is
date
in? It makes a big difference.would benefit from `INDEX(user_id, date)
would benefit from `INDEX(user_id)
Without the index, there may be a lot of work going on. With a suitable index on each table, the query will be quite efficient.