skip to Main Content

⛰️ Hi comunity,

I’m using a normal MySQL query to join 2 tables.

My query in a simpler way looks like these (for these example i just put *, otherwise i have targeted elements)

SELECT table1.*, table2.*     
FROM table1
INNER JOIN table2 ON table1.table1_id = table2.id
WHERE table1.user_id = 5
ORDER BY date DESC

These code is working fine for me.. but speaking with my CTO he explained me that JOIN is actually joining 2 tables with all the data of all users_ids and then is selecting my asked data in "WHERE table1.user_id = 5".

He asked me to improve the code because when our app reach 50.000 users the database will get bigger so these query will get a long time to load.

I tried to move "WHERE table1.user_id = 5" above the INNER JOIN to select first the info i need and then to join it. but didn’t work.

on these article explaining WHERE mysqltutorial.org/mysql-where/ they show a query structure different to mine (never saw for me).. I tried out but didn’t neither.

enter image description here

  • Do you have any sugestions to improve my query?
  • Do you have any articles to read ?

I’m really new to MySQL language 😅

Thanks

2

Answers


  1. Your CTO is not correct on this.

    The way rows are read in MySQL is kind of independent from the order of SQL clauses or conditions in them. The optimizer is smart enough to evaluate all the conditions, whether they occur in the FROM clause or the WHERE clause, and apply the conditions at the best time. Moving a condition from one clause to the other does not (and should not) make a difference.

    By the way, the same is true of every other brand of SQL product I’ve used.

    The most common way joins really work in MySQL is called a nested-loop join. The optimizer figures out which table to read first. This might not be the one you referenced first in your FROM clause. The inner join you used is commutative, so A JOIN B gives the same result as B JOIN A, and the optimizer knows this. It prefers to use the table with the fewest rows first. It loops over them, and for each row it looks up the matching rows in the other joined table.

    But it does know how to apply the condition in your WHERE clause before the nested-loop join, so it reduces the set of rows to those that match the condition. I.e. only rows where user_id=5, which is I assume a small subset of the rows in table1.

    Both the lookup of rows matching the user_id=5 condition and the lookup of corresponding rows in table2 should be helped by indexes. Just like when you look up names in a telephone book, the fact that the index is pre-sorted helps you to find the matching entry very quickly, instead of having to read the book cover to cover.

    The lookup to table2 is searching for corresponding values by its id column. I assume this is the primary key. If it is, then it’s indexed by default.

    So then the way to optimize this is to make sure there’s an index on table1.user_id, so the initial filtering can be done efficiently instead of having to scan every row in the table.

    If you have not already done so, do this:

    ALTER TABLE table1 ADD INDEX (user_id);
    

    If the table is large, it will take a little while (I can’t predict exactly how long), and may block queries while it’s building the index. So be aware that if you do such changes in production, you should be careful to pick a time when you can do it without interrupting service too badly. I recommend you try any new command in a development environment first until you’re confident with it.

    You may also like my presentation How to Design Indexes, Really, or the video, or the chapter on indexing in my book: SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

    Login or Signup to reply.
  2. I assume that table2 has PRIMARY KEY(id) — which serves as an index.

    Which table is date in? It makes a big difference.

    WHERE    table1.user_id = 5
    ORDER BY table1.date DESC
    

    would benefit from `INDEX(user_id, date)

    WHERE    table1.user_id = 5
    ORDER BY table2.date DESC
    

    would benefit from `INDEX(user_id)

    Without the index, there may be a lot of work going on. With a suitable index on each table, the query will be quite efficient.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search