
I have a site that gets a lot of traffic this time of year (around 50,000 users a day). The traffic goes up every year and all the sites on my server are starting to crash more, apparently because of non-optimized MySQL queries.

The worst offender I found in the slow query log is the one below. This particular table has 2,655,197 rows.

Would this be classified as a slight problem, big problem or maybe a horrific/fix-this-immediately problem?

I have no experience with indexing, but after some googling it seems I should create an index for this query. Would that help significantly? If so, and if anyone can recommend any sites/videos that explain how to create indexes well, please share.

Count: 798  Time=5.75s (4589s)  Lock=0.24s (189s)  Rows_sent=0.9 (739), Rows_examined=2592353.9 (2068698439), Rows_affected=0.0 (0), retracted@localhost
SELECT * from TABLE WHERE unique_id= 'S' LIMIT 1

2 Answers


  1. Once you add INDEX(unique_id), that simple query should drop from about 5.75s per execution (4589s total across 798 runs) to around 0.01s.

    Is that enough incentive to learn about indexing?

    50K users/day averages under one request per second, which is not very busy. 2M rows is medium-sized. But the table acts big because of the lack of indexing.

    If unique_id is truly "unique", then use UNIQUE or PRIMARY KEY instead of a plain INDEX. Also, if it is unique, the LIMIT 1 is unnecessary.
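
    In MySQL that looks like the statements below. The real table and column names were redacted in the slow-log entry, so `your_table` here is a stand-in:

    ```sql
    -- If unique_id values are guaranteed to be unique:
    ALTER TABLE your_table ADD UNIQUE KEY uq_unique_id (unique_id);

    -- Otherwise, a plain secondary index still turns the full scan
    -- of ~2.6M rows into a single-row lookup:
    ALTER TABLE your_table ADD INDEX idx_unique_id (unique_id);
    ```

    Note that ADD UNIQUE KEY will fail if duplicate values already exist in the column, which is itself a useful sanity check on the data.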

  2. Would this be classified as a slight problem, big problem or maybe a horrific/fix-this-immediately problem?

    That’s not something anyone can answer for you. How important is it to you that this query runs optimally? We don’t know your requirements better than you do.

    I made a presentation that has helped a lot of people here: How to Design Indexes, Really. There’s a video recording here: https://www.youtube.com/watch?v=ELR7-RdU9XU. It’s from 2012, but the principles are pretty evergreen.

    Think about the telephone book analogy I use in my presentation: isn’t it a great advantage that the telephone book is sorted? It makes it so you can look up entries much faster.

    How much faster?

    Computer scientists describe performance in terms of how many steps it takes to do a task (like searching a dataset) relative to the size of the dataset.

    If the telephone book were not sorted, the number of steps is directly proportional to the size of the dataset. If there are N entries in the book, it might take you up to N lookups to find the one you’re looking for (it’s always in the last place you look, right?).

    But since the telephone book is sorted, a search can start in the middle: the entry at that position is either greater or less than the one you’re looking for, so you know the one you want is in either the first half or the second half. Then you go to the midpoint of that half and try again. Each lookup cuts the remaining search space in half. This is called binary subdivision (more commonly, binary search).

    Instead of taking up to N steps to search a dataset of N entries, binary subdivision takes about log₂(N) steps. For the 2,655,197-row table in the question, that’s roughly 22 steps instead of up to 2,655,197. This is a game-changer, and the advantage only grows as your dataset gets larger.

    So yes, indexing is an important thing to pay attention to if you want your queries to be optimized, and to keep performance good as your dataset grows (they always seem to grow, don’t they?).
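
    If you want to see this on your own table, EXPLAIN shows how many rows MySQL expects to examine for the query. The names below are stand-ins for the redacted ones in the slow-log entry:

    ```sql
    EXPLAIN SELECT * FROM your_table WHERE unique_id = 'some_value' LIMIT 1;
    -- Without an index: type=ALL, rows close to the full ~2.6M (table scan).
    -- With an index on unique_id: type=ref (or const for a UNIQUE /
    -- PRIMARY KEY), rows=1.
    ```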
