Assuming that I have 20L records,
Approach 1: Hold all 20L records in a single table.
Approach 2: Make 20 tables and enter 1L into each.
Which is the best method to increase performance and why, or are there any other approaches?
Assuming that I have 20L records,
Approach 1: Hold all 20L records in a single table.
Approach 2: Make 20 tables and enter 1L into each.
Which is the best method to increase performance and why, or are there any other approaches?
3
Answers
The best/easiest is to have a unique table with proper indexes. On 100K lines I had 30s / query, but with an index I got 0.03s / query.
When it doesn’t fit anymore you split tables (for me it’s when I got to millions of lines).
And preferably on different servers.
You can then create a microservice accessing all servers and returning data to consumers like if there was only one database.
But once you do this you better not have joins, because it’ll get messy replicating data on every databases.
I would stick to the first method.
When looking for read time performance, indexes are a great way to improve the performance. However, having indexes can slow down the write time queries.
So if you are looking for a read time performance, prefer indexes.
Few things to keep in mind when creating the index
When in doubt if an index was used or not, use EXPLAIN to see which index was used.
DB indexes can be a tricky subject for the beginners but imagining it as a tree traversal helps visualize the path traced when reading the data.
Splitting a large table into smaller ones can give better performance — it is called sharding when the tables are then distributed across multiple database servers — but when you do it manually it is most definitely an antipattern.
What happens if you have 100 tables and you are looking for a row but you don’t know which table has it? If you put index on the tables you’ll need to do it 100 times. If somebody wants to join the data set he might need to include 100 tables in his join in some use cases. You’d need to invent your own naming conventions, document and enforce them yourself with no help from the database catalog. Backup and recovery and all the other maintenance tasks will be a nightmare….just don’t do it.
Instead just break up the table by partitioning it. You get 100% of the performance improvement that you would have gotten from multiple tables but now the database is handling the details for you.