
Assume I have 20 lakh (2,000,000) records.
Approach 1: keep all 20 lakh records in a single table.
Approach 2: create 20 tables and put 1 lakh (100,000) records into each.
Which method is best for performance, and why? Or is there another approach?

3 Answers


  1. The best and easiest approach is a single table with proper indexes. On a table of 100K rows I was seeing 30 s per query; with an index that dropped to 0.03 s per query (as sketched at the end of this answer).

    Only when that no longer holds up do you split the table (for me, that point came at several million rows).

    And preferably across different servers.
    You can then create a microservice that queries all the servers and returns data to consumers as if there were only one database.

    But once you do this, you had better not need joins, because replicating data across every database gets messy.

    I would stick to the first method.
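
    A minimal sketch of the indexing idea, in MySQL syntax; the orders table and its columns are hypothetical, made up for illustration:

        -- Hypothetical table holding the 20 lakh rows,
        -- frequently filtered by customer_id
        CREATE TABLE orders (
            id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            customer_id BIGINT UNSIGNED NOT NULL,
            total       DECIMAL(10, 2) NOT NULL,
            created_at  DATETIME NOT NULL
        );

        -- Without an index this query scans every row
        SELECT * FROM orders WHERE customer_id = 42;

        -- With an index the same query becomes a B-tree lookup
        CREATE INDEX idx_orders_customer ON orders (customer_id);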

  2. Indexes are a great way to improve read performance. However, they can slow down writes, because every INSERT, UPDATE, and DELETE must also maintain the index.

    So if read performance is what you are after, use indexes.
    A few things to keep in mind when creating an index:

    • Try to avoid indexing columns that contain many NULL values
    • The cardinality of the columns matters: the usual rule of thumb is to put the more selective (higher-cardinality) column first, so the index narrows down the candidate rows as quickly as possible
    • The leading columns of the index must match your WHERE clause. For example, if you create an index on (col_a, col_b) but query on col_c, your index will not be used; nor will a query on col_b alone, because col_b is not a leftmost prefix of the index. So shape your indexes around your WHERE clauses.

    When in doubt about whether an index was used, run EXPLAIN on the query to see which index, if any, was chosen.

    Database indexes can be a tricky subject for beginners, but picturing the index as a B-tree and each lookup as a tree traversal helps visualize the path traced when reading the data. The sketch below shows the leftmost-prefix rule and EXPLAIN in action.
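
    A minimal sketch in MySQL syntax; the users table and its columns are hypothetical:

        CREATE TABLE users (
            id      BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            country CHAR(2) NOT NULL,
            city    VARCHAR(100) NOT NULL
        );

        -- Composite index on (country, city), in that order
        CREATE INDEX idx_users_country_city ON users (country, city);

        -- Uses the index: country is the leftmost column
        EXPLAIN SELECT * FROM users WHERE country = 'DE';

        -- Uses the full index: both columns, in index order
        EXPLAIN SELECT * FROM users WHERE country = 'DE' AND city = 'Berlin';

        -- Cannot use the index: city alone is not a leftmost prefix
        EXPLAIN SELECT * FROM users WHERE city = 'Berlin';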

  3. Splitting a large table into smaller ones can give better performance (when the tables are then distributed across multiple database servers, this is called sharding), but doing it manually is most definitely an antipattern.

    What happens if you have 100 tables and you are looking for a row but don't know which table holds it? If you want an index, you have to create it 100 times. Anyone who wants to join against the data set might need to include 100 tables in the join for some use cases. You would have to invent your own naming conventions, then document and enforce them yourself, with no help from the database catalog. Backup, recovery, and all the other maintenance tasks become a nightmare. Just don't do it.

    Instead, break up the table by partitioning it (see the sketch below). You get the same performance improvement you would have gotten from multiple tables, but the database handles the details for you.
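
    A minimal sketch of native range partitioning, in MySQL syntax; the events table and its columns are hypothetical:

        -- The database routes rows to partitions and prunes partitions
        -- at query time; no manual table splitting is needed
        CREATE TABLE events (
            id         BIGINT UNSIGNED NOT NULL,
            created_at DATETIME NOT NULL,
            payload    VARCHAR(255),
            -- In MySQL, the partitioning column must be part of every
            -- unique key, including the primary key
            PRIMARY KEY (id, created_at)
        )
        PARTITION BY RANGE (YEAR(created_at)) (
            PARTITION p2022 VALUES LESS THAN (2023),
            PARTITION p2023 VALUES LESS THAN (2024),
            PARTITION p2024 VALUES LESS THAN (2025),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        );

        -- This query only touches the p2024 and pmax partitions
        SELECT COUNT(*) FROM events WHERE created_at >= '2024-01-01';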
