We are using Mysql EntityMapping table with 140cr rows, each entity mapped with 1 or 2 million contacts rows, it is growing in last 5 months.
The table contains PK, entityId (Index column), contactId (index column), Added time(index column), remaining fields which is related to the table.
It is possible to grow 300cr in next 6 months. and current table size is 750 GB.
So, my question is whether split table it is a good practice in any situation. My theory is to push to have them combined as it will make things easier as far as UNIONs, etc go. and planned to make one meta table like entityTableMapping with entityId and Table names, if we querying in specific entity, we can directly fetch from specific table. The main downside is changing all the application code and whether it is even worth it in the long run.
2
Answers
You’ll have to solve that before you can make any changes. That’s a different question.
Partitioning the table would help. The table is split into multiple tables, but is still presented as a single table.
You can partition by month or year using a timestamp column, such as the last time the data was accessed or updated. It allows you to quickly drop old data by simply dropping old partitions. It might speed up queries which are looking for a particular range of dates if that range falls into a single partition. It also might speed up queries across a large range of dates as the query can be done in parallel across many partitions.
Along with dropping old data, you may wish to consider adding a data warehouse. When you’re approaching a terabyte of data, a lot of that isn’t for daily use. Rather it’s for things like analytics, statistics, and business intelligence. In that case you can move that data to a data warehouse. This allows you to store only the data necessary for moment-to-moment queries in MySQL, while the bulk of your data is in the warehouse database specialized for statistical queries.
Please provide
SHOW CREATE TABLE
and the important queries. Also let us know if you will be deleting "old" data from the table.Off hand,
PARTITION BY RANGE(TO_DAYS(...))
may be useful. (But aim for between 20 and 50 partitions. Off hand, monthly partitions sounds ‘right’.) See PartitionIf you do need to add indexes and/or partition the table, do it soon. The longer you wait, the slower the conversion will be.
Do not split manually into tables and do UNIONs; that will be messier than using
PARTITIONing
and probably slower.See if you can ‘normalize’ some of the columns. This and other techniques will shrink the bulkiness of the table — thereby helping with disk space and maybe with speed.
How much RAM do you have? What is the setting of
innodb_buffer_pool_size.
?Is the "mapping" table a many-to-many mapping? If so, it is important for speed and space to follow the guidelines in Many-to-many
1.4 billion rows; soon to be 4.4 billion? That is a lot. Consider my advice carefully, else you will soon be in ‘deep weeds’.