
I have a MySQL database with multiple InnoDB tables, each with millions of rows. The majority of interactions with these tables are simple SELECT/INSERT statements, but I also have to run some INNER JOINs for statistics purposes.

I have two servers with master/slave replication: the master is write-only and the slave read-only, so I separate the queries manually without using a load balancer like ProxySQL. This works fine for the simple queries.

Server specs:

CPU : Intel Xeon E5-1630v3 - 4c/8t - 3.7 GHz/3.8 GHz 
RAM : 64 GB ECC 2133 MHz
Latency between the two servers is 1ms (measured)

The problem is when I run the statistics module: CPU usage hits around 99% and queries take a long time. For example, a single INNER JOIN takes 2–3 seconds (with indexes), but when I launch many of them concurrently, it can take up to 90 seconds.
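Before adding hardware, it is worth confirming that the joins are really using the indexes under concurrency. A quick sketch with EXPLAIN (MySQL 8.0.16+ for the TREE format) — the table and column names below are made up, since the question does not show the schema:

```sql
-- Hypothetical statistics join; replace names with your real tables.
EXPLAIN FORMAT=TREE
SELECT o.customer_id, COUNT(*) AS orders, SUM(oi.qty) AS items
FROM orders AS o
INNER JOIN order_items AS oi ON oi.order_id = o.id
WHERE o.created_at >= NOW() - INTERVAL 1 DAY
GROUP BY o.customer_id;

-- Look for "Table scan" nodes in the output; a covering index such as
-- the (hypothetical) one below can turn a scan into an index lookup:
-- ALTER TABLE order_items ADD INDEX idx_order_qty (order_id, qty);
```

If the plan is already index-only and the slowdown appears only under load, the bottleneck is more likely buffer pool pressure or CPU saturation than the join itself.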

It's clear to me that I will eventually have to add more servers and build an architecture around a load balancer, but before I dive into that, can anyone suggest a working solution for my situation, please?

Thanks in advance.

2 Answers


  1. Chosen as BEST ANSWER

    Thank you for your reply @lemonina

    I wonder why my question has a -1 ...

    I ran a benchmark before reinstalling everything, using sysbench:

    sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=localhost --mysql-port=3306 --mysql-user=sysbench --mysql-password='sysbench' --mysql-db=sysbench --db-driver=mysql --tables=6 --table-size=30000000 prepare
    
    sysbench --db-driver=mysql --mysql-user=sysbench --mysql-password=sysbench --mysql-db=sysbench --mysql-host=127.0.0.1 --mysql-port=3306 --tables=6 --table-size=30000000 --threads=16 --time=300 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run
    

    Here is the result:

    SQL statistics:
        queries performed:
            read:                            1039458
            write:                           296988
            other:                           148494
            total:                           1484940
        transactions:                        74247  (247.46 per sec.)
        queries:                             1484940 (4949.13 per sec.)
        ignored errors:                      0      (0.00 per sec.)
        reconnects:                          0      (0.00 per sec.)
    
    General statistics:
        total time:                          300.0391s
        total number of events:              74247
    
    Latency (ms):
             min:                                    5.08
             avg:                                   64.65
             max:                                  805.41
             95th percentile:                      170.48
             sum:                              4800305.37
    
    Threads fairness:
        events (avg/stddev):           4640.4375/18.04
        execution time (avg/stddev):   300.0191/0.00
    

    No specific tuning was applied for this first test.

    So I made some changes based on what you suggested:

    • innodb_buffer_pool_size = 48G | Suggested to be between 60 and 80% of RAM, based on here

    • innodb_log_file_size = 10G | From Percona

    • innodb_thread_concurrency = 0 | From Link

    • innodb_flush_log_at_trx_commit=1 | From Link

    • sync_binlog=1 | From Link

    • innodb_file_per_table (NO) | From Link

    • For caching: I don't think it will help, because new data arrives every second

    • SET GLOBAL innodb_redo_log_capacity = 8589934592; for Replica

    • Scaling horizontally with ProxySQL would be a great idea. I already tested this, but dropped the solution because query dispatching was slow. I have since found a fix for that, so I think I will test it again in the near future.

    • I also read about splitting the database across servers. I haven't tested it, but I would have to figure out which tables must stay on the same server for my INNER JOINs.
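    One way to sanity-check the new buffer pool size after these changes is to compare logical reads against physical disk reads (these are standard InnoDB status counters):

    ```sql
    -- Buffer pool hit rate: close to 100% means the working set fits in RAM.
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
    -- Innodb_buffer_pool_read_requests = logical reads (served from memory)
    -- Innodb_buffer_pool_reads         = reads that had to hit disk
    -- hit rate ≈ 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
    ```

    If the hit rate stays near 100% while the statistics module runs, the remaining bottleneck is likely CPU rather than memory.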

    Here is my new configuration:

    innodb_buffer_pool_size = 48G
    innodb_log_file_size = 10G
    innodb_thread_concurrency = 0
    innodb_flush_log_at_trx_commit = 1
    sync_binlog = 1
    
    max_connections = 10000
    wait_timeout = 300
    interactive_timeout = 10
    connect_timeout = 60
    binlog_expire_logs_seconds = 86400
    
    server-id = 1
    log_bin = /var/log/mysql/mysql-bin.log
    max_binlog_size = 1G
    slow_query_log = 1
    

    And the results from the same test:

    SQL statistics:
        queries performed:
            read:                            7376012
            write:                           2107432
            other:                           1053716
            total:                           10537160
        transactions:                        526858 (1756.07 per sec.)
        queries:                             10537160 (35121.32 per sec.)
        ignored errors:                      0      (0.00 per sec.)
        reconnects:                          0      (0.00 per sec.)
    
    General statistics:
        total time:                          300.0205s
        total number of events:              526858
    
    Latency (ms):
             min:                                    2.76
             avg:                                    9.11
             max:                                  116.71
             95th percentile:                       14.46
             sum:                              4799015.34
    
    Threads fairness:
        events (avg/stddev):           32928.6250/51.95
        execution time (avg/stddev):   299.9385/0.00
    

    I'm not sure whether the performance gain came from reinstalling the 100 GB database or from the configuration changes.

    What do you think?


  2. Here are some suggestions to improve your performance:

    • Check the query execution plan: identify any potential bottlenecks in the query
    • Make sure you have appropriate indexes on the columns used in the JOINs
    • Optimize the database configuration: check your config file (my.cnf) and tune it for your hardware and workload (parameters such as "innodb_buffer_pool_size", "innodb_log_file_size", "innodb_flush_log_at_trx_commit", "innodb_thread_concurrency", etc.)
    • Use caching: add a caching layer with tools like Redis or Memcached to cache frequently accessed data and reduce the load on the DB
    • Partitioning: consider partitioning large tables with millions of rows, e.g. by date or geographic location
    • Optimize the schema: normalize the tables to reduce redundancy, and use appropriate data types for your columns
    • Upgrade your hardware: more CPU cores, faster disk I/O, more RAM
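    For the partitioning point, a minimal sketch of range partitioning by date (the table and column names are illustrative, not from the question):

    ```sql
    -- Range-partition a large table by year so statistics queries over
    -- recent data only scan the latest partitions (partition pruning).
    -- Note: in MySQL, the partitioning column must be part of every
    -- unique key (including the primary key) on the table.
    ALTER TABLE events
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );
    ```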

    As you have already identified, scaling horizontally with a load balancer may be the best long-term solution to handle increasing load.
