
I have a MySQL database with multiple InnoDB tables, each with millions of rows. The majority of interactions with these tables are simple SELECT/INSERT statements, but I also have to run some INNER JOINs for statistics purposes.

I have two servers with master/slave replication: the master is write-only and the slave read-only, so I separate the queries manually without using a load balancer like ProxySQL. This works fine for the simple queries.

Server specs:

CPU : Intel Xeon E5-1630v3 - 4c/8t - 3.7 GHz/3.8 GHz 
RAM : 64 GB ECC 2133 MHz
Latency between the two servers is 1ms (measured)

The problem is when I run the statistics module: CPU usage hits around 99% and queries take a long time. For example, a single INNER JOIN takes 2–3 seconds (with indexes), but when I launch many of them concurrently, it can take up to 90 seconds.
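Before adding hardware, it is worth confirming that the joins are really using the indexes under concurrency. A quick sketch with EXPLAIN (MySQL 8.0.16+ for the TREE format) — the table and column names below are made up, since the question does not show the schema:

```sql
-- Hypothetical statistics join; replace names with your real tables.
EXPLAIN FORMAT=TREE
SELECT o.customer_id, COUNT(*) AS orders, SUM(oi.qty) AS items
FROM orders AS o
INNER JOIN order_items AS oi ON oi.order_id = o.id
WHERE o.created_at >= NOW() - INTERVAL 1 DAY
GROUP BY o.customer_id;

-- Look for "Table scan" nodes in the output; a covering index such as
-- the (hypothetical) one below can turn a scan into an index lookup:
-- ALTER TABLE order_items ADD INDEX idx_order_qty (order_id, qty);
```

If the plan is already index-only and the slowdown appears only under load, the bottleneck is more likely buffer pool pressure or CPU saturation than the join itself.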

It's clear to me that I will eventually have to add more servers and build an architecture around a load balancer, but before I dive into that, can anyone suggest a working solution for my situation, please?

Thanks in advance.

2 Answers


  1. Chosen as BEST ANSWER

    Thank you for your reply @lemonina

    I wonder why my question has a -1 ...

    I ran a benchmark before reinstalling everything, using sysbench:

    sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=localhost --mysql-port=3306 --mysql-user=sysbench --mysql-password='sysbench' --mysql-db=sysbench --db-driver=mysql --tables=6 --table-size=30000000 prepare
    
    sysbench --db-driver=mysql --mysql-user=sysbench --mysql-password=sysbench --mysql-db=sysbench --mysql-host=127.0.0.1 --mysql-port=3306 --tables=6 --table-size=30000000 --threads=16 --time=300 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run
    

    Here is the result:

    SQL statistics:
        queries performed:
            read:                            1039458
            write:                           296988
            other:                           148494
            total:                           1484940
        transactions:                        74247  (247.46 per sec.)
        queries:                             1484940 (4949.13 per sec.)
        ignored errors:                      0      (0.00 per sec.)
        reconnects:                          0      (0.00 per sec.)
    
    General statistics:
        total time:                          300.0391s
        total number of events:              74247
    
    Latency (ms):
             min:                                    5.08
             avg:                                   64.65
             max:                                  805.41
             95th percentile:                      170.48
             sum:                              4800305.37
    
    Threads fairness:
        events (avg/stddev):           4640.4375/18.04
        execution time (avg/stddev):   300.0191/0.00
    

    No specific tuning was applied for this first test.

    So I made some changes based on what you suggested:

    • innodb_buffer_pool_size = 48G | Suggested to be between 60 and 80% of RAM, based on here

    • innodb_log_file_size = 10G | From Percona

    • innodb_thread_concurrency = 0 | From Link

    • innodb_flush_log_at_trx_commit=1 | From Link

    • sync_binlog=1 | From Link

    • innodb_file_per_table (NO) | From Link

    • For caching: I don't think it will help, because new data arrives every second

    • SET GLOBAL innodb_redo_log_capacity = 8589934592; for Replica

    • Scaling horizontally with ProxySQL would be a great idea. I already tested this, but dropped the solution because query dispatching was slow. I have since found a fix for that, so I think I will test it again in the near future.

    • I also read about splitting the database across servers. I haven't tested it, but I would have to figure out which tables must stay on the same server for my INNER JOINs.
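    One way to sanity-check the new buffer pool size after these changes is to compare logical reads against physical disk reads (these are standard InnoDB status counters):

    ```sql
    -- Buffer pool hit rate: close to 100% means the working set fits in RAM.
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
    -- Innodb_buffer_pool_read_requests = logical reads (served from memory)
    -- Innodb_buffer_pool_reads         = reads that had to hit disk
    -- hit rate ≈ 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
    ```

    If the hit rate stays near 100% while the statistics module runs, the remaining bottleneck is likely CPU rather than memory.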

    Here is my new configuration:

    innodb_buffer_pool_size = 48G
    innodb_log_file_size = 10G
    innodb_thread_concurrency = 0
    innodb_flush_log_at_trx_commit = 1
    sync_binlog = 1
    
    max_connections = 10000
    wait_timeout = 300
    interactive_timeout = 10
    connect_timeout = 60
    binlog_expire_logs_seconds = 86400
    
    server-id = 1
    log_bin = /var/log/mysql/mysql-bin.log
    max_binlog_size = 1G
    slow_query_log = 1
    

    And the results from the same test:

    SQL statistics:
        queries performed:
            read:                            7376012
            write:                           2107432
            other:                           1053716
            total:                           10537160
        transactions:                        526858 (1756.07 per sec.)
        queries:                             10537160 (35121.32 per sec.)
        ignored errors:                      0      (0.00 per sec.)
        reconnects:                          0      (0.00 per sec.)
    
    General statistics:
        total time:                          300.0205s
        total number of events:              526858
    
    Latency (ms):
             min:                                    2.76
             avg:                                    9.11
             max:                                  116.71
             95th percentile:                       14.46
             sum:                              4799015.34
    
    Threads fairness:
        events (avg/stddev):           32928.6250/51.95
        execution time (avg/stddev):   299.9385/0.00
    

    I'm not sure whether the performance gain came from reinstalling the 100 GB database or from the configuration changes.

    What do you think?


  2. Here are some suggestions to improve your performance:

    • Check the query execution plan: identify any potential bottlenecks in the query
    • Make sure you have appropriate indexes on the columns used in the JOINs
    • Optimize the database configuration: check your config file (my.cnf) and tune it for your hardware and workload (parameters such as "innodb_buffer_pool_size", "innodb_log_file_size", "innodb_flush_log_at_trx_commit", "innodb_thread_concurrency", etc.)
    • Use caching: add a caching layer with tools like Redis or Memcached to cache frequently accessed data and reduce the load on the DB
    • Partitioning: consider partitioning large tables with millions of rows, e.g. by date or geographic location
    • Optimize the schema: normalize the tables to reduce redundancy, and use appropriate data types for your columns
    • Upgrade your hardware: more CPU cores, faster disk I/O, more RAM
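    For the partitioning point, a minimal sketch of range partitioning by date (the table and column names are illustrative, not from the question):

    ```sql
    -- Range-partition a large table by year so statistics queries over
    -- recent data only scan the latest partitions (partition pruning).
    -- Note: in MySQL, the partitioning column must be part of every
    -- unique key (including the primary key) on the table.
    ALTER TABLE events
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );
    ```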

    As you have already identified, scaling horizontally with a load balancer may be the best long-term solution to handle increasing load.
