skip to Main Content

I have developed JSON API in the slim framework of PHP. Using this API I send contact numbers from my mobile device to the server. The server saves these contacts in the MySQL database.

I hosted the API using digital ocean hosting with 4 core CPU, 8 GB ram. In order to stress test, I sent 1000 concurrent POST requests to the API with 500 contact numbers in the body of each request and suddenly server got too much busy processing those requests and it stopped taking any further requests. I opened APACHE error.log and I saw this message there:

[Fri May 15 13:34:14.112205 2020] [mpm_prefork:error] [pid 42748] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

After seeing these logs I looked at the server control panel stats which are offered by the digital ocean and I saw CPU was used 64%, Memory 16% but Disk I/O was on its peak. After seeing this I came to know that there is something in my PHP code that is taking CPU time and from Disk I/O usage the first thing that came in my mind it was the number of queries that I am running in for loop. I searched about it and came to now that running a query in a loop is a bad practice.

It’s been two days and I am unable to change database design to avoid those queries in the for loop. As I have provided the problem background now let me explain how I am saving those contacts in the database.

First here is my database ERD.

ERD

From Request body, I extract JSON array of contacts and loop on that contacts array. Then for each contact, I check if it already exists in my database using this query.

Select id from contact where number = 'mobile number' and full_name = 'name' and country_id = 12;

I have created index on number full_name country_id in database. if the contact exists this query return me the id of that contact otherwise it returns FALSE. If I get the id I create a record in contact_seeder table if it does not exist. to check if it already exists I query contact_seeder table like this.

Select id from contact_seeder where contact_id = 12 and seeder_id = 2;

A contact can have only 1 ‘contact_seeder’ record against 1 seeder_id. That’s why I have checked if ‘contact_seeder’ exist against some contact_id and seeder_id.

As you can see I have to check if the record already exists and if yes then gets it’s id otherwise create a new record. So in order to insert ONE contact, I am running 4 queries in worst-case scenario and in best case scenario it would 3, and yes these queries in a loop. So for 100 contacts, it would be. 4*100 = 400 queries. That’s what is slowing down my server.

I hope i have provided enough details for you to understand my problem please suggest a solution to how I can avoid those queries in the loop and sill achieve my target goal. Before mentioning bulk insertion please check I need id of each record that is created for example contact. In bulk insertion, it is another problem to get the ids of created records.

2

Answers


  1. I have two solutions in mind.
    First is LEFT JOIN on your first select

    "SELECT
        `contact`.`id`,
        `contact_seeder`.`id` as `seeder_id`
    FROM
        `contact`
        LEFT JOIN `contact_seeder` ON `contact_seeder`.`contact_id` = 
    `contact`.`id`
    WHERE
        `contact`.`number` = 'mobile number'
        AND `contact`.`full_name` = 'name'
        AND `contact`.`country_id` = 12
    

    seeder_id will be null if there is no record for this contact. So you can make later bulk inserts

    The second method is INSERT IGNORE. You will need unique index on one or more columns. In case the row exists it will be ignored, else inserted. Again you can make bulk inserts here.

    Bonus hint: its a good approach to use UUID in such cases. It allow you to handle better the need of ID before it is inserted.

    Edit: A little more extended example to answer the question in comment.

    <?php 
    $query = "SELECT
            `contact`.`id`,
            `contact_seeder`.`id` as `seeder_id`
        FROM
            `contact`
            LEFT JOIN `contact_seeder` ON `contact_seeder`.`contact_id` = `contact`.`id`
        WHERE ";
    //lets say this is your request data
    $data = [
        ['number' => '654654', 'name' => 'full_name'],
        ['number' => '3212123', 'name' => 'full_name2']
    ];
    $additionalQueries = []
    foreach($data as $item){
        $additionalQueries[] = sprintf('(`contact`.`number` = '%s' AND `contact`.`full_name` = '%s')', $item['number'], $item['name']);
    }
    
    $query.=sprintf('(%s) AND `contact`.`country_id` = 12', implode(' OR ', $additionalQueries));
    

    Now your query should look like this:

    SELECT
            `contact`.`id`,
            `contact_seeder`.`id` as `seeder_id`
        FROM
            `contact`
            LEFT JOIN `contact_seeder` ON `contact_seeder`.`contact_id` = `contact`.`id`
        WHERE ((`contact`.`number` = '654654' AND `contact`.`full_name` = 'full_name') OR (`contact`.`number` = '3212123' AND `contact`.`full_name` = 'full_name2')) AND `contact`.`country_id` = 12
    

    And you have all contacts with single query and know which one seeder_id is null so you can do your insert.

    Regarding UUID: the idea of UUID is not to insert it and then select from database. The idea is to have it generated before the insert so you can link your relations before they are inserted. Just ignore that for now

    Login or Signup to reply.
  2. You can try INSERT IGNORE INTO this will automatically remove your check, You can keep your number (mobile) as unique / primary key, which you are already doing.

    This will remove your first query.

    Your explanation is far beyond the understanding. Please be elobrative and also make it understandable.

    If it returns that seeder already exists then I don’t insert this
    seeder otherwise I insert the record in contact seeder table.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search