
I have a performance issue with a query, which is potentially wrongly written.

Dear all,
I have two tables.
One table, "test_cdr", with roughly 5M rows, contains three columns:

  • POINT_TARGET | contains the called phone numbers
  • dest | should contain a destination identifier
  • DURATION | contains the duration of the call

Another table, "codes", contains two columns:

  • area_codes | contains a list of country prefixes
  • dest | contains the destination identifier

Now I need to populate test_cdr.dest by:

  • running every number in test_cdr.POINT_TARGET against codes.area_codes
  • identifying the best-matching codes.area_codes prefix
  • copying the corresponding codes.dest value into test_cdr.dest

This way, I can then sum the duration of the calls per destination type, e.g. with the aggregation sketched below.
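
For reference, this is a minimal sketch of the aggregation I am after once dest is populated (only the columns described above are used):

SELECT dest, SUM(DURATION) AS total_duration
FROM test_cdr
GROUP BY dest;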

At the moment, I am using this query:

UPDATE test_cdr AS main
JOIN (
    SELECT c.POINT_TARGET, t.dest
    FROM test_cdr c
    JOIN codes t ON c.POINT_TARGET LIKE CONCAT(t.area_codes, '%')
    WHERE c.dest IS NULL  -- Only update rows where dest is NULL
    LIMIT 1000  -- Adjust the batch size as needed
) AS matched ON main.POINT_TARGET = matched.POINT_TARGET
SET main.dest = matched.dest;

It only works with LIMIT, processing roughly 10,000 rows every 20 seconds (pretty lame performance).
As soon as I remove LIMIT, the query runs but nothing happens on the database: I see no disk activity, whereas with LIMIT there is disk activity.
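
For reference, the matching join can be inspected with a plain EXPLAIN over the inner select:

EXPLAIN
SELECT c.POINT_TARGET, t.dest
FROM test_cdr c
JOIN codes t ON c.POINT_TARGET LIKE CONCAT(t.area_codes, '%')
WHERE c.dest IS NULL;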

Could you help me figure out the issue or maybe provide a better option?
By the way, I have added indexes, roughly along the lines sketched below.
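
(The definitions below are illustrative assumptions, not my real DDL:)

-- hypothetical index definitions, for illustration only;
-- actual names and columns may differ
CREATE INDEX idx_cdr_point_target ON test_cdr (POINT_TARGET);
CREATE INDEX idx_codes_area_codes ON codes (area_codes);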

Thank you.

2 Answers


  1. Chosen as BEST ANSWER

    After the suggestion from Konteye, I modified the query:

    DELIMITER $$
    
    CREATE PROCEDURE update_when_dest_is_null()
    BEGIN
        DECLARE done INT DEFAULT FALSE;
        DECLARE batch_size INT DEFAULT 1000;
        DECLARE offset INT DEFAULT 0;
        DECLARE total_rows INT;
    
        -- Get the total number of rows where dest is NULL
        SELECT COUNT(*) INTO total_rows FROM test_cdr WHERE dest IS NULL;
    
        -- Start the loop to process updates in batches
        WHILE NOT done DO
            -- Start a transaction
            START TRANSACTION;
    
            -- Update the test_cdr table in batches
            UPDATE test_cdr AS main
            JOIN (
                SELECT c.POINT_TARGET, t.dest
                FROM test_cdr c
                JOIN codes t ON c.POINT_TARGET LIKE CONCAT(t.area_codes, '%')
                WHERE c.dest IS NULL
                LIMIT batch_size OFFSET offset
            ) AS matched ON main.POINT_TARGET = matched.POINT_TARGET
            SET main.dest = matched.dest;
    
            -- Commit the transaction
            COMMIT;
    
            -- Update the offset for the next batch
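        -- NOTE: rows updated above drop out of the dest IS NULL set, so
        -- advancing the offset skips some still-NULL rows; those are only
        -- picked up when the procedure is run again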
            SET offset = offset + batch_size;
    
            -- Check if we've processed all rows
            IF offset >= total_rows THEN
                SET done = TRUE;
            END IF;
        END WHILE;
    END$$
    
    DELIMITER ;
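
    Once created, I run it with:

    CALL update_when_dest_is_null();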
    

    It is still very slow, but it is processing.


  2. I recommend creating a procedure to paginate through the table. Here’s an example:

    CREATE PROCEDURE update_when_dest_is_null()
    BEGIN
        DECLARE done INT DEFAULT FALSE;
        DECLARE batch_size INT DEFAULT 1000;
        DECLARE offset INT DEFAULT 0;
        DECLARE total_rows INT;
    
        SELECT COUNT(*) INTO total_rows FROM test_cdr WHERE dest IS NULL;
    
        WHILE NOT done DO
            START TRANSACTION;
    
            UPDATE test_cdr AS main
            JOIN (
                SELECT c.POINT_TARGET, t.dest
                FROM test_cdr c
                JOIN codes t ON c.POINT_TARGET LIKE CONCAT(t.area_codes, '%')
                WHERE c.dest IS NULL
                LIMIT batch_size OFFSET offset
            ) AS matched ON main.POINT_TARGET = matched.POINT_TARGET
            SET main.dest = matched.dest;
    
            COMMIT;
    
            SET offset = offset + batch_size;
    
            IF offset >= total_rows THEN
                SET done = TRUE;
            END IF;
        END WHILE;
    END
    

    Be sure to test this procedure thoroughly before using it in a production environment.
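
    One way to test it, for example, is on a small copy of the table, pointing the procedure at the copy first (the table name below is just an example):

    CREATE TABLE test_cdr_sample LIKE test_cdr;
    INSERT INTO test_cdr_sample
    SELECT * FROM test_cdr LIMIT 10000;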
