
Background

A raw SQL query with many joins, selected columns and so on, when loaded all at once with

$results = DB::select($query);

creates a ~500 MB result set of about 450,000 rows in $results, which is far too big to hold in memory.

Solution

So $query now creates a temporary table instead, which is then read in chunks like this:

$new_rows = true;
$chunk    = 25000;
$offset   = 0;

while ($new_rows) {
    // Read the next chunk of the temporary table.
    $chunked_results = DB::select(
        'SELECT * FROM tmp_table ORDER BY id DESC LIMIT ' . $chunk . ' OFFSET ' . $offset
    );

    if (count($chunked_results) === 0) {
        $new_rows = false;
    }

    foreach ($chunked_results as $row) {
        // Do stuff with $row
    }

    $offset += $chunk;
    unset($chunked_results); // free the chunk before the next iteration
}

Problem

Despite unsetting $chunked_results after each query, my script still runs out of memory around the 75,000th row – pretty much the same number of rows it managed before any chunking/tmp_table was introduced.

This makes me believe that, somehow, the contents of the previous $chunked_results are kept in memory and never really freed, or that the DB class remembers things in the background.
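
To show where I would measure this, here is a minimal diagnostic sketch using PHP's built-in memory_get_usage() and memory_get_peak_usage(), placed at the end of the loop body:

// Inside the while loop, right after unset($chunked_results):
error_log(sprintf(
    'offset %d: current %.1f MB, peak %.1f MB',
    $offset,
    memory_get_usage(true) / 1048576,
    memory_get_peak_usage(true) / 1048576
));
// If "current" keeps climbing chunk after chunk, references to old rows
// are being held somewhere; if only "peak" is high, a single chunk alone
// is simply too big.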

Doing DB::reconnect('mysql') annihilates the temporary table, since a temporary table only lives as long as the connection that created it 🙁


Answers


  1. The columns in SELECT * can explode to roughly 40 times their size when they are brought into PHP. If you don’t need all the columns, provide an explicit list.

    OFFSET becomes slower and slower as it gets bigger, because the query must skip over more and more rows before returning anything.

    Instead, "Remember where you left off with the aid of WHERE." Discussion: https://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks In your case, WHERE id > $the_last_id_of_prev_query sounds appropriate.

    I don’t recommend chunks bigger than 1000 rows, because you may run into other limits. (And anything over 1000 gets into "diminishing returns".)

    Perhaps "//Do stuff with $row" is the memory problem.

  2. I suspect that, in the foreach loop, you are persisting data outside the code you have shown.

    Can you update the question to include the logic behind this comment:

    //Do stuff with $row
    

    Because if you save data outside of that statement, it remains in memory. So while you have optimized the chunking of the

    DB::select(...)
    

    calls, the memory leak remains, since data is most likely being aggregated outside of the code you provided.
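
    For example, this common pattern would leak even with chunking, because every chunk's rows pile up in one array ($all_rows and amount are purely illustrative names):

    $all_rows = [];
    foreach ($chunked_results as $row) {
        $all_rows[] = $row; // grows by another 25,000 rows every chunk
    }

    // By contrast, reducing each row to a small running result stays flat:
    $total = 0;
    foreach ($chunked_results as $row) {
        $total += $row->amount; // `amount` is a hypothetical column
    }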
