Background
A raw SQL query with many joins and selects, when loaded with
$results = DB::select($query)
creates a ~500 MB, 450,000-item $results
result array, which is not good.
Solution
So the $query now instead creates a temporary table, which gets read with this:
$new_rows = true;
$chunk = 25000;
$offset = 0;

while ($new_rows) {
    $chunked_results = DB::select('SELECT * FROM tmp_table ORDER BY id DESC LIMIT ' . $chunk . ' OFFSET ' . $offset);
    if (count($chunked_results) === 0) {
        $new_rows = false;
    }
    foreach ($chunked_results as $row) {
        // Do stuff with $row
    }
    $offset += $chunk;
    unset($chunked_results);
}
Problem
Despite unsetting $chunked_results
after each query, my script still runs out of memory around the 75,000th row – pretty much the same number of rows that got loaded with no chunking/tmp_table.
This makes me believe that either the contents of previous $chunked_results
are somehow kept in memory and never really freed, or the DB class
remembers things in the background.
Doing DB::reconnect('mysql')
annihilates the temporary table 🙁
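For what it's worth, unset() only drops the variable's own reference; any rows that were copied elsewhere (for instance, appended to an accumulator inside the foreach) stay allocated. A minimal plain-PHP sketch, with no Laravel involved ($accumulator and the fake chunks are purely illustrative):

```php
<?php
// unset() removes the variable's reference, not the data other
// variables still point at.

$accumulator = [];

for ($i = 0; $i < 3; $i++) {
    $chunk = range($i * 10, $i * 10 + 9); // stand-in for a query result

    foreach ($chunk as $row) {
        $accumulator[] = $row; // this copy survives the unset() below
    }

    unset($chunk); // frees only $chunk's own reference
}

// All 30 rows are still held by $accumulator:
var_dump(count($accumulator)); // int(30)
```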
Answers
The columns in
SELECT *
will explode by a factor of about 40 when they are brought into PHP. If you don't need all the columns, provide an explicit list.
OFFSET
becomes slower and slower as it gets bigger, because the query must skip over more and more rows. Instead, "remember where you left off" with the aid of
WHERE
. Discussion: https://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks In your case, WHERE id > $the_last_id_of_prev_query
sounds appropriate. I don't recommend chunks bigger than 1000 rows because you may run into other limits. (And >1000 is getting into "diminishing returns".)
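A sketch of the "remember where you left off" suggestion, in plain PHP: fetchChunk() is a hypothetical stand-in for DB::select('SELECT ... FROM tmp_table WHERE id > ? ORDER BY id ASC LIMIT ?', [$lastId, $size]), and ascending id order is assumed (with the question's DESC order, the condition would be id < $lastId instead):

```php
<?php
// Keyset pagination: WHERE id > $lastId instead of OFFSET, so the
// database never has to skip already-read rows.

function fetchChunk(array $table, int $lastId, int $size): array
{
    // Simulates: SELECT * FROM tmp_table WHERE id > ? ORDER BY id ASC LIMIT ?
    $rows = array_filter($table, fn($row) => $row['id'] > $lastId);
    usort($rows, fn($a, $b) => $a['id'] <=> $b['id']);
    return array_slice($rows, 0, $size);
}

// Fake table of 25 rows.
$table = array_map(fn($id) => ['id' => $id], range(1, 25));

$lastId = 0;
$processed = 0;

while (true) {
    $chunk = fetchChunk($table, $lastId, 10);
    if (count($chunk) === 0) {
        break;
    }
    foreach ($chunk as $row) {
        $processed++;         // do stuff with $row
        $lastId = $row['id']; // remember where we left off
    }
}

var_dump($processed); // int(25)
```

With a real connection, the same loop shape works by passing $lastId as a bound parameter to DB::select each iteration.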
Perhaps "//Do stuff with $row" is the memory problem.
I'm imagining that in the foreach loop, you are persisting data outside the code logic you have shown.
Can you update the question to contain the logic in that comment?
If you save data outside of that statement, it remains in memory: so while chunking is an optimization on the read side, the memory leak remains, since data is likely aggregated outside of the relevant code you provided.
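To illustrate the point: reducing each row to a running result (or writing it out) keeps memory flat, whereas collecting every row into an array does not, no matter how the reads are chunked. A plain-PHP sketch, where the $total sum is just an illustrative stand-in for "do stuff":

```php
<?php
// Leaky pattern: every row is kept alive across chunks.
//   $allRows = [];
//   foreach ($chunk as $row) { $allRows[] = $row; }

// Flat pattern: reduce each row, keep nothing per-row.
$total = 0;
$rowCount = 0;

foreach ([range(1, 100), range(101, 200)] as $chunk) { // two fake chunks
    foreach ($chunk as $row) {
        $total += $row; // aggregate to a scalar instead of storing the row
        $rowCount++;
    }
    unset($chunk); // no other references to these rows remain, so they can be reclaimed
}

var_dump($total); // int(20100)
```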