EDIT: Thanks to nice_dev for providing an excellent solution below!
I had to rewrite part of his query for correct results:
From:
$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
function ($val) use (&$tmparray, &$matched) {
$matched[ $val ] = $matched [$val] ?? 0;
$matched[ $val ]++;
return $matched[ $val ] <= $tmparray[ $val ];
}
);
To:
$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
function ($val) use (&$tmparray) {
return empty($tmparray[$val]) || !$tmparray[$val]++;
}
);
Original:
I’m attempting to compare arrays to get a new array of matched and unmatched items. I have a loop that goes through roughly 55,000 items. The processing of this script can take upwards of 20+ minutes to attempt to complete and I’ve narrowed it down to both usage of array_intersect
and array_filter
within the foreach. Ideally, I need it to complete much faster. If I limit the foreach to 1000 items it still takes upwards of ~3 minutes to complete which is slow for the client-side experience.
If I remove them, the script completes almost immediately. As such, I will include only these relevant pieces in the code below.
I’m using a custom array_intersect_fixed
function as regular array_intersect
returned wrong results with duplicate values as per here.
Explanations:
totalidsarray
= An array of numbers such as [‘11233’, ‘35353, ‘3432433’, ‘123323’]. Could contain thousands of items.
$deck['main_deck']
= An array of numbers to compare against $totalidsarray
. Similar structure. Max length is 60 items.
foreach($dbdeckarray as $deck){
$tmparray = $totalidsarray;
//Get an array of unmatched cards between the deck and the collection
//Collection = $tmparray
$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
function ($val) use (&$tmparray) {
$key = array_search($val, $tmparray);
if ( $key === false ) return true;
unset($tmparray[$key]);
return false;
}
);
//Get an array of matched cards between the deck and the collection
$tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $totalidsarray);
//Push results to matcharray
$matcharray[] = $tmpmatch;
}
//Built in array_intersect function returns wrong result when input arrays have duplicate values.
function array_intersect_fixed($array1, $array2) {
$result = array();
foreach ($array1 as $val) {
if (($key = array_search($val, $array2, FALSE))!==false) {
$result[] = $val;
unset($array2[$key]);
}
}
return $result;
}
To make matters worse, I have to do 2 further matched/unmatched checks within that same foreach loop against another array extra_deck
, further increasing processing time.
Is there a more optimized approach I can take to this?
EDIT: Explanation of what the code needs to achieve.
The script will retrieve the user’s card collection of cards that they own from a card game. This is assigned into totalidsarray
. It will then query every deck in the database (~55,000) and compare the collection you own against the built deck of cards (main_deck
). It then attempted to extract all owned cards (matched
) and all un-owned cards (unmatched
) into two arrays. Once the full foreach loop is done, the client-side returns a list of each deck alongside the matched cards/unmatched cards of each (with a % match for each).
2
Answers
A couple of optimizations I can suggest:
The
array_intersect_fixed
routine you have is quadratic in nature in terms of getting the result, because it is 2 nested loops under the hood. We can usearray_count_values
to optimize it to work in linear time(which uses a map).json_decode()
doesn’t need to be done twice for every deck. If you do it once and use it wherever needed, it should work just fine(unless you make any edits in place which I don’t find right now) . It also needs to be decoded to an array and not to an object using thetrue
flag.For your
array_filter
, the comparison is also quadratic in nature. We will usearray_count_values
again to optimize it and use a$matched
array. We keep counting the frequency of elements and if any of them surpasses count in$tmparray
, we returnfalse
, else, we returntrue
.Snippet:
Note:
array_intersect_fixed
expects$array2
to be in the Hashmap way by default. If you wish to use it elsewhere, make sure to passarray_count_values
of the array as 2nd parameter or use a third parameter to indicate a flag check otherwise.Beside @nice_dev suggestion your code can be simplified.
The unmatched part is an array diff
The array_intersect_fixed(), if the problem are duplicated value in array, can be avoided by running array_unique() on the array (I guess is $deck[‘main_deck’]) before calling array_intersect()
This will also speed up array_diff() as it will have less array element to compare.