skip to Main Content

EDIT: Thanks to nice_dev for providing an excellent solution below!

I had to rewrite part of his query for correct results:

From:

$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
            function ($val) use (&$tmparray, &$matched) {
              $matched[ $val ] = $matched [$val] ?? 0;
              $matched[ $val ]++;
              return $matched[ $val ] <= $tmparray[ $val ];
            }
        );   

To:

$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
 function ($val) use (&$tmparray) {
  return empty($tmparray[$val]) || !$tmparray[$val]++;
 }
);

Original:

I’m attempting to compare arrays to get a new array of matched and unmatched items. I have a loop that goes through roughly 55,000 items. The processing of this script can take upwards of 20+ minutes to attempt to complete and I’ve narrowed it down to both usage of array_intersect and array_filter within the foreach. Ideally, I need it to complete much faster. If I limit the foreach to 1000 items it still takes upwards of ~3 minutes to complete which is slow for the client-side experience.

If I remove them, the script completes almost immediately. As such, I will include only these relevant pieces in the code below.

I’m using a custom array_intersect_fixed function as regular array_intersect returned wrong results with duplicate values as per here.

Explanations:

totalidsarray = An array of numbers such as [‘11233’, ‘35353, ‘3432433’, ‘123323’]. Could contain thousands of items.

$deck['main_deck'] = An array of numbers to compare against $totalidsarray. Similar structure. Max length is 60 items.

foreach($dbdeckarray as $deck){

         $tmparray = $totalidsarray;

        //Get an array of unmatched cards between the deck and the collection
        //Collection = $tmparray
        $tmpmatch['unmatched'] = array_filter($deck['main_deck'],
            function ($val) use (&$tmparray) {
                $key = array_search($val, $tmparray);
                if ( $key === false ) return true;
                unset($tmparray[$key]);
                return false;
            }
        );   

        //Get an array of matched cards between the deck and the collection
        $tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $totalidsarray);

        //Push results to matcharray
        $matcharray[] = $tmpmatch;

}

//Built in array_intersect function returns wrong result when input arrays have duplicate values.
function array_intersect_fixed($array1, $array2) {
    $result = array();
    foreach ($array1 as $val) {
        if (($key = array_search($val, $array2, FALSE))!==false) {
            $result[] = $val;
            unset($array2[$key]);
        }
    }
    return $result;
}

To make matters worse, I have to do 2 further matched/unmatched checks within that same foreach loop against another array extra_deck, further increasing processing time.

Is there a more optimized approach I can take to this?

EDIT: Explanation of what the code needs to achieve.

The script will retrieve the user’s card collection of cards that they own from a card game. This is assigned into totalidsarray. It will then query every deck in the database (~55,000) and compare the collection you own against the built deck of cards (main_deck). It then attempted to extract all owned cards (matched) and all un-owned cards (unmatched) into two arrays. Once the full foreach loop is done, the client-side returns a list of each deck alongside the matched cards/unmatched cards of each (with a % match for each).

2

Answers


  1. A couple of optimizations I can suggest:

    • The array_intersect_fixed routine you have is quadratic in nature in terms of getting the result, because it is 2 nested loops under the hood. We can use array_count_values to optimize it to work in linear time(which uses a map).

    • json_decode() doesn’t need to be done twice for every deck. If you do it once and use it wherever needed, it should work just fine(unless you make any edits in place which I don’t find right now) . It also needs to be decoded to an array and not to an object using the true flag.

    • For your array_filter, the comparison is also quadratic in nature. We will use array_count_values again to optimize it and use a $matched array. We keep counting the frequency of elements and if any of them surpasses count in $tmparray, we return false, else, we return true.

    Snippet:

    <?php 
    
    $tmparray = array_count_values($totalidsarray);
    
    foreach($dbdeckarray as $deck){
            $matched = [];        
            $deck['main_deck'] = json_decode($deck['main_deck'], true);
            $tmpmatch['unmatched'] = array_filter($deck['main_deck'],
                function ($val) use (&$tmparray, &$matched) {
                  $matched[ $val ] = $matched [$val] ?? 0;
                  $matched[ $val ]++;
                  return $matched[ $val ] <= $tmparray[ $val ];
                }
            );   
    
            $tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $tmparray);
            $matcharray[] = $tmpmatch;
    }
    
    function array_intersect_fixed($array1, $array2) {
        $result = array();
        $matched = [];
        foreach ($array1 as $val) {
            $matched[ $val ] = $matched[ $val ] ?? 0;
            $matched[ $val ]++;
            if (isset($array2[ $val ]) && $matched[ $val ] <= $array2[ $val ]) {
                $result[] = $val;
            }
        }
        return $result;
    }
    

    Note: array_intersect_fixed expects $array2 to be in the Hashmap way by default. If you wish to use it elsewhere, make sure to pass array_count_values of the array as 2nd parameter or use a third parameter to indicate a flag check otherwise.

    Login or Signup to reply.
  2. Beside @nice_dev suggestion your code can be simplified.

    The unmatched part is an array diff

    array_diff($deck['main_deck'], $tmparray);
    

    The array_intersect_fixed(), if the problem are duplicated value in array, can be avoided by running array_unique() on the array (I guess is $deck[‘main_deck’]) before calling array_intersect()

    This will also speed up array_diff() as it will have less array element to compare.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search