skip to Main Content

Let assume we have 2 strings ($strA and $strB) of some MBs lenght and we want to find (if exists) any 8-bytes substring of $strA inside $strB. My current approach works however is unacceptable slow:

function FindAnySubstring($strA,$strB)
{
    for( $i=0; $i<strlen($strA)-8; $i++ )
    {
        $toFind=subStr($strA,$i,8);
        $pos=strpos($strB,$toFind);
        if( $pos === FALSE )
            continue;
        return [$i,$pos];
    }
    return FALSE;
}

Can it be done in a better way?

EDIT:
Sample data – well, probably I’m not able to add big files, but let assume that we can use shorter samples just for visualisation:

$strA=’abcdefghijKLMNOPQRstuvxyz’;
$strB=’ldsf32vKLMNOPQR4fa156232543′;

Expected result [11,8] (at 11th pos in $strA and 8th pos in $strB)

Current execution time for 3,321,792 bytes long $strA and $strB it takes between 0 and 2530 seconds to find a solution (!) – depending on position of the substring. I would expect 2 orders faster algorithm.

Considered ideas – I have some ideas:

  • find shorter substring (substring of the substring, starting from 1 byte) and if found – check the rest; tested with no improvements
  • use pararelling; always work, however harder to code

2

Answers


  1. There are a few improvements you can make to potentially optimize its performance:

    Early Exit: You can exit the loop as soon as a match is found, rather than continuing to search for additional matches. This can significantly reduce the number of unnecessary iterations.

    Precompute Lengths: Instead of repeatedly calling strlen($strA) inside the loop, you can precompute the length of $strA before the loop starts to improve efficiency.

    Here’s an updated version of your function with these improvements:

    function FindAnySubstring($strA, $strB)
    {
       $lenA = strlen($strA);
       $lenB = strlen($strB);
    
       for ($i = 0; $i <= $lenA - 8; $i++)
       {
          $toFind = substr($strA, $i, 8);
          $pos = strpos($strB, $toFind);
    
          if ($pos !== false)
          {
              return [$i, $pos];
          }
       }
    
      return false;
    }
    
    Login or Signup to reply.
  2. I wrote this , Also you can change it to find all substrings :

    function FindAnySubstring($strA,$strB) {
        $a2 = str_split($strB);
        foreach (str_split($strA) as $first => $firstw) :
            foreach ($a2 as $second => $secw) :
                if ($first === $secw) :
                    return [$firstm+1,$second+1];
                endif;
            endforeach;
        endforeach;
    }
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search