
I have a list of many strings that are similar to each other, for example:

$str = array('monkey eat a banana',
             'dog eat a banana',
             'cat devour an apple',
             'cat dine a coco'); //etc

I would like to extract the X strings from this array that are the most different from each other.
For example, if I want to extract 3 of them, the result would be 'monkey eat a banana', 'cat dine a coco' and 'cat devour an apple'.

How can I implement this? I have found the similar_text() function and I think I could use it, but how do I extract the strings for any value of X?

Thanks for your advice.

PS: I use this for SEO; the goal is to avoid as much duplicate content as possible.
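For reference, a small sketch of what similar_text() gives you. It returns the number of matching characters, and the optional third argument (passed by reference) receives a similarity percentage. The PHP manual notes that swapping the two arguments may yield a different result, so the helper `pair_similarity()` below (a hypothetical name, not part of PHP) averages both orders to get a score that does not depend on argument order.

```php
<?php
// similar_text() returns the number of matching characters; the third
// argument is filled in (by reference) with a percentage.
$common = similar_text('monkey eat a banana', 'dog eat a banana', $percent);
echo $common, ' common chars, ', round($percent), "%\n"; // prints: 14 common chars, 80%

// Hypothetical helper: similar_text() can return different values when the
// arguments are swapped, so average both directions for an order-independent score.
function pair_similarity(string $a, string $b): float
{
    similar_text($a, $b, $ab);
    similar_text($b, $a, $ba);
    return ($ab + $ba) / 2;
}
```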


Answers


  1. I tested this with the example code below. Conclusion: the strings with the lowest percentage from similar_text() are the most different ones.

    $str = array('monkey eat a banana',
             'dog eat a banana',
             'cat devour an apple',
             'cat dine a coco');
    
    $len = count($str);
    echo '<table width="100%">';
    // Columns: both strings, their lengths, the number of matching characters
    // returned by similar_text(), and the similarity percentage.
    echo '<tr><th>string 1</th><th>string 2</th><th>len 1</th><th>len 2</th><th>common</th><th>percentage</th></tr>';
    for ($i = 0; $i < $len; $i++) {
      for ($j = 0; $j < $len; $j++) {
        if ($i == $j) continue; // skip comparing a string with itself
        $num = similar_text($str[$i], $str[$j], $percent);
        echo '<tr><td>' . $str[$i] . '</td><td>' . $str[$j] . '</td><td>' . strlen($str[$i]) . '</td><td>' . strlen($str[$j]) . '</td><td>' . $num . '</td><td>' . number_format($percent, 0) . '</td></tr>';
      }
    }
    echo '</table>';
    

    The results are as follows:

    string 1             string 2             len 1  len 2  common  percentage
    monkey eat a banana  dog eat a banana     19     16     14      80
    monkey eat a banana  cat devour an apple  19     19     7       37
    monkey eat a banana  cat dine a coco      19     15     5       29
    dog eat a banana     monkey eat a banana  16     19     14      80
    dog eat a banana     cat devour an apple  16     19     7       40
    dog eat a banana     cat dine a coco      16     15     5       32
    cat devour an apple  monkey eat a banana  19     19     7       37
    cat devour an apple  dog eat a banana     19     16     7       40
    cat devour an apple  cat dine a coco      19     15     9       53
    cat dine a coco      monkey eat a banana  15     19     5       29
    cat dine a coco      dog eat a banana     15     16     5       32
    cat dine a coco      cat devour an apple  15     19     9       53
    
  2. Hope this helps:

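    $str = array(
        'cat devour an apple',
        'dog eat a banana',
        'monkey eat a banana',
        'cat dine a coco',
    ); //etc
    
    // Score each string by the total number of characters it shares with
    // every other string: a lower total means it is more different overall.
    $overall_scores = [];
    foreach ($str as $i => $s) {
        $overall_scores[$i] = 0;
        foreach ($str as $j => $d) {
            if ($i != $j) {
                $overall_scores[$i] += similar_text($s, $d);
            }
        }
    }
    
    // Sort ascending by score (asort keeps the original keys) and keep the $x lowest.
    asort($overall_scores);
    $x = 3;
    $results_index = array_slice(array_keys($overall_scores), 0, $x);
    
    // Map the selected indexes back to the original strings.
    $result_str = [];
    foreach ($results_index as $index) {
        $result_str[] = $str[$index];
    }
    var_dump($result_str);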
    
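Both answers rank strings by how similar they are to everything else, but neither re-checks a candidate against the strings already chosen, so two near-duplicates can both end up in the result. A minimal sketch of a greedy alternative (farthest-first, or "max-min" selection), assuming PHP 7.4+ for the arrow function: start with the string least similar to all others, then repeatedly add the string whose highest similarity to any already-picked string is lowest. `pick_most_different()` is a hypothetical name, the similarity is the similar_text() percentage averaged over both argument orders, and the result is a heuristic, not a guaranteed optimum.

```php
<?php
// Greedy max-min selection (heuristic, not guaranteed optimal).
function pick_most_different(array $strings, int $x): array
{
    $strings = array_values($strings);
    $n = count($strings);
    $x = max(0, min($x, $n));
    if ($x === 0) {
        return [];
    }

    // Symmetric pairwise similarity (average of both argument orders).
    $sim = [];
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            similar_text($strings[$i], $strings[$j], $p1);
            similar_text($strings[$j], $strings[$i], $p2);
            $sim[$i][$j] = $sim[$j][$i] = ($p1 + $p2) / 2;
        }
    }

    // Seed: the string with the lowest total similarity to all others.
    $best = 0;
    $bestScore = INF;
    for ($i = 0; $i < $n; $i++) {
        $total = 0.0;
        for ($j = 0; $j < $n; $j++) {
            if ($i !== $j) {
                $total += $sim[$i][$j];
            }
        }
        if ($total < $bestScore) {
            $bestScore = $total;
            $best = $i;
        }
    }
    $picked = [$best];

    // Grow the selection one string at a time.
    while (count($picked) < $x) {
        $best = null;
        $bestScore = INF;
        for ($i = 0; $i < $n; $i++) {
            if (in_array($i, $picked, true)) {
                continue;
            }
            // How close this candidate is to the current selection:
            // its highest similarity to any string already picked.
            $closest = 0.0;
            foreach ($picked as $p) {
                $closest = max($closest, $sim[$i][$p]);
            }
            if ($closest < $bestScore) {
                $bestScore = $closest;
                $best = $i;
            }
        }
        $picked[] = $best;
    }

    return array_map(fn($i) => $strings[$i], $picked);
}

$str = ['monkey eat a banana', 'dog eat a banana', 'cat devour an apple', 'cat dine a coco'];
print_r(pick_most_different($str, 3));
```

On the question's four strings with $x = 3 this picks 'cat dine a coco', 'monkey eat a banana' and 'cat devour an apple', the set the question expects. The pairwise matrix costs O(n²) calls to similar_text(), so for very large lists you may want to precompute or cache it.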