I am trying to evolve my Mongo search engine into something more advanced. I understand why the code below gives me these results; what I struggle with is how to expand the capabilities of the search.

I would like result #5 to be on top, because it matches both search parts, whether in a single field or across multiple fields.

My current thought is to run the search query for each search part, merge the results, increment a counter for each occurrence, and order by that number. I would rather not run multiple queries, but at least you can see where my mind is in trying to solve this.

Simply changing $builder->addOr(...$searchQueries); to $builder->addAnd(...$searchQueries); returns zero results, because then every field has to match.

    // Regex is assumed to be the ext-mongodb class: use MongoDB\BSON\Regex;
    $search = 'small table';
    $fields = ['name', 'small_description', 'long_description'];
    $searchQueries = [];

    foreach ($fields as $field) {

        $searchValues = [];
        foreach (explode(' ', $search) as $searchPart) {
            $searchValues[] = new Regex(preg_quote($searchPart), 'i');
        }

        $searchQueries[] = $builder->expr()->field($field)->in($searchValues);
    }

    $builder->addOr(...$searchQueries);

    /**
     * result 5 products
     *
     * 1. comfortable ( because it matches table with the regex )
     * 2. small ( because one part of the search matches )
     * 3. table ( because one part of the search matches )
     * 4. tablespoon ( because it matches table with the regex )
     * 5. small table ( matches but is below in the order while this should be on top )
     */

3 Answers


  1. There’s only so much you can do with regexes, it’s better to reach for better alternatives:

  2. Relevance of search results is a can of worms, or a business opportunity, depending on how you look at it, but it’s tough regardless of the angle of view.

    It may look obvious for the synthetic example in the question, but simply counting matches is a pretty naive approach that can produce a very unexpected order. Anyway, since the question is about "how to implement it", let’s focus on implementing this strategy.

    First, we add a temporary field "_weight" to each document, populate it with a numeric value that represents the relevance of the document, and then sort by that field. Doing all of this in a single request means using an "aggregation" query instead of "find".

    With a few more improvements, such as combining the fields to search into a single string and combining the search terms into a single regexp, the pipeline can look like this in MongoDB syntax:

    [
      {
        "$addFields": {
          "_weight": {
            "$size": {
              "$regexFindAll": {
                "input": {
                  "$concat": [
                    "$name",
                    " ",
                    "$small_description",
                    " ",
                    "$long_description"
                  ]
                },
                "regex": "small|table",
                "options": "i"
              }
            }
          }
        }
      },
      {
        "$sort": {
          "_weight": -1
        }
      },
      {
        "$project": {
          "_weight": 0
        }
      }
    ]

    Playground example
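
    To get a feel for what "_weight" ends up holding, the same counting can be emulated in plain PHP without a database. This is only a rough sketch: the sample documents are made up to mirror the five results in the question, and missing fields are treated as empty strings here, whereas $concat in MongoDB returns null as soon as one operand is null.

    ```php
    <?php
    // Hypothetical sample data; the field names follow the question.
    $docs = [
        ['name' => 'comfortable', 'small_description' => '', 'long_description' => ''],
        ['name' => 'small',       'small_description' => '', 'long_description' => ''],
        ['name' => 'table',       'small_description' => '', 'long_description' => ''],
        ['name' => 'tablespoon',  'small_description' => '', 'long_description' => ''],
        ['name' => 'small table', 'small_description' => '', 'long_description' => ''],
    ];
    $fields = ['name', 'small_description', 'long_description'];
    $search = 'small table';

    // One case-insensitive alternation of the escaped terms: /small|table/i
    $terms = array_map(fn ($term) => preg_quote($term, '/'), explode(' ', $search));
    $pattern = '/' . implode('|', $terms) . '/i';

    // Mirrors $size of $regexFindAll over the space-joined fields.
    $weight = function (array $doc) use ($fields, $pattern): int {
        $haystack = implode(' ', array_map(fn ($field) => $doc[$field] ?? '', $fields));
        return (int) preg_match_all($pattern, $haystack);
    };

    // Mirrors the $sort stage: highest weight first.
    usort($docs, fn ($a, $b) => $weight($b) <=> $weight($a));

    // 'small table' is the only document with two matches, so it comes first.
    $ranked = array_column($docs, 'name');
    ```

    Note that "comfortable" and "tablespoon" still get a weight of 1, because the pattern has no word boundaries.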

    Translation to Doctrine syntax may look like this:

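    $builder = $dm->createAggregationBuilder('YourDocumentClass');

    $search = 'small table';
    $fields = ['name', 'small_description', 'long_description'];

    // Interleave the field references with spaces: '$name', ' ', '$small_description', ...
    $concatArgs = [];
    foreach ($fields as $field) {
        $concatArgs[] = '$' . $field;
        $concatArgs[] = ' ';
    }
    array_pop($concatArgs); // drop the trailing separator

    // Escape each search term and join the terms into one alternation: small|table
    $pattern = implode('|', array_map('preg_quote', explode(' ', $search)));

    $builder
      ->addFields()
        ->field('_weight')
        ->expression(
          $builder->expr()
            ->size(
              $builder->expr()
                // Assumes a regexFindAll($input, $regex, $options) signature; check your ODM version
                ->regexFindAll(
                  $builder->expr()->concat(...$concatArgs),
                  $pattern,
                  'i'
                )
            )
        )
      ->sort('_weight', -1)
      ->project()
          ->excludeFields(['_weight']);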
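
    If the builder in your ODM version does not expose these operators, the pipeline can also be assembled as a plain PHP array and passed to the collection's aggregate() method. A minimal sketch, assuming MongoDB 4.2+ for $regexFindAll; buildWeightPipeline is a hypothetical helper name, and the $ifNull wrapper is my addition so that a missing field does not turn the whole $concat into null:

    ```php
    <?php
    /**
     * Hypothetical helper: builds the "_weight" pipeline as a plain array.
     */
    function buildWeightPipeline(string $search, array $fields): array
    {
        // '$name', ' ', '$small_description', ... with each field defaulting to ''
        $concat = [];
        foreach ($fields as $field) {
            $concat[] = ['$ifNull' => ['$' . $field, '']];
            $concat[] = ' ';
        }
        array_pop($concat); // drop the trailing separator

        // Split on whitespace, skip empty parts, escape each term, join as alternation
        $terms = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
        $pattern = implode('|', array_map('preg_quote', $terms));

        return [
            ['$addFields' => [
                '_weight' => ['$size' => ['$regexFindAll' => [
                    'input'   => ['$concat' => $concat],
                    'regex'   => $pattern,
                    'options' => 'i',
                ]]],
            ]],
            ['$sort' => ['_weight' => -1]],
            ['$project' => ['_weight' => 0]],
        ];
    }

    $pipeline = buildWeightPipeline('small table', ['name', 'small_description', 'long_description']);
    ```

    With a MongoDB\Collection instance this would be run as $collection->aggregate($pipeline); the results are then raw documents rather than hydrated Doctrine objects.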
    
  3. The current approach treats all matches equally and has no way to score documents by how many of the search terms they contain. To improve on this, try a text index across the fields if possible. MongoDB text search automatically scores results and lets you sort by a relevance score.

    
    $collection->createIndex(
        [ 'name' => 'text', 'small_description' => 'text', 'long_description' => 'text' ],
        [ 'weights' => [ 'name' => 10, 'small_description' => 5, 'long_description' => 1 ] ]
    );
    
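    // find() takes projection and sort as options; the returned cursor has no sort() method
    $results = $collection->find(
        [ '$text' => [ '$search' => "small table" ] ],
        [
            'projection' => [ 'score' => [ '$meta' => "textScore" ] ],
            'sort' => [ 'score' => [ '$meta' => "textScore" ] ],
        ]
    );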
    
    

    This gives you built-in scoring and ranking.

    Alternatively, if you don’t want to use MongoDB text search, consider using an aggregation pipeline to compute a custom score.

    For example:

    First, $match documents that contain at least one of the search terms in any field (similar to what you do now). Then, use $addFields to compute a score that counts how many of your search terms appear in at least one field, so each term contributes at most 1.

    $search = 'small table';
    $searchParts = explode(' ', $search);
    $fields = ['name', 'small_description', 'long_description'];
    
    // Build an $or condition to find docs that contain any part of the search in any field
    $orConditions = [];
    foreach ($fields as $field) {
        foreach ($searchParts as $part) {
            $orConditions[] = [
                $field => ['$regex' => new MongoDB\BSON\Regex(preg_quote($part), 'i')]
            ];
        }
    }
    
    $pipeline = [
        ['$match' => ['$or' => $orConditions]],
        ['$addFields' => [
            'score' => [
                '$sum' => array_map(function($part) use ($fields) {
                    return [
                        '$cond' => [
                            ['$or' => array_map(function($field) use ($part) {
                                return [
                                    '$regexMatch' => [
                                        'input' => '$' . $field,
                                        'regex' => new MongoDB\BSON\Regex(preg_quote($part), 'i')
                                    ]
                                ];
                            }, $fields)],
                            1,
                            0
                        ]
                    ];
                }, $searchParts)
            ]
        ]],
        ['$sort' => ['score' => -1]]
    ];
    
    $results = $collection->aggregate($pipeline);
    
    

    So, MongoDB text search is often the quickest path to a well-ranked result set, while the aggregation pipeline approach offers more flexibility if text indexing is not a fit.
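
    To see how a per-term score behaves, here is a small plain-PHP emulation of the $cond / $regexMatch logic. It is only a sketch: distinctTermScore is a hypothetical name, the documents are made up, and stripos stands in for the case-insensitive regex match.

    ```php
    <?php
    /**
     * Hypothetical helper: counts how many search terms occur in at least
     * one of the given fields. Each term contributes at most 1.
     */
    function distinctTermScore(array $doc, array $fields, array $terms): int
    {
        $score = 0;
        foreach ($terms as $term) {
            foreach ($fields as $field) {
                if (stripos($doc[$field] ?? '', $term) !== false) {
                    $score++;
                    break; // this term is accounted for, move on to the next one
                }
            }
        }
        return $score;
    }

    $fields = ['name', 'small_description', 'long_description'];
    $terms = explode(' ', 'small table');

    // Both terms present in one field.
    $both = distinctTermScore(['name' => 'small table'], $fields, $terms);
    // One term repeated three times still scores 1.
    $repeated = distinctTermScore(['name' => 'table table table'], $fields, $terms);
    // Terms spread over two fields.
    $spread = distinctTermScore(['name' => 'Small', 'small_description' => 'oak table'], $fields, $terms);
    ```

    Here a document that repeats "table" three times does not outrank "small table", which is the main difference from counting every match.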
