I am trying to evolve my MongoDB search engine into something more advanced. I understand why the code below returns these results; what I struggle with is how to extend the capabilities of the search.
I would like result #5 to be on top, because both search terms were found in it, whether in a single field or across multiple fields.
My current thought is to run the search query once per search term, then merge the results, increment a counter for each occurrence and order by that number. I'd rather not run multiple queries, but at least you understand where my mind is in trying to solve this.
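The merge step of that idea can be done in plain PHP once the per-term queries have run. A minimal sketch, assuming each query has already returned a list of document ids; the helper name `rankByMatchedTerms` and the ids (product names, for readability) are hypothetical:

```php
<?php
// Hypothetical helper for the "one query per term, then merge" idea.
// Input: term => ids of the documents that matched that term.
// Output: ids ordered by how many different terms matched them.
function rankByMatchedTerms(array $idsPerTerm): array
{
    $hits = [];
    foreach ($idsPerTerm as $ids) {
        // array_unique: a document is counted at most once per term
        foreach (array_unique($ids) as $id) {
            $hits[$id] = ($hits[$id] ?? 0) + 1;
        }
    }
    arsort($hits); // highest count first, keys (ids) preserved
    return array_keys($hits);
}

// Hypothetical per-term results, mirroring the example in the question
$ranked = rankByMatchedTerms([
    'small' => ['small', 'small table'],
    'table' => ['comfortable', 'table', 'tablespoon', 'small table'],
]);
echo $ranked[0], PHP_EOL; // small table
```

This costs one round trip per term, which is the drawback mentioned above; the aggregation approaches in the answers avoid it.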
Simply changing `$builder->addOr(...$searchQueries);` to `$builder->addAnd(...$searchQueries);` returns zero results, because then every field has to match.
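```php
use MongoDB\BSON\Regex;

// $builder is a Doctrine MongoDB ODM query builder, e.g.
// $builder = $documentManager->createQueryBuilder(Product::class); // Product is hypothetical
$search = 'small table';
$fields = ['name', 'small_description', 'long_description'];

$searchQueries = [];
foreach ($fields as $field) {
    // One case-insensitive regex per search term
    $searchValues = [];
    foreach (explode(' ', $search) as $searchPart) {
        $searchValues[] = new Regex(preg_quote($searchPart), 'i');
    }
    // The field matches if it matches any of the term regexes
    $searchQueries[] = $builder->expr()->field($field)->in($searchValues);
}
// The document matches if any field matches
$builder->addOr(...$searchQueries);

/**
 * Result: 5 products
 *
 * 1. comfortable (because it matches "table" with the regex)
 * 2. small (because one part of the search matches)
 * 3. table (because one part of the search matches)
 * 4. tablespoon (because it matches "table" with the regex)
 * 5. small table (matches both parts, but is lower in the order while it should be on top)
 */
```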
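For "all terms must match", the nesting that works is `$and` across terms and, inside each term, `$or` across fields (rather than `$and` across fields). A minimal sketch as a plain filter array that could be passed to `MongoDB\Collection::find()`; the helper name `buildAllTermsFilter` is hypothetical. Note that this only filters down to documents containing every term; it does not rank anything.

```php
<?php
// Hypothetical helper: every term must appear, in any of the fields.
// Shape: { $and: [ { $or: [ {name: {$regex: "small"}}, ... ] }, { $or: [ ... "table" ... ] } ] }
function buildAllTermsFilter(string $search, array $fields): array
{
    $and = [];
    foreach (preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY) as $term) {
        $or = [];
        foreach ($fields as $field) {
            $or[] = [$field => ['$regex' => preg_quote($term), '$options' => 'i']];
        }
        $and[] = ['$or' => $or]; // this term in at least one field
    }
    return ['$and' => $and]; // every term must be present
}

$filter = buildAllTermsFilter('small table', ['name', 'small_description', 'long_description']);
```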
Answers
There’s only so much you can do with regexes; it’s better to reach for dedicated alternatives.
The relevance of search results is a can of worms or a business opportunity, depending on how you look at it, but it’s hard from any angle.
It may look obvious for the synthetic example in the question, but simply counting matches is a fairly naive approach that can produce a very unexpected order. Still, since the question is about "how to implement it", let’s focus on implementing this strategy.
First, we need to add a temporary field "_weight" to each document, populate it with a numeric value that represents the relevance of the document, then sort the documents by that field. Doing all of this in a single request means you need an aggregation query instead of "find".
With a few more improvements, such as combining the fields to search into a single string and combining the search terms into a single regex, the pipeline can look like this in MongoDB syntax:
Translation to Doctrine syntax may look like this:
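A minimal sketch of such a "_weight" pipeline, written as plain PHP arrays rather than with Doctrine's aggregation builder. It assumes MongoDB 4.2+ (for `$regexFindAll`), string-typed fields and the field names from the question; the helper name `buildWeightPipeline` is hypothetical. The weight is the number of regex matches in the concatenated fields, i.e. the naive count discussed above.

```php
<?php
// Hypothetical helper: $match any term, compute "_weight", sort by it.
function buildWeightPipeline(string $search, array $fields): array
{
    // One regex that matches any term, e.g. "small|table"
    $terms = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    $pattern = implode('|', array_map('preg_quote', $terms));

    // At least one field must contain at least one term
    $or = [];
    foreach ($fields as $field) {
        $or[] = [$field => ['$regex' => $pattern, '$options' => 'i']];
    }

    // All searched fields joined into one string; $ifNull keeps $concat
    // from returning null when a field is missing
    $parts = [];
    foreach ($fields as $field) {
        $parts[] = ['$ifNull' => ['$' . $field, '']];
        $parts[] = ' ';
    }

    return [
        ['$match' => ['$or' => $or]],
        // _weight = number of matches of the combined regex in the joined string
        ['$addFields' => ['_weight' => ['$size' => ['$regexFindAll' => [
            'input' => ['$concat' => $parts],
            'regex' => $pattern,
            'options' => 'i',
        ]]]]],
        ['$sort' => ['_weight' => -1]],
    ];
}

$pipeline = buildWeightPipeline('small table', ['name', 'small_description', 'long_description']);
// e.g. $documentManager->getDocumentCollection(Product::class)->aggregate($pipeline);
// (Product is a hypothetical document class)
```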
Your current approach treats all matches equally, so it has no way to score documents by how many of the search terms they contain. If possible, use a text index on multiple fields instead: MongoDB text search scores results automatically and lets you sort by relevance score.
This gives you built-in scoring and ranking.
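A minimal sketch of the text-index approach, expressed as the arrays you would pass to the mongodb/mongodb library's `createIndex()` and `find()`. It assumes the field names from the question; the weight of 10 on `name` is an arbitrary example value.

```php
<?php
// One compound text index over all searchable fields
// (a collection can have at most one text index).
$indexKeys = [
    'name' => 'text',
    'small_description' => 'text',
    'long_description' => 'text',
];
// Example only: make matches in "name" count more than in the descriptions
$indexOptions = ['weights' => ['name' => 10]];

// $text matches documents containing any of the words; the textScore
// generally rises when more of the words occur, or occur in heavier fields.
$filter = ['$text' => ['$search' => 'small table']];
$findOptions = [
    'projection' => ['score' => ['$meta' => 'textScore']],
    'sort' => ['score' => ['$meta' => 'textScore']],
];

// With a MongoDB\Collection $collection, e.g. from
// $documentManager->getDocumentCollection(Product::class) (Product is hypothetical):
// $collection->createIndex($indexKeys, $indexOptions);
// $cursor = $collection->find($filter, $findOptions);
```

Unlike the regex, text search matches whole (stemmed) words, so "comfortable" and "tablespoon" would no longer match "table"; whether that is what you want depends on your use case.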
Alternatively, if you don’t want to use MongoDB text search, consider using an aggregation pipeline to compute a custom score.
For example:
First, $match documents that contain at least one of the search terms in any field (similar to what you do now). Then, use $addFields to compute a score that counts how many times your search terms appear across all fields. Finally, $sort by that score in descending order.
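A minimal sketch of those three stages as plain PHP arrays. As a variant of "how many times", this score adds 1 for each distinct term found in any field, so a product containing both "small" and "table" scores 2 and sorts above products containing only one. It assumes MongoDB 4.2+ (for `$regexMatch`), string-typed fields and the field names from the question; the helper name `buildTermScorePipeline` is hypothetical.

```php
<?php
// Hypothetical helper: score = number of distinct terms found in any field.
function buildTermScorePipeline(string $search, array $fields): array
{
    $terms = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);

    $or = [];          // query filter: any term in any field
    $scoreParts = [];  // one 0/1 expression per term
    foreach ($terms as $term) {
        $regex = preg_quote($term);
        $matchesAnyField = [];
        foreach ($fields as $field) {
            $or[] = [$field => ['$regex' => $regex, '$options' => 'i']];
            $matchesAnyField[] = ['$regexMatch' => [
                'input' => ['$ifNull' => ['$' . $field, '']], // missing field => ''
                'regex' => $regex,
                'options' => 'i',
            ]];
        }
        // 1 if this term occurs in at least one field, else 0
        $scoreParts[] = ['$cond' => [['$or' => $matchesAnyField], 1, 0]];
    }

    return [
        ['$match' => ['$or' => $or]],
        ['$addFields' => ['score' => ['$add' => $scoreParts]]],
        ['$sort' => ['score' => -1]],
    ];
}

$pipeline = buildTermScorePipeline('small table', ['name', 'small_description', 'long_description']);
```

Counting distinct terms rather than raw occurrences keeps a description that repeats "table" many times from outranking one that contains both terms.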
So, MongoDB text search is often the quickest path to a well-ranked result set, while the aggregation pipeline approach offers more flexibility if text indexing is not a fit.