
How does Elasticsearch manage escaped characters in query?

I have a document in which the sku field is mapped as a keyword field:

{
    "_index": "magento2_product_1_v45",
    "_type": "document",
    "_source": {
        "sku": "414-123"
    }
}

The following search for sku:414-123 returns the document:

http://localhost:9200/magento2_product_1_v45/_search?q=sku:414-123
{
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "magento2_product_1_v45",
                "_type": "document",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "store_id": "1",
                    "sku": "414-123"
                }
            }
        ]
    }
}

But the escaped version, sku:414\-123, does not retrieve the document:

http://localhost:9200/magento2_product_1_v45/_search?q=sku:414\-123
{
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    }
}

Magento escapes the dash by default, and Elasticsearch seems to treat the escape character as part of the search term.
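If you build the q= URL yourself, a minimal PHP sketch is to escape the reserved characters exactly once and then let http_build_query percent-encode the result, so the single backslash reaches Elasticsearch as %5C instead of being handled raw in the URL. The reserved-character list follows the Elasticsearch query_string documentation, which also notes that < and > cannot be escaped and must be removed. escape_query_string and build_search_url are hypothetical helpers, not Magento or elasticsearch-php functions.

```php
<?php
// Hypothetical helper (not a Magento or elasticsearch-php API): escape the
// characters reserved by query_string syntax. Apply it ONCE to a raw value.
function escape_query_string(string $value): string
{
    // '<' and '>' cannot be escaped in query_string syntax, so drop them.
    $value = str_replace(['<', '>'], '', $value);

    // Prefix every other reserved character (including '\' itself and '/')
    // with a single backslash, in one pass.
    return preg_replace('/[+\-=&|!(){}\[\]^"~*?:\\\\\/]/', '\\\\$0', $value);
}

// Hypothetical helper: build the ?q= URL. http_build_query percent-encodes
// ':' as %3A and '\' as %5C, so the escape survives the trip in the URL.
function build_search_url(string $base, string $field, string $rawValue): string
{
    $q = $field . ':' . escape_query_string($rawValue);
    return $base . '/_search?' . http_build_query(['q' => $q]);
}

echo escape_query_string('414-123'), PHP_EOL;                      // 414\-123
echo escape_query_string(escape_query_string('414-123')), PHP_EOL; // 414\\\-123 (escaped twice)
echo build_search_url('http://localhost:9200/magento2_product_1_v45', 'sku', '414-123'), PHP_EOL;
// http://localhost:9200/magento2_product_1_v45/_search?q=sku%3A414%5C-123
```

Escaping an already escaped value doubles the backslashes: query_string syntax reads 414\\\-123 as the literal term 414\-123, backslash included, which is one way the escape character can end up being searched for.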


Answers


  1. According to this

    You need to prefix special characters in the query string with a backslash (\). In your case, your application is sending it twice, which I am afraid makes Elasticsearch treat the second \ as part of the string.

    You need to modify your query as follows.

    http://localhost:9200/magento2_product_1_v45/_search?q=sku:414-123

  2. When querying Elasticsearch, take into account the analyzer applied under the hood (for text fields, the standard analyzer by default). If you search for “414-123” against an analyzed field, Elasticsearch splits the value into two tokens, “414” and “123”, and retrieves any document containing either of them (an OR operation). It is also important to check the type of the “sku” field, because “text” and “keyword” fields behave very differently. Since you need to search for “414-123” as a whole, the field must be of type keyword; otherwise Elasticsearch will tokenize it.

  3. I’ve had this problem before with literal searches where dashes need to be considered. I switched the query type to match_phrase and set the field to [field].keyword, for example:

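    use Elasticsearch\ClientBuilder;
    
    require 'vendor/autoload.php';
    
    $params          = [];
    $params['index'] = 'the_index';
    $params['type']  = 'product';     // mapping types are deprecated in Elasticsearch 7 and removed in 8
    $field           = 'sku.keyword'; // the unanalyzed keyword sub-field
    $selected        = '414-123';
    
    // match_phrase on the keyword sub-field compares against the whole stored value,
    // so the dash is not split out by the standard analyzer
    $params['body']['query']['bool']['must'][] = ['match_phrase' => [$field => $selected]];
    
    // The original used $this->clientBuilder inside a class; standalone, build a client directly
    $client   = ClientBuilder::create()->build();
    $response = $client->search($params);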
    
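The second answer's point about analysis can be illustrated with a rough PHP sketch contrasting how a text field and a keyword field index 414-123. The helper names are hypothetical, and approx_standard_tokens only approximates the standard analyzer (lowercase, then split on anything that is not a letter or digit); the Elasticsearch _analyze API is the way to see the real tokens for a given analyzer.

```php
<?php
// Rough approximation for illustration only: the real Lucene StandardTokenizer
// follows Unicode UAX #29 word-break rules, while this hypothetical helper
// just lowercases and splits on runs of non-letter/non-digit characters.
function approx_standard_tokens(string $value): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($value), -1, PREG_SPLIT_NO_EMPTY);
}

// A keyword field indexes the whole value as one unanalyzed term.
function keyword_tokens(string $value): array
{
    return [$value];
}

// With the OR operator, a document matches if ANY query token
// is among the tokens indexed for that document.
function matches_any(array $queryTokens, array $indexedTokens): bool
{
    return count(array_intersect($queryTokens, $indexedTokens)) > 0;
}

echo json_encode(approx_standard_tokens('414-123')), PHP_EOL; // ["414","123"]
echo json_encode(keyword_tokens('414-123')), PHP_EOL;         // ["414-123"]

// Text field: searching "414-999" still hits the "414-123" document via "414".
var_dump(matches_any(approx_standard_tokens('414-999'), approx_standard_tokens('414-123'))); // bool(true)
// Keyword field: only the exact value matches.
var_dump(matches_any(keyword_tokens('414-999'), keyword_tokens('414-123')));                 // bool(false)
```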