skip to Main Content

I’m trying to find all unique strings of some subdocument in MongoDB, as long as at least one of them contains a certain substring.
These strings are localized though.
I want to have the results for all possible languages.

So, my document would look something like this:

{
  ...
  "myLocalizedField": {
    "en": ...,
    "de": ...,
    "cz": ...
...
}

I do not know which languages are being used in the DB and not all documents use the same languages.

This is what my current pipeline looks like:

[
  {
    $match:
      {
        "I DON'T KNOW": {
          $regex: "my substring",
        },
      },
  },
  {
    $project:
      {
        myLocalizedField: 1,
      },
  },
  {
    $unwind:
      {
        path: "$myLocalizedField",
      },
  },
  {
    $group:
      {
        _id: {
          key: "I DON'T KNOW",
        },
      },
  },
]

I am unsure what to put in the $match and in the $group stages…

2

Answers


  1. $objectToArray to rescue:

    [
      {
        $project: { localized: { $objectToArray: "$myLocalizedField" } }
      },
      {
        $unwind:
          {
            path: "$localized",
          },
      },
      {
        $match:
          {
            "localized.v": {
              $regex: "my substring",
            },
          },
      },
      {
        $group:
          {
            _id: {
              key: "$localized.k",
            },
          },
      },
    ]
    
    Login or Signup to reply.
  2. You can use the $objectToArray operator:

    Pseudocode

    INPUT:
    {"en":"foo", "de":"bar", "cz":"buz"} 
    OUTPUT:
    [{"k":"en", "v":"foo"}, {"k":"de", "v":"bar"}, {"k":"cz", "v":"buz"}]
    

    db.collection.aggregate([
      {
        "$addFields": {
          "tmp": {
            "$objectToArray": "$myLocalizedField"
          }
        }
      },
      {
        $match: {
          "tmp.v": {
            $regex: "llo"
          }
        }
      },
      {
        $unwind: {
          path: "$tmp"
        }
      },
      {
        $match: {
          "tmp.v": {
            $regex: "llo"
          }
        }
      }
    ])
    

    MongoPlayground

    Note: This solution doesn’t scale, since we reshape the document before searching.

    I recommend you to store your documents this way:

    {
      "myLocalizedField": [
        {"k": "en", "v": "Hello world"},
        {"k": "de", "v": "Hallo Welt"},
        {"k": "cz", "v": "Ahoj světe"},
        {"k": "es", "v": "Hola Mundo"},
      ]
    }
    

    In this way the search is scalable and it is easier to apply filters.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search