
Suppose there are 1 million documents in a collection, and each document needs to have some tags associated with it.
Should I store these tags as an array, or as an object with boolean values?

First way:

{
 "key" : "key_1",
 "tags" : ["a","b","c","d","e"]
}

Second way:

{
 "key" : "key_1",
 "tags" : {
    "a" : true,
    "b" : true,
    "f" : true
 }
}

There are 200 possible tags in total, but each document can have at most 10 tags at a time. Which approach is preferable for storing the tags, and why?

I want to run queries like: fetch all the documents having tags "a", "b" and "d".

Storing tags as an array can be more descriptive than using an object.
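
As a rough sketch of the query described above, the filter for "all documents having tags a, b and d" could be built for each layout as follows. The helper names `arrayFilter` and `objectFilter` and the collection name `docs` are assumptions made for illustration; `$all` and dotted field paths are standard MongoDB query syntax.

```javascript
// Tags the query should require on every matching document.
const wanted = ["a", "b", "d"];

// Array layout: $all matches documents whose tags array contains every listed value.
function arrayFilter(tags) {
  return { tags: { $all: tags } };
}

// Object layout: one dotted-path condition per tag,
// so the set of fields in the filter changes with the tag list.
function objectFilter(tags) {
  return Object.fromEntries(tags.map((t) => [`tags.${t}`, true]));
}

console.log(JSON.stringify(arrayFilter(wanted)));
console.log(JSON.stringify(objectFilter(wanted)));

// In mongosh, either filter could be passed to find(), for example
// db.docs.find(arrayFilter(wanted)), where "docs" is an assumed collection name.
```

Note that the object-layout filter embeds each tag name in a field path, so a tag containing a dot would be read as a nested path rather than as a single tag.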

2 Answers


  1. I’d opt for storing the tags as an array instead of an object. You can create a multikey index that supports queries for documents having one or more tags. This is very important and would be nearly impossible if the tags were stored as an object.

    Storing the tags as an object would only make sense if the values carried some information. In your sample, all values are true, so that is not the case here.

    In addition, there are not too many restrictions on property names in MongoDB, but there are some. The array stores the tag names as data and does not need to restrict the names so that they are valid MongoDB property names.

  2. From an indexing perspective, both ways can be supported by indexes.

    In the first way, a single-field index on tags would suffice. Because the field holds an array, MongoDB builds it as a multikey index automatically.

    {"tags": 1}
    


    In the second way, a wildcard index is needed.

    {"tags.$**": 1}
    


    Personally, I would opt for the first way: storing them in an array. This improves maintainability and future extensibility. Currently, you are assuming 200 possible tag values. However, that number might grow over time, and you might need to flip flags in the tags object a lot if tags change frequently. That could leave you with a bloated object containing many false values and only a few true ones. Also, if you need to run analytics on the collection (e.g. involving $lookup), storing the tags in a simple array makes them easier to join. Finally, using dynamic values as field names is considered an anti-pattern and adds unnecessary complexity to queries.
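
To illustrate the bloat concern, here is a minimal sketch of converting a document from the boolean-object layout to the array layout in plain JavaScript. The helper name `tagsObjectToArray` and the sample document are hypothetical; the point is only that the array form carries the active tags and nothing else.

```javascript
// Hypothetical helper: turn the boolean-object layout into the array layout.
function tagsObjectToArray(tagsObject) {
  // Keep only the tags whose flag is exactly true; false flags are dropped.
  return Object.keys(tagsObject).filter((name) => tagsObject[name] === true);
}

// A document whose tags were toggled over time, leaving stale false flags behind.
const doc = {
  key: "key_1",
  tags: { a: true, b: true, c: false, e: false, f: true },
};

// Same document with tags rewritten as an array of active tag names.
const converted = { ...doc, tags: tagsObjectToArray(doc.tags) };
console.log(JSON.stringify(converted));
```

A conversion like this could be run once as a migration if a collection already uses the object layout.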
