skip to Main Content

After reading many similar questions, although new to MongoDB and pymongo, I believe this specific use case is not covered fully.

I need to perform a case insensitive search on a MongoDB collection which holds place names as strings with the following structure:

{
  "_id": {
    "$oid": "6419bc2aa7cb0cf0502af787"
  },
  "placename": "Bologna",
  "geonamesid": 3181928,
  "timestamp": {
    "$date": "2023-03-21T14:16:10.976Z"
  }
}

as such the following pymongo query works well:

placename = "Bologna"
query = {"placename": placename}
res = mdbcoll.find_one(query, {"geonamesid": 1})

which would correctly return 3181928 in res.

Problem lies in these places sometimes being passed in differente cases such as "BOLOGNA" and the above find would fail of course.

I have therefore read about generating a case insensitive index on placename as follows:

db.loc.createIndex({ placename: 1 }, { collation: { locale: "it", strength: 2 } })

after which I am trying to modify the above query as follows:

res = mdbcoll.find_one(query, {"geonamesid": 1}).collation(
    {
       "locale": "it",
       "strength": 2,
    }  # find on placename case insensitive index
)

but this fails with the following error:

AttributeError: 'NoneType' object has no attribute 'collation'

2

Answers


  1. When using the find_one() method you will only get a single document, which is not a cursor, where the collation method lies. But you could pass a collation as a parameter, the same as in find.

    Then, here you’re not even getting your result, as your find_one() returns None. As the error states. I don’t know why right now though.

    Finally, an easier way to find things in a case-insensitive way is to pass the query a regex instead of a string:

    import re
    place_regexp = re.compile(r"bologna", re.I)
    query = {"placename": place_regexp}
    ...
    

    re.compile() enables you to reuse the pattern
    And re.I is the case insensitive flag

    Login or Signup to reply.
  2. If you create a full text index you can then use the $text operator to perform a case insensitve search. You don’t need to add any collation to the query, although you may want to add a $language to the index.

    One thing to be aware of:

    A collection can have at most one text index.

    Regex search as per @Er…’s answer will work too but will perform poorly on larger collections as the field is not indexed.

    This example shows how it could work. You only need to create the index as a one-off activity.

    from pymongo import MongoClient, TEXT
    
    db = MongoClient()['mydatabase']
    mdbcoll = db['mdbcoll']
    mdbcoll.create_index([('placename', TEXT)]) # Only needed once - can be removed once the index is created
    mdbcoll.replace_one({'_id': 1}, {
        "_id": 1,
        "placename": "Bologna",
        "geonamesid": 3181928,
    }, upsert=True)
    
    placename = "BOLOGNA"
    query = {'$text': {'$search': placename}}
    res = mdbcoll.find_one(query, {"geonamesid": 1})
    print(res)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search