I have a MongoDB collection that stores city/country data in multiple languages. For example, the following query:
db.cities_database.find({ "name.pl.country": "Węgry" }).pretty().limit(10);
Returns data in the following format:
[
  {
    _id: ObjectId('67331d2a9566994a18c505aa'),
    geoname_id_city: 714073,
    latitude: 46.91667,
    longitude: 21.26667,
    geohash: 'u2r4guvvmm4m',
    country_code: 'HU',
    population: 7494,
    estimated_radius: 400,
    feature_code: 'PPL',
    name: {
      pl: { city: 'Veszto', admin1: null, country: 'Węgry' },
      ascii: { city: 'veszto', admin1: null, country: null },
      lt: { city: 'Veszto', admin1: null, country: 'Vengrija' },
      ru: { city: 'Veszto', admin1: null, country: 'Венгрия' },
      hu: { city: 'Veszto', admin1: null, country: 'Magyarország' },
      en: { city: 'Veszto', admin1: null, country: 'Hungary' },
      fr: { city: 'Veszto', admin1: null, country: 'Hongrie' }
    }
  }
  ...
]
I want to be able to run the same query using only English (ASCII) characters, so for this example I’d like to query by "name.pl.country": "Wegry" (that is, I’d like Mongo to treat the character ę as e while performing this query).
Is it possible to achieve this?
So far I tried using collation like this:
db.cities_database.find({ "name.pl.country": "Wegry" }).collation({ locale: "pl", strength: 1 }).pretty().limit(10);
but this query doesn’t return anything.
2 Answers
I don’t know any Polish, and I don’t know the difference between e and ę. But if you use MongoDB Atlas, you can set up a custom analyzer with the icuFolding token filter to perform a diacritic-insensitive search.
The index:
$search query:
MongoDB Atlas search playground
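A minimal sketch of what such an index definition and $search query might look like, written as plain JavaScript objects. The analyzer name diacriticFolder, the index name default, the keyword tokenizer, and the field mapping are assumptions chosen for illustration; they are not taken from the original answer.

```javascript
// Hypothetical Atlas Search index definition. The custom analyzer keeps the
// whole field value as a single token (keyword tokenizer) and then applies
// Unicode folding such as accent removal (icuFolding token filter).
const indexDefinition = {
  analyzers: [
    {
      name: "diacriticFolder", // assumed name, pick any
      tokenizer: { type: "keyword" },
      tokenFilters: [{ type: "icuFolding" }],
    },
  ],
  mappings: {
    dynamic: false,
    fields: {
      name: {
        type: "document",
        fields: {
          pl: {
            type: "document",
            fields: {
              // Only name.pl.country is indexed in this sketch.
              country: { type: "string", analyzer: "diacriticFolder" },
            },
          },
        },
      },
    },
  },
};

// Aggregation pipeline that searches the folded field with ASCII input.
const pipeline = [
  {
    $search: {
      index: "default", // assumed index name
      text: { query: "Wegry", path: "name.pl.country" },
    },
  },
  { $limit: 10 },
];

// In mongosh, connected to an Atlas cluster, these would be used roughly as:
//   db.cities_database.createSearchIndex("default", indexDefinition);
//   db.cities_database.aggregate(pipeline);
```

Because the same analyzer is applied to the indexed value and to the query text, "Węgry" and "Wegry" should end up as the same token. Each language subdocument (lt, hu, fr, and so on) would need its own mapping entry if it should be searchable the same way.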
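I think that’s simply how the Polish collation is defined; see the Polish CLDR chart. In Polish, ę is a separate letter of the alphabet rather than an accented e, so the Polish collation keeps it distinct from e even at strength 1.
ę and Ę are shown in black, which I guess means "must match exactly".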
Other characters (e.g. é É è È ê Ê ë Ë) are grey, for them it works:
When you query them it gives
Maybe you are looking for
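One possible direction, sketched under the assumption that a locale without the Polish tailoring (here en) is acceptable for this field. The snippet below does not talk to MongoDB; it uses JavaScript's Intl.Collator, which is also backed by ICU in Node.js, to illustrate the locale difference. Sensitivity "base" is used as a rough stand-in for collation strength 1, and the mongosh query in the trailing comment is an untested suggestion.

```javascript
// Compare base letters only, ignoring accents and case
// (roughly what collation strength 1 does).
const polish = new Intl.Collator("pl", { sensitivity: "base" });
const english = new Intl.Collator("en", { sensitivity: "base" });

// Under Polish rules ę is its own letter; under English rules it is
// an e with a diacritic.
const sameInPolish = polish.compare("Węgry", "Wegry") === 0;
const sameInEnglish = english.compare("Węgry", "Wegry") === 0;

console.log({ sameInPolish, sameInEnglish });

// Corresponding mongosh query with a non-Polish locale:
//   db.cities_database
//     .find({ "name.pl.country": "Wegry" })
//     .collation({ locale: "en", strength: 1 })
//     .limit(10);
```

Note that strength 1 also ignores case, so "wegry" would match as well. Without an index created with the same collation, such a query cannot use an index on name.pl.country for the string comparison.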