I have a MongoDB collection with around 20K articles with an "articledate" string field with dates in the "dd/mm/YYYY" format.
Want to query the collection and get the sum of the count of articles in each mm/YYYY.
This is the pipeline I have written which should also handle the date format:
# Aggregation pipeline to group documents by year and month and count them
pipeline = [
{"$match": {"articledatepub": {"$exists": True}}},
{
"$addFields": {
"date_object": {
"$dateFromString": {"dateString": "$articledatepub", "format": "%m/%d/%Y"}
}
}
},
{
"$group": {
"_id": {
"year": {"$year": "$date_object"},
"month": {"$month": "$date_object"},
},
"count": {"$sum": 1},
}
},
{"$sort": {"_id.year": 1, "_id.month": 1}},
]
but when I execute it the result is empty:
# Execute the aggregation pipeline
result = list(articles.aggregate(pipeline))
Here is a small fictional sample of the data:
{
"_id": {
"$oid": "641dddcd685f9e1cbbc73490"
},
"articleid": 2003287422,
"articledatepub": "23/03/2023"
}
{
"_id": {
"$oid": "641dddcd685f9e1cccc73490"
},
"articleid": 2003283422,
"articledatepub": "22/03/2023"
}
{
"_id": {
"$oid": "641dddcd687f9e1cccc73490"
},
"articleid": 2003383422,
"articledatepub": "11/02/2023"
}
from which I’d expect a count of two articles for March 2023 and one for February 2023.
2
Answers
The code above would not work because of the DD/MM/YYYY format.
Here is the working code which parses that date format "manually" with a revised pipeline:
You just need to change the
format
parameter of$dateFromString
to"%d/%m/%Y"
. Try this: