skip to Main Content

I was wondering if there was an actually good way to aggregate/calculate percentiles based on all data in a database. For example, data like:

{
'name':'John'
'rank':5
},
{
'name':'John'
'rank':4
},
{
'name':'John'
'rank':5
},
{
'name':'James'
'rank':3
},
{
'name':'Froggy'
'rank':5
},

How would I go about finding a way to calculate the percentile that users achieve a specific rarity compared to others. Example: James falls in the top 10% of users who own rank 5.

2

Answers


  1. I think the formula for that would be (no_of_people_with_rank_5/total_people)*100. And honestly, it also depends about the logic in database, is it that a person with rank 5 holds all other ranks or are the ranks all separate? For this answer I will assume all ranks are separate.

    import json
    
    database = json.load(open("Yourdatabasefile.json", "r"))
    # Skip above, if you already have the data
    
    def find_userbase_percentiles(database: dict) -> dict[float]:
        # You would ideally want to call this function once.
        frequencies = {}
        for x in database:
            frequencies.setdefault(x['rank'], 0)
            frequencies[x['rank']] += 1
        # Now we have frequencies
        for k, v in frequencies.items():
            frequencies[k] = (v/len(database))*100  # Our percentile formula
        return frequencies
    
    def find_user_qualifications(user: str, rank: int, frequencies: dict) -> str:
        # Modify this function to your requirements
        return f"{user} falls in the top {int(frequencies[rank])}% of users who own rank {rank}"
    
    if __name__ == "__main__":
        freq = find_userbase_percentile(database)
        print(find_user_qualifications("James", 5, freq))
    
    Login or Signup to reply.
  2. emphasized textYou can do something like:

    db.collection.aggregate([
      {
        $group: {
          _id: "$name",
          rankFiveCount: {$sum: {$cond: [{$eq: ["$rank", 5]}, 1, 0]}}}
      },
      {
        $setWindowFields: {
          sortBy: {rankFiveCount: -1},
          output: {
            rank: {$rank: {}},
            totalCount: {$count: {}}
          }
        }
      },
      {$match: {_id: wantedUser}},
      {
        $project: {
          name: "$_id",
          _id: 0,
          percentile: {
            $round: {$multiply: [{$divide: ["$rank", "$totalCount"]}, 100]}
          }
        }
      }
    ])
    

    See how it works on the playground example

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search