skip to Main Content

I have a data frame that was taken from a survey. I have two columns: the number ID of the respondent and the database they worked with.

    Respondent  DatabaseWorkedWith
0   4   MySQL
1   4   SQLite
2   9   DynamoDB
3   9   PostgreSQL
4   9   SQLite
... ... ...
31370   25137   MySQL
31371   25138   PostgreSQL
31372   25141   Microsoft SQL Server
31373   25141   Redis
31374   25142   PostgreSQL

31375 rows × 2 columns

The question is on the title. I would like to count the number of respondents that only have MySQL on the second column. I’m not sure if I should use unique, groupby or another method.

Thanks!

2

Answers


  1. Chosen as BEST ANSWER

    I came up with a solution that may not be the most practical one, but it worked.

    First, I created a data frame summing the column 'DatabaseWorkedWith':

    resp_sum = df.groupby('Respondent').sum()
    
    Respondent DatabaseWorkedWith
    4   MySQLSQLite
    9   DynamoDBPostgreSQLSQLite
    13  CouchbaseDynamoDBFirebaseMySQL
    16  MongoDBMicrosoft SQL ServerMySQL
    17  MongoDBPostgreSQL
    ... ...
    25136   Microsoft SQL Server
    25137   MySQL
    25138   PostgreSQL
    25141   Microsoft SQL ServerRedis
    25142   PostgreSQL
    
    10945 rows × 1 column
    

    Using the new data frame, I counted by the column 'DatabaseWorkedWith' where the value was 'MySQL.'

    resp_sum[resp_sum['DatabaseWorkedWith']=='MySQL'].count()
    
    DatabaseWorkedWith    474
    dtype: int64
    

    If you guys could present a more elegant solution, I would appreciate it! Thanks!


  2. I have a solution which can do the work.
    You can drop all the rows with duplicated values in Column ‘Respondent’, then filter the DatabaseWorkedWith with MySQL.

    Here is the code

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search