I have the following list of tuples.
[('0', 'Hadoop'), ('0', 'Big Data'), ('0', 'HBas'), ('0', 'Java'), ('0', 'Spark'), ('0', 'Storm'), ('0', 'Cassandra'), ('1', 'NoSQL'), ('1', 'MongoDB'), ('1', 'Cassandra'), ('1', 'HBase'), ('1', 'Postgres'), ('2', 'Python'), ('2', 'skikit-learn'), ('2', 'scipy'), ('2', 'numpy'), ('2', 'statsmodels'), ('2', 'pandas'), ('3', 'R'), ('3', 'Python'), ('3', 'statistics'), ('3', 'regression'), ('3', 'probability'), ('4', 'machine learning'), ('4', 'regression'), ('4', 'decision trees'), ('4', 'libsvm'), ('5', 'Python'), ('5', 'R'), ('5', 'Java'), ('5', 'C++'), ('5', 'Haskell'), ('5', 'programming languages'), ('6', 'statistics'), ('6', 'probability'), ('6', 'mathematics'), ('6', 'theory'), ('7', 'machine learning'), ('7', 'scikit-learn'), ('7', 'Mahout'), ('7', 'neural networks'), ('8', 'neural networks'), ('8', 'deep learning'), ('8', 'Big Data'), ('8', 'artificial intelligence'), ('9', 'Hadoop'), ('9', 'Java'), ('9', 'MapReduce'), ('9', 'Big Data')]
The values on the left are “employee id numbers” while the values on the right are “interests”. I have to turn these into dictionaries in two different ways: I have to make the employee id number the key and the interests the value, then I have to make the interests the key and the employee id number the value. Basically, as a quick example, I need one of the elements of my end result to look like this:
{'0': ['Hadoop', 'Big Data', 'HBas', 'Java', 'Spark', 'Storm', 'Cassandra'],
'1' ... etc]}
Then the next would look like this:
{'Hadoop': [0,9]...}
I tried defaultdict but couldn’t get it to work. Any suggestions?
7 Answers
You can use collections.defaultdict.
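Here is a minimal sketch of the defaultdict approach. It uses a shortened sample of the list from the question, and the names pairs, interests_by_id and ids_by_interest are just illustrative choices. One loop fills both dictionaries.

```python
from collections import defaultdict

# Shortened sample of the (employee id, interest) tuples from the question.
pairs = [
    ('0', 'Hadoop'), ('0', 'Big Data'), ('0', 'Java'),
    ('1', 'NoSQL'), ('1', 'Cassandra'),
    ('9', 'Hadoop'), ('9', 'Java'), ('9', 'Big Data'),
]

# A missing key is created automatically with an empty list as its value.
interests_by_id = defaultdict(list)
ids_by_interest = defaultdict(list)

for emp_id, interest in pairs:
    interests_by_id[emp_id].append(interest)
    ids_by_interest[interest].append(emp_id)

print(dict(interests_by_id))
print(dict(ids_by_interest))
```

Note that the ids in the list are strings, so this gives 'Hadoop': ['0', '9'] rather than [0, 9]. If you want integers, append int(emp_id) instead.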
collections.defaultdict is indeed the right way to go about this. Create one for each dictionary you want, then loop over the list and add each pair to both dictionaries.
You can also do this using a set and dict comprehension.
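One possible way to write the set and dict comprehension version, again on a shortened sample of the question's list (variable names are my own). The sets collect the distinct keys, and each dict comprehension then scans the list once per key.

```python
# Shortened sample of the (employee id, interest) tuples from the question.
pairs = [
    ('0', 'Hadoop'), ('0', 'Big Data'), ('0', 'Java'),
    ('1', 'NoSQL'), ('1', 'Cassandra'),
    ('9', 'Hadoop'), ('9', 'Java'), ('9', 'Big Data'),
]

# Distinct keys for each of the two dictionaries.
ids = {emp_id for emp_id, _ in pairs}
interests = {interest for _, interest in pairs}

# For every key, collect the matching values in their original list order.
interests_by_id = {
    key: [interest for emp_id, interest in pairs if emp_id == key]
    for key in ids
}
ids_by_interest = {
    key: [emp_id for emp_id, interest in pairs if interest == key]
    for key in interests
}

print(interests_by_id)
print(ids_by_interest)
```

Because the keys come from a set, the order of the keys in the result is not guaranteed, and the list is rescanned for every key, so this is fine for small inputs but slower than a single loop on large ones.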
Edit: This is more efficient if you use itertools.groupby.
How about pandas?
Another approach would be to use itertools.groupby.
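A sketch of how groupby could be applied here, using a shortened sample of the question's list and names of my own choosing. groupby only merges adjacent items, so the list has to be sorted by the grouping key first.

```python
from itertools import groupby
from operator import itemgetter

# Shortened sample of the (employee id, interest) tuples from the question.
pairs = [
    ('0', 'Hadoop'), ('0', 'Big Data'), ('0', 'Java'),
    ('1', 'NoSQL'), ('1', 'Cassandra'),
    ('9', 'Hadoop'), ('9', 'Java'), ('9', 'Big Data'),
]

by_id = itemgetter(0)        # picks the employee id out of a tuple
by_interest = itemgetter(1)  # picks the interest out of a tuple

# Sort by id, then collect the interests of each run of equal ids.
interests_by_id = {
    key: [interest for _, interest in group]
    for key, group in groupby(sorted(pairs, key=by_id), key=by_id)
}

# Same idea with the roles swapped: sort and group by interest.
ids_by_interest = {
    key: [emp_id for emp_id, _ in group]
    for key, group in groupby(sorted(pairs, key=by_interest), key=by_interest)
}

print(interests_by_id)
print(ids_by_interest)
```

sorted() is stable, so within each group the values keep the order they had in the original list.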
The most Pythonic and shortest code I can think of, without using any imports.
defaultdict is the faster option, but you could also group with setdefault() in one pass through the list.
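Roughly, the setdefault() version could look like this (shortened sample of the question's list, illustrative names). dict.setdefault(key, []) returns the existing list for the key, or stores and returns a new empty one, so no import is needed.

```python
# Shortened sample of the (employee id, interest) tuples from the question.
pairs = [
    ('0', 'Hadoop'), ('0', 'Big Data'), ('0', 'Java'),
    ('1', 'NoSQL'), ('1', 'Cassandra'),
    ('9', 'Hadoop'), ('9', 'Java'), ('9', 'Big Data'),
]

interests_by_id = {}
ids_by_interest = {}

# Single pass: each pair is appended to the list under both keys.
for emp_id, interest in pairs:
    interests_by_id.setdefault(emp_id, []).append(interest)
    ids_by_interest.setdefault(interest, []).append(emp_id)

print(interests_by_id)
print(ids_by_interest)
```

The results are plain dicts, which is handy if you don't want later lookups of missing keys to silently create entries the way a defaultdict does.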