The input is as follows:
company | sales |
---|---|
amazon | 100 |
flipkart | 900 |
ebay | 890 |
amazon | 100 |
flipkart | 100 |
ebay | 10 |
amazon | 100 |
flipkart | 90 |
ebay | 10 |
And the expected output is as follows:
amazon | flipkart | ebay |
---|---|---|
300 | 1090 | 910 |
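For reference, the expected row is simply the total sales per company (amazon: 100 + 100 + 100 = 300, and so on). A plain-Python sketch, with no Spark involved, that reproduces those totals from the input rows:

```python
from collections import defaultdict

# Input rows copied from the table above: (company, sales)
rows = [
    ("amazon", 100), ("flipkart", 900), ("ebay", 890),
    ("amazon", 100), ("flipkart", 100), ("ebay", 10),
    ("amazon", 100), ("flipkart", 90), ("ebay", 10),
]

# Sum sales per company; dicts keep first-seen key order,
# which here matches the expected column order.
totals = defaultdict(int)
for company, sales in rows:
    totals[company] += sales

print(dict(totals))  # {'amazon': 300, 'flipkart': 1090, 'ebay': 910}
```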
I tried using the pivot function, but it's not working. Any help would be appreciated. Thanks in advance.
3 Answers
I’ve been using the following test dataset to compose this solution:
This will result in the following dataframe:
For your specific wide table, I would do the following:
Spark will keep the null values. The pivot() function is really expensive on its own. To get your result, you can remove the null values by grouping and summing again. The result is:
Use the groupBy() and first() functions.
Thanks to @andy for the helper-column solution.
The key column can be removed using the drop() function.
You can use the pivot() function as shown in the other answers. Here is a shorter version of the groupBy() and pivot() combination: