I have a Dataframe containing transactions from an account and I need to categorize them and sum each category. I have 8 categories in total and can have 10+ keywords in each. I have found posts on this site that accomplish something similar but they have their caveats.
df['Contains string'] = (df[columns].apply(lambda x: x.str.contains('myString'))).sum(axis=1)
This is the closest I can find but it doesn’t allow for multiple keywords or categories.
Examples of categories:
Phone payment - contains string "ATT Payment" Supplies - contains string "EBAY" or "BIZCARD" or "AMAZON" or etc.
The DataFrame can have unlimited rows but a sample looks like:
TransactionID Debit Credit
0 ATT Payment ID 001 0.00 21.29
1 BIZCARD PAYMENT 0.00 124.93
2 Check #1867 8755.50 0.00
3 Check #1874 110.66 0.00
5 EBAY ID 544 0.00 36.00
6 AMAZONPAY 788 0.00 373.20
The credit column would be the only that needs values to be summed.
Desired Output:
Phone Payment = 21.29
Supplies = 534.13
Etc. = 00
New to Python. Pardon the lack or surplus of information. Thanks.
2
Answers
Use pandas.Series.str.contains
.contains()
is then used for Indexing and selecting data~
, which isnot
, for theetc
data.You can start by adding a category column as described in this answer (use .str.contains instead of .isin)
Then you can do the sum using a groupby :