Aggregate data frame rows based on conditions




I have this table


A B C E
1 2 1 3
1 2 4 4
2 7 1 1
3 4 0 2
3 4 8 3



Now I want to remove duplicates based on columns A and B and, at the same time, sum up column C. For E, it should take the value from the row where C is largest within the group. The desired result table should look like this:


A B C E
1 2 5 4
2 7 1 1
3 4 8 3
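A sketch of one way to produce this table with pandas, assuming the data is already in a DataFrame named df with the columns shown above, and that when C ties within a group the first such row should supply E (that is what idxmax returns). The names summed and e_at_max are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 3, 3],
    "B": [2, 2, 7, 4, 4],
    "C": [1, 4, 1, 0, 8],
    "E": [3, 4, 1, 2, 3],
})

# Sum C per (A, B) group, keeping A and B as ordinary columns.
summed = df.groupby(["A", "B"], as_index=False).agg(C=("C", "sum"))

# Row label of the largest C in each group (first occurrence on ties),
# then pull E from exactly those rows.
max_c_rows = df.groupby(["A", "B"])["C"].idxmax()
e_at_max = df.loc[max_c_rows, ["A", "B", "E"]]

# An inner merge keeps the order of the left frame's keys.
result = summed.merge(e_at_max, on=["A", "B"])
print(result)
```

Splitting the work into a sum and a separate idxmax lookup keeps the "E from the max-C row" rule explicit instead of relying on row order.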



I tried this: df.groupby(['A', 'B']).sum()['C'], but my data frame does not change at all. I think I didn't incorporate the E column part properly. Can somebody advise?



Thanks so much!





Is this table from a database? Which fields do you want to group the rows by?
– Daniil Mashkin
Aug 10 at 6:41





I want to group by all columns except the first (the pandas index) and the last (non-unique). At the same time, the values of the last column need to be summed, so the first data entry in aggregated form (now a single row) should show the value 12 in the last column.
– Tina
Aug 10 at 6:43






If you are using pandas, can you edit your question and add the column names? It would also be better to replace the linked image with the actual table.
– Daniil Mashkin
Aug 10 at 6:50





@DaniilMashkin I added a better visualization. Can you review and let me know what you think?
– Tina
Aug 10 at 16:38




1 Answer



If columns A and B identify the duplicates, we can group by them.


In [20]: df
Out[20]:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 3

In [21]: df.groupby(['A', 'B'])['C'].sum()
Out[21]:
A B
1 1 6
3 3 8
Name: C, dtype: int64
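If you would rather keep A and B as ordinary columns and get a DataFrame back instead of a Series with a MultiIndex, a small variant passes as_index=False (a sketch using the same three-row frame as above):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 3], "B": [1, 1, 3], "C": [5, 1, 8], "E": [4, 1, 3]})

# as_index=False returns a DataFrame with the group keys as columns A and B.
summed = df.groupby(["A", "B"], as_index=False)["C"].sum()
print(summed)
```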



I tried this: df.groupby(['A', 'B']).sum()['C'] but my data frame does not change at all



Yes, that's because groupby(...).sum() returns a new object; pandas does not modify the original DataFrame in place.


In [22]: df
Out[22]:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 3
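To check this yourself, a small sketch with the same three-row frame compares df against a copy taken before the groupby call:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 3], "B": [1, 1, 3], "C": [5, 1, 8], "E": [4, 1, 3]})
before = df.copy()

# groupby(...).sum() builds and returns a new Series; df itself is not modified.
result = df.groupby(["A", "B"])["C"].sum()

print(df.equals(before))       # True
print(type(result).__name__)   # Series
```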



You have to overwrite it explicitly.


In [23]: df = df.groupby(['A', 'B'])['C'].sum()

In [24]: df
Out[24]:
A B
1 1 6
3 3 8
Name: C, dtype: int64
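Note that after this assignment df is a Series holding only the summed C values: A and B live in its index and E is gone. If the goal is the full table from the question, one alternative sketch (an assumption, not the answerer's code) sorts by C first so that the largest-C row comes last in each (A, B) group, then aggregates C with sum and E with last. It uses the question's five-row data:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 3, 3],
    "B": [2, 2, 7, 4, 4],
    "C": [1, 4, 1, 0, 8],
    "E": [3, 4, 1, 2, 3],
})

# Stable ascending sort by C: within each group the max-C row ends up last.
# Named aggregation sums C and takes E from that last row.
df = (
    df.sort_values("C", kind="stable")
      .groupby(["A", "B"], as_index=False)
      .agg(C=("C", "sum"), E=("E", "last"))
)
print(df)
```

Two caveats with this variant: "last" returns the last non-null value, so a missing E on the max-C row would be skipped, and on ties in C the row that appears later in the original frame supplies E.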





OK, then the remaining gap is column E. How can it be tied in with the conditions described in my original question? Based on your table, E in the first row should be 4 and in the second row should be 3. Please advise.
– Tina
Aug 10 at 18:37






Can you please advise me further?
– Tina
Aug 11 at 0:16





