Performing operations on data and adding column to groupby results







I'm trying to write a simple program to track my attendance at my two university courses this semester. I need at least 80% attendance in each class of each subject. Each subject has three classes (T, TP and P).



After tinkering for a while I came across the pandas groupby function:


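import pandas as pd
import sys

# Read the attendance CSV given on the command line.
df = pd.read_csv(sys.argv[1])
# Count the non-missing values of every column per (Subject, Class) pair.
g = df.groupby(['Subject','Class']).count()
print(g)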



The source file has a row for each class, where I note "Presente" when I attend. I also note "Sin Clases" when there is no class for some external reason. These are the first 5 rows of the source file:


Date Day Subject Class Attendance Comments
0 6/8/2018 2 Medieval T Presente NaN
1 6/8/2018 2 Logica T Presente NaN
2 6/8/2018 2 Logica P NaN NaN
3 8/8/2018 4 Logica TP Sin clases Movilización IVE
4 8/8/2018 4 Logica P NaN Movilización IVE



What I can't figure out is how to perform some kind of operation and add a column, for example a column counting only the rows with the value "Presente" in each subgroup. I feel I'm overcomplicating things and I'm afraid I'm missing something. This is the current output:


z:devspyasist>python asistb.py asist2q.csv
Date Day Attendance Comments
Subject Class
Logica P 32 32 3 4
T 16 16 4 3
TP 16 16 1 1
Medieval P 16 16 0 0
T 32 32 5 4
TP 16 16 1 1
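count() counts the non-missing values of each column per group, so the Attendance column counts every filled-in cell, whether it says Presente or Sin clases, and skips blank ones. A minimal sketch on the five sample rows (typed in by hand here rather than read from the CSV) that makes this visible and adds a Presente-only column with named aggregation; the output column names rows, filled and presente are hypothetical:

```python
import numpy as np
import pandas as pd

# The five sample rows from the question, typed in by hand.
df = pd.DataFrame({
    'Date': ['6/8/2018', '6/8/2018', '6/8/2018', '8/8/2018', '8/8/2018'],
    'Day': [2, 2, 2, 4, 4],
    'Subject': ['Medieval', 'Logica', 'Logica', 'Logica', 'Logica'],
    'Class': ['T', 'T', 'P', 'TP', 'P'],
    'Attendance': ['Presente', 'Presente', np.nan, 'Sin clases', np.nan],
    'Comments': [np.nan, np.nan, np.nan, 'Movilización IVE', 'Movilización IVE'],
})

summary = df.groupby(['Subject', 'Class']).agg(
    rows=('Date', 'size'),           # every row in the group, blank or not
    filled=('Attendance', 'count'),  # non-missing Attendance cells only
    # only cells that are exactly "Presente" (NaN compares as False)
    presente=('Attendance', lambda s: s.eq('Presente').sum()),
)
print(summary)
```

Logica/TP shows filled = 1 but presente = 0, because its only cell is "Sin clases".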



A tip or hint on how to approach the problem is greatly appreciated. Thanks a lot!




2 Answers



Seems like you need unstack + stack:


df.groupby(['Subject','Class']).count().unstack(fill_value=0).stack()
Out[296]:
Date Day Attendance Comments
Subject Class
Logica P 2 2 0 1
T 1 1 1 0
TP 1 1 1 1
Medieval P 0 0 0 0
T 1 1 1 0
TP 0 0 0 0
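unstack(fill_value=0) pivots Class into the columns, which creates a cell for every Subject/Class combination and fills the missing ones (Medieval has no P or TP rows in the sample) with 0; stack() then moves Class back into the index. A sketch on the same sample rows, next to an equivalent reindex over the full Subject x Class product (an alternative technique, not part of this answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['6/8/2018', '6/8/2018', '6/8/2018', '8/8/2018', '8/8/2018'],
    'Day': [2, 2, 2, 4, 4],
    'Subject': ['Medieval', 'Logica', 'Logica', 'Logica', 'Logica'],
    'Class': ['T', 'T', 'P', 'TP', 'P'],
    'Attendance': ['Presente', 'Presente', np.nan, 'Sin clases', np.nan],
    'Comments': [np.nan, np.nan, np.nan, 'Movilización IVE', 'Movilización IVE'],
})

counts = df.groupby(['Subject', 'Class']).count()

# Pivot Class into columns, fill absent combinations with 0, pivot back.
filled = counts.unstack(fill_value=0).stack()

# Alternative: reindex against every Subject x Class combination.
full_index = pd.MultiIndex.from_product(
    [sorted(df['Subject'].unique()), sorted(df['Class'].unique())],
    names=['Subject', 'Class'])
reindexed = counts.reindex(full_index, fill_value=0)

print(filled)
```

The reindex version makes the "every combination" intent explicit and does not depend on which classes happen to appear in the data.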



Given the example df:


df


Date Day Subject Class Attendance Comments
0 6/8/2018 2 Medieval T Presente NaN
1 6/8/2018 2 Logica T Presente NaN
2 6/8/2018 2 Logica P NaN NaN
3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE
4 8/8/2018 4 Logica P NaN MovilizaciónIVE



To count the occurrences of Presente in the Attendance column, you can use transform with a string search and a count, and store the result in a new column:


df['counts'] = df.groupby(['Subject', 'Class'])['Attendance'].transform(
lambda x: x[x.str.contains('Presente', na=False)].count())



This results in the following output:


Date Day Subject Class Attendance Comments counts
0 6/8/2018 2 Medieval T Presente NaN 1
1 6/8/2018 2 Logica T Presente NaN 1
2 6/8/2018 2 Logica P NaN NaN 0
3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE 0
4 8/8/2018 4 Logica P NaN MovilizaciónIVE 0
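transform returns one value per original row: the lambda produces a single number per (Subject, Class) group and pandas writes it on every row of that group. A sketch on the sample rows that runs the answer's line and checks it against a shorter variant (a different spelling, not the answer's code) which compares for equality first and lets the built-in 'sum' do the counting. Note that str.contains also matches substrings, while eq requires the exact value; on this data both agree. The column name counts_alt is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['6/8/2018', '6/8/2018', '6/8/2018', '8/8/2018', '8/8/2018'],
    'Day': [2, 2, 2, 4, 4],
    'Subject': ['Medieval', 'Logica', 'Logica', 'Logica', 'Logica'],
    'Class': ['T', 'T', 'P', 'TP', 'P'],
    'Attendance': ['Presente', 'Presente', np.nan, 'Sinclases', np.nan],
    'Comments': [np.nan, np.nan, np.nan, 'MovilizaciónIVE', 'MovilizaciónIVE'],
})

# The answer's approach: a per-group scalar, broadcast back to every row.
df['counts'] = df.groupby(['Subject', 'Class'])['Attendance'].transform(
    lambda x: x[x.str.contains('Presente', na=False)].count())

# Variant: True where Attendance is exactly "Presente" (NaN compares False),
# then sum the booleans per group and broadcast with transform('sum').
is_presente = df['Attendance'].eq('Presente')
df['counts_alt'] = is_presente.groupby([df['Subject'], df['Class']]).transform('sum')

print(df)
```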



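To calculate the total number of Presente per Subject, you can use groupby again and sum up the counts column. (This total is only correct here because every (Subject, Class) group in the sample has a single row: transform writes the group's count on every row of that group, so summing over rows multiplies each count by the group size.)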


df.groupby(['Subject'])['counts'].sum()



The output is


Subject
Logica 1
Medieval 1
Name: counts, dtype: int32
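Because the counts column repeats each group's count on every row, summing it per Subject stops being correct once a group has more than one row. A sketch that counts the matches directly, then computes the attendance rate per (Subject, Class) for the 80% requirement from the question. Assumptions (not from the answer): a row is a held class unless Attendance is "Sin clases" in any capitalization (the question writes both "Sin Clases" and "Sin clases"), a blank Attendance cell means a held class that was missed, and the last row plus the names overcount, per_subject, held and stats are hypothetical:

```python
import numpy as np
import pandas as pd

rows = [
    # Date, Day, Subject, Class, Attendance, Comments (sample rows from the question)
    ('6/8/2018', 2, 'Medieval', 'T', 'Presente', np.nan),
    ('6/8/2018', 2, 'Logica', 'T', 'Presente', np.nan),
    ('6/8/2018', 2, 'Logica', 'P', np.nan, np.nan),
    ('8/8/2018', 4, 'Logica', 'TP', 'Sin clases', 'Movilización IVE'),
    ('8/8/2018', 4, 'Logica', 'P', np.nan, 'Movilización IVE'),
    # Hypothetical extra row (not in the question) so Medieval/T has two rows.
    ('13/8/2018', 2, 'Medieval', 'T', 'Presente', np.nan),
]
df = pd.DataFrame(rows, columns=['Date', 'Day', 'Subject', 'Class',
                                 'Attendance', 'Comments'])

# Summing the broadcast counts: each of the 2 Medieval/T rows carries 2 -> 4.
df['counts'] = df.groupby(['Subject', 'Class'])['Attendance'].transform(
    lambda x: x[x.str.contains('Presente', na=False)].count())
overcount = df.groupby('Subject')['counts'].sum()

# Direct count: one True per "Presente" row, summed per Subject.
presente = df['Attendance'].eq('Presente')
per_subject = presente.groupby(df['Subject']).sum()

# Held classes: every row except "Sin clases" (blank cells count as held).
held = ~df['Attendance'].fillna('').str.lower().eq('sin clases')

stats = pd.DataFrame({'presente': presente, 'held': held}).groupby(
    [df['Subject'], df['Class']]).sum()
stats['rate'] = stats['presente'] / stats['held']  # NaN when nothing was held
stats['ok'] = stats['rate'] >= 0.8                 # NaN >= 0.8 is False

print(per_subject)
print(stats)
```

Logica/TP gets a NaN rate because its only row is "Sin clases"; you may prefer to treat that as "no requirement yet" rather than as a failure.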



Source: pandas groupby count string occurrence over column





