Performing operations on data and adding column to groupby results
Clash Royale CLAN TAG#URR8PPP
Performing operations on data and adding column to groupby results
I'm trying to write a simple program to track my attandance to my two university courses this semester. I need to have at least an 80% of attendance to each class of each subject. Each subject has three classes (T,TP and P).
After tinkering for a while I've come to find the groupby pandas function:
import pandas as pd
import sys
df = pd.read_csv(sys.argv[1])
g = df.groupby(['Subject','Class']).count()
print(g)
The source file has a row for each class where I'll note "Presente" when I go. I will also note "Sin Clases" when there is no class due to some external reason. This is what top 5 rows of the source file looks like:
Date Day Subject Class Attendance Comments
0 6/8/2018 2 Medieval T Presente NaN
1 6/8/2018 2 Logica T Presente NaN
2 6/8/2018 2 Logica P NaN NaN
3 8/8/2018 4 Logica TP Sin clases Movilización IVE
4 8/8/2018 4 Logica P NaN Movilización IVE
This is the current output. What I'm not able to figure is how to perform some kind of operation and add a column. For example, adding a column counting only those rows with value "Presente" for each subgroup. I find myself complicating things too much, i'm afrading i'm missing something.
z:devspyasist>python asistb.py asist2q.csv
Date Day Attendance Comments
Subject Class
Logica P 32 32 3 4
T 16 16 4 3
TP 16 16 1 1
Medieval P 16 16 0 0
T 32 32 5 4
TP 16 16 1 1
A tip or hint on how to approach the problem is greatly appreciated. Thanks a lot!
2 Answers
2
Seems like you need unstack
+ stack
unstack
stack
df.groupby(['Subject','Class']).count().unstack(fill_value=0).stack()
Out[296]:
Date Day Attendance Comments
Subject Class
Logica P 2 2 0 1
T 1 1 1 0
TP 1 1 1 1
Medieval P 0 0 0 0
T 1 1 1 0
TP 0 0 0 0
Given the example df
as follows
df
Date Day Subject Class Attendance Comments
0 6/8/2018 2 Medieval T Presente NaN
1 6/8/2018 2 Logica T Presente NaN
2 6/8/2018 2 Logica P NaN NaN
3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE
4 8/8/2018 4 Logica P NaN MovilizaciónIVE
In order to count the occurances of Presente
from the Attendance
column, you use a transform
method with string search
and count
operation and add the result in a new column as follows
Presente
Attendance
transform
string search
count
df['counts'] = df.groupby(['Subject', 'Class'])['Attendance'].transform(
lambda x: x[x.str.contains('Presente', na=False)].count())
This results the following output
Date Day Subject Class Attendance Comments counts
0 6/8/2018 2 Medieval T Presente NaN 1
1 6/8/2018 2 Logica T Presente NaN 1
2 6/8/2018 2 Logica P NaN NaN 0
3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE 0
4 8/8/2018 4 Logica P NaN MovilizaciónIVE 0
To calculate the total number of Presente per Subject, you can use groupby
again and sum up the counts column
groupby
df.groupby(['Subject'])['counts'].sum()
The output is
Subject
Logica 1
Medieval 1
Name: counts, dtype: int32
Source:pandas groupby count string occurrence over column
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.