Performing operations on data and adding column to groupby results

I'm trying to write a simple program to track my attandance to my two university courses this semester. I need to have at least an 80% of attendance to each class of each subject. Each subject has three classes (T,TP and P).

After tinkering for a while I've come to find the groupby pandas function:

import pandas as pd import sys df = pd.read_csv(sys.argv[1]) g = df.groupby(['Subject','Class']).count() print(g)

The source file has a row for each class where I'll note "Presente" when I go. I will also note "Sin Clases" when there is no class due to some external reason. This is what top 5 rows of the source file looks like:

Date Day Subject Class Attendance Comments 0 6/8/2018 2 Medieval T Presente NaN 1 6/8/2018 2 Logica T Presente NaN 2 6/8/2018 2 Logica P NaN NaN 3 8/8/2018 4 Logica TP Sin clases Movilización IVE 4 8/8/2018 4 Logica P NaN Movilización IVE

This is the current output. What I'm not able to figure is how to perform some kind of operation and add a column. For example, adding a column counting only those rows with value "Presente" for each subgroup. I find myself complicating things too much, i'm afrading i'm missing something.

z:devspyasist>python asistb.py asist2q.csv Date Day Attendance Comments Subject Class Logica P 32 32 3 4 T 16 16 4 3 TP 16 16 1 1 Medieval P 16 16 0 0 T 32 32 5 4 TP 16 16 1 1

A tip or hint on how to approach the problem is greatly appreciated. Thanks a lot!

2 Answers
2

Seems like you need unstack + stack

unstack

stack

df.groupby(['Subject','Class']).count().unstack(fill_value=0).stack() Out[296]: Date Day Attendance Comments Subject Class Logica P 2 2 0 1 T 1 1 1 0 TP 1 1 1 1 Medieval P 0 0 0 0 T 1 1 1 0 TP 0 0 0 0

Given the example df as follows

df

Date Day Subject Class Attendance Comments 0 6/8/2018 2 Medieval T Presente NaN 1 6/8/2018 2 Logica T Presente NaN 2 6/8/2018 2 Logica P NaN NaN 3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE 4 8/8/2018 4 Logica P NaN MovilizaciónIVE

In order to count the occurances of Presente from the Attendance column, you use a transform method with string search and count operation and add the result in a new column as follows

Presente

Attendance

transform

string search

count

df['counts'] = df.groupby(['Subject', 'Class'])['Attendance'].transform( lambda x: x[x.str.contains('Presente', na=False)].count())

This results the following output

Date Day Subject Class Attendance Comments counts 0 6/8/2018 2 Medieval T Presente NaN 1 1 6/8/2018 2 Logica T Presente NaN 1 2 6/8/2018 2 Logica P NaN NaN 0 3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE 0 4 8/8/2018 4 Logica P NaN MovilizaciónIVE 0

To calculate the total number of Presente per Subject, you can use groupby again and sum up the counts column

groupby

df.groupby(['Subject'])['counts'].sum()

The output is

Subject Logica 1 Medieval 1 Name: counts, dtype: int32

Source:pandas groupby count string occurrence over column

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Sfyjdyy

Performing operations on data and adding column to groupby results

Performing operations on data and adding column to groupby results

2 Answers
2

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

Performing operations on data and adding column to groupby results

Performing operations on data and adding column to groupby results

2 Answers 2

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

2 Answers
2