How can I average a specific column every 5 rows and select the last value from another column in Pandas?

I have a pandas DataFrame with, say, 100 rows and 4 columns. For every block of 5 rows, I want to calculate the mean of one column ("Value"), take the last (fifth) value of another column ("DateTime"), and keep the results in a new DataFrame.

My DataFrame looks like this:

    >>> df
         DateTime    Product  Location  Value
    0    12-07-2018  A        S1        1.313
    1    12-07-2018  B        S1        3.089
    2    12-07-2018  C        S1        1.890
    3    12-07-2018  D        S1        3.136
    4**  12-07-2018  E        S1        3.258
    5    13-07-2018  F        S1        3.113
    6    13-07-2018  G        S1        2.651
    7    13-07-2018  H        S1        2.135
    8    13-07-2018  I        S1        1.555
    9**  14-07-2018  J        S1        2.009
    10   14-07-2018  K        S1        1.757
    11   14-07-2018  L        S1        1.808
    12   14-07-2018  M        S1        1.511
    13   15-07-2018  N        S1        2.265
    14** 15-07-2018  O        S1        2.356
    15   15-07-2018  P        S1        2.950
    16   15-07-2018  Q        S1        3.300

I can already average every 5 rows with this code:

    new_df = df.groupby(df.index // 5).agg({'DateTime': 'last', 'Value': 'mean'})

This is the result:

    >>> new_df
         DateTime   Value
    0  12-07-2018  2.5372
    1  14-07-2018  2.2926
    2  15-07-2018  1.9394
    3  15-07-2018  3.1250

However, the last 2 rows were averaged together as well: (2.950 + 3.300) / 2 = 3.1250. Whenever the final group has only 1, 2, 3, or 4 rows, it is averaged over however many rows it has.
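To see why the last mean covers only two values, it helps to count how many rows land in each block. A minimal sketch, assuming the question's table is loaded with the default 0..n-1 index (the DataFrame construction below is just a re-typing of that table):

```python
import pandas as pd

# Sample data re-typed from the question's table (17 rows, default RangeIndex)
dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4
         + ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258, 3.113, 2.651, 2.135, 1.555,
          2.009, 1.757, 1.808, 1.511, 2.265, 2.356, 2.950, 3.300]
df = pd.DataFrame({"DateTime": dates, "Product": list("ABCDEFGHIJKLMNOPQ"),
                   "Location": "S1", "Value": values})

# Block number of each row: rows 0-4 -> 0, rows 5-9 -> 1, and so on
blocks = df.index // 5

# How many rows each block actually contains
sizes = df.groupby(blocks).size()
print(sizes.tolist())  # [5, 5, 5, 2]

# The mean of the last block is taken over its 2 rows only (rows 15 and 16)
last_mean = df.groupby(blocks)["Value"].mean().iloc[-1]
print(round(last_mean, 4))  # 3.125
```

`groupby(...).mean()` simply divides by the group's own size, so nothing in the aggregation itself knows that a block "should" have 5 rows; the short block has to be removed before aggregating.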

I only want to average complete groups of 5 rows. If a group has fewer than 5 rows, it should not be averaged or included in new_df.
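One way to express "only complete groups of 5" directly is pandas' `GroupBy.filter`, which keeps the rows of every group for which a function returns True. This is a different technique from the answers in this thread; the sketch assumes the default 0..n-1 index and re-types the question's data:

```python
import pandas as pd

# Sample data re-typed from the question's table (17 rows, default RangeIndex)
dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4
         + ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258, 3.113, 2.651, 2.135, 1.555,
          2.009, 1.757, 1.808, 1.511, 2.265, 2.356, 2.950, 3.300]
df = pd.DataFrame({"DateTime": dates, "Product": list("ABCDEFGHIJKLMNOPQ"),
                   "Location": "S1", "Value": values})

blocks = df.index // 5

# Keep only the rows whose block has exactly 5 rows; the original index is preserved
complete = df.groupby(blocks).filter(lambda g: len(g) == 5)

# Rows 0..14 remain, so index // 5 still yields blocks 0, 1, 2
new_df = complete.groupby(complete.index // 5).agg({"DateTime": "last", "Value": "mean"})
print(new_df)
```

Because the blocks come from consecutive positions, only the last block can ever be short, so this gives the same result as truncating the frame to a multiple of 5.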

How can I do that?

Note: I added ** to mark every fifth row so the groups are easier to see.

2 Answers

To the best of my understanding, your request is equivalent to truncating df to a length divisible by 5 before aggregating. You can slice it on the fly:

    trimmed = df[:(len(df) // 5) * 5]   # keep only complete blocks of 5 rows
    new_df = trimmed.groupby(trimmed.index // 5).agg({'DateTime': 'last', 'Value': 'mean'})
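A quick check of the truncation idea on the question's data, cross-checked against a plain NumPy reshape. The NumPy part is an alternative added for comparison, not part of the answer, and the data is re-typed from the question:

```python
import numpy as np
import pandas as pd

# Sample data re-typed from the question's table (17 rows, default RangeIndex)
dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4
         + ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258, 3.113, 2.651, 2.135, 1.555,
          2.009, 1.757, 1.808, 1.511, 2.265, 2.356, 2.950, 3.300]
df = pd.DataFrame({"DateTime": dates, "Product": list("ABCDEFGHIJKLMNOPQ"),
                   "Location": "S1", "Value": values})

# Largest multiple of 5 that fits: 15 for 17 rows
n_full = (len(df) // 5) * 5
trimmed = df.iloc[:n_full]

# The grouper is derived from the trimmed frame, so both have the same length
new_df = trimmed.groupby(trimmed.index // 5).agg({"DateTime": "last", "Value": "mean"})

# Same means with NumPy: reshape the first 15 values into a 3 x 5 array, average each row
block_means = df["Value"].to_numpy()[:n_full].reshape(-1, 5).mean(axis=1)
print(np.allclose(new_df["Value"].to_numpy(), block_means))  # True
```

The reshape only works for a single numeric column and needs the length to be an exact multiple of 5, which is why the slice to `n_full` comes first.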

I used your code, but VS Code raised ValueError: Grouper and axis must be same length. I changed the code to new_df = df[:(len(df)//5)*5].groupby(df[:(len(df)//5)*5].index // 5).agg({'DateTime': 'last', 'Value': 'mean'}) and it works! Thanks a lot.
– kanpisek sasuk
Aug 7 at 3:20
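The error in this comment comes from mismatched lengths: the grouper was built from the first 15 rows while df itself still had 17. A small sketch reproducing it with made-up values; in the pandas versions I know of, passing an Index grouper of the wrong length raises this ValueError as soon as the groupby is created:

```python
import pandas as pd

# Made-up frame with 17 rows, like the question's
df = pd.DataFrame({"Value": [float(v) for v in range(17)]})
n_full = (len(df) // 5) * 5

# Grouper built from the first 15 rows only, while df itself still has 17 rows
short_grouper = df[:n_full].index // 5
print(len(short_grouper), len(df))  # 15 17

try:
    df.groupby(short_grouper).agg({"Value": "mean"})
    raised = False
except ValueError as exc:
    # pandas refuses a grouper whose length differs from the frame's
    raised = True
    print("ValueError:", exc)
```

Slicing the frame as well, as in the comment, makes both sides 15 rows long.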

Use:

    i = df.index // 5        # block number of each row
    mask = i == i[-1]        # True for rows in the last block
    no = mask.sum()          # number of rows in the last block
    if no < 5:               # drop the last block only if it has fewer than 5 rows
        df = df[~mask]
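The same mask logic as a runnable sketch. `drop_incomplete_last_block` is a hypothetical helper name, and the data is the question's table re-typed; it also shows that a frame whose length is already a multiple of 5 passes through unchanged:

```python
import pandas as pd

# Sample data re-typed from the question's table (17 rows, default RangeIndex)
dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4
         + ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258, 3.113, 2.651, 2.135, 1.555,
          2.009, 1.757, 1.808, 1.511, 2.265, 2.356, 2.950, 3.300]
df = pd.DataFrame({"DateTime": dates, "Product": list("ABCDEFGHIJKLMNOPQ"),
                   "Location": "S1", "Value": values})


def drop_incomplete_last_block(frame, size=5):
    """Hypothetical helper: drop the last block if it has fewer than `size` rows."""
    i = frame.index // size          # block number of each row (assumes a 0..n-1 index)
    mask = i == i[-1]                # True for rows in the last block
    if mask.sum() < size:
        frame = frame[~mask]
    return frame


trimmed = drop_incomplete_last_block(df)
new_df = trimmed.groupby(trimmed.index // 5).agg({"DateTime": "last", "Value": "mean"})
print(len(trimmed), len(new_df))  # 15 3

# Length already a multiple of 5: nothing is dropped
print(len(drop_incomplete_last_block(df.iloc[:10])))  # 10
```

An empty frame is not handled: `i[-1]` raises IndexError when there are no rows.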

Another idea:

    s = pd.Series(df.index // 5)                      # block number of each row
    df = df[s.groupby(s).transform('count') == 5]     # keep rows whose block has 5 rows

    new_df = df.groupby(df.index // 5).agg({'DateTime': 'last', 'Value': 'mean'})
    print(new_df)

         DateTime   Value
    0  12-07-2018  2.5372
    1  14-07-2018  2.2926
    2  15-07-2018  1.9394
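Both filters rely on `df.index // 5`, which only matches row positions when df has the default 0..n-1 index. If the index is different (for example after earlier filtering), one option is to compute positional block numbers with `np.arange` instead. A sketch under that assumption; the shifted index below is made up purely to illustrate:

```python
import numpy as np
import pandas as pd

# Sample data re-typed from the question's table (17 rows)
dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4
         + ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258, 3.113, 2.651, 2.135, 1.555,
          2.009, 1.757, 1.808, 1.511, 2.265, 2.356, 2.950, 3.300]
df = pd.DataFrame({"DateTime": dates, "Product": list("ABCDEFGHIJKLMNOPQ"),
                   "Location": "S1", "Value": values})
df.index = df.index + 3   # made-up non-default index: df.index // 5 would now mis-group rows

# Block number from the row position, independent of the index labels
blocks = np.arange(len(df)) // 5

# Size of the block each row belongs to, as a plain NumPy array (no index alignment)
counts = pd.Series(blocks).groupby(blocks).transform("count").to_numpy()
keep = counts == 5

new_df = df[keep].groupby(blocks[keep]).agg({"DateTime": "last", "Value": "mean"})
print(new_df)
```

Working with NumPy arrays for the mask and the grouper sidesteps index alignment entirely, so the result does not depend on what the index labels are.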

I forgot to mention that rows 15 and 16 should not be sent to new_df (complete groups of 5 rows only). I'll edit my post. Sorry, sir.
– kanpisek sasuk
Aug 6 at 9:16

@kanpiseksasuk - That's a different requirement ;) Check the edited answer.
– jezrael
Aug 6 at 9:24

Your solution worked for me! Thank you very much. I really appreciate your help.
– kanpisek sasuk
Aug 6 at 9:51

Can I use agg with {'DateTime': 'last', 'Value': 'mean', 'Product': 'all'} to show the products of each group, e.g. A,B,C,D,E and F,G,H,I,J, when grouping every 5 rows?
– kanpisek sasuk
Sep 3 at 3:53
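On this follow-up: in pandas, `'all'` is the boolean "are all values truthy" reduction, so it would not list the products. Passing a callable such as `','.join` for the Product column concatenates each block's values instead. A sketch, assuming incomplete blocks are dropped first with `GroupBy.filter` (any of the filters in this thread would work the same way) and re-typing the question's data:

```python
import pandas as pd

# Sample data re-typed from the question's table (17 rows, default RangeIndex)
dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4
         + ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258, 3.113, 2.651, 2.135, 1.555,
          2.009, 1.757, 1.808, 1.511, 2.265, 2.356, 2.950, 3.300]
df = pd.DataFrame({"DateTime": dates, "Product": list("ABCDEFGHIJKLMNOPQ"),
                   "Location": "S1", "Value": values})

# Drop the incomplete last block (rows 15 and 16)
complete = df.groupby(df.index // 5).filter(lambda g: len(g) == 5)

# A callable in the agg dict is applied to each group's column; ",".join glues the strings
new_df = complete.groupby(complete.index // 5).agg(
    {"DateTime": "last", "Value": "mean", "Product": ",".join}
)
print(new_df["Product"].tolist())  # ['A,B,C,D,E', 'F,G,H,I,J', 'K,L,M,N,O']
```

If the products should stay as Python lists rather than one string, `list` can be passed in place of `",".join`.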
