How can I average a specific column every 5 rows and select the last value from another column in Pandas







I have a pandas DataFrame with, say, 100 rows and 4 columns. For every block of 5 rows, I want to calculate the mean of one column ("Value") and take the last (fifth) entry of another column ("DateTime"), and keep the results in a new DataFrame.



My DataFrame looks like this:


>>> df
     DateTime    Product  Location  Value
0    12-07-2018  A        S1        1.313
1    12-07-2018  B        S1        3.089
2    12-07-2018  C        S1        1.890
3    12-07-2018  D        S1        3.136
4**  12-07-2018  E        S1        3.258
5    13-07-2018  F        S1        3.113
6    13-07-2018  G        S1        2.651
7    13-07-2018  H        S1        2.135
8    13-07-2018  I        S1        1.555
9**  14-07-2018  J        S1        2.009
10   14-07-2018  K        S1        1.757
11   14-07-2018  L        S1        1.808
12   14-07-2018  M        S1        1.511
13   15-07-2018  N        S1        2.265
14** 15-07-2018  O        S1        2.356
15   15-07-2018  P        S1        2.950
16   15-07-2018  Q        S1        3.300



So far I can average every 5 rows with this code:


new_df = df.groupby(df.index // 5).agg({'DateTime':'last', 'Value':'mean'})



This is the result:


>>> new_df
   DateTime    Value
0  12-07-2018  2.5372
1  14-07-2018  2.2926
2  15-07-2018  1.9394
3  15-07-2018  3.1250



But the last 2 rows were averaged together as well: (2.950+3.300)/2 = 3.1250. If the final group has only 1, 2, 3 or 4 rows, the mean is taken over however many rows it has.
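The behaviour described above can be seen by checking the group sizes. This is a minimal sketch that rebuilds only the "Value" column from the question and assumes the default 0..n-1 index; the names values, group_ids, sizes and means are illustrative.

```python
import pandas as pd

# The 17 values from the question; 17 is not a multiple of 5
values = [1.313, 3.089, 1.890, 3.136, 3.258,
          3.113, 2.651, 2.135, 1.555, 2.009,
          1.757, 1.808, 1.511, 2.265, 2.356,
          2.950, 3.300]
df = pd.DataFrame({"Value": values})

# Integer division of the row label assigns rows 0-4 to group 0, 5-9 to group 1, ...
group_ids = df.index // 5

# Number of rows per group: the last group is incomplete
sizes = df.groupby(group_ids).size()
print(sizes.tolist())  # [5, 5, 5, 2]

# The mean of the last group is taken over its 2 rows only
means = df.groupby(group_ids)["Value"].mean()
```

Any group whose size is below 5 is one that should be left out before aggregating.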



I would like to average complete groups of 5 rows only. If a group has fewer than 5 rows, it should not be averaged and should not be sent to new_df.



How can I do that?



Note: I added ** to mark every fifth row so the groups are easy to see.




2 Answers



To my best understanding, your request is equivalent to truncating df to a length divisible by 5 before aggregating. You can use slicing on the fly:


trimmed = df[:(len(df)//5)*5]
new_df = trimmed.groupby(trimmed.index // 5).agg({'DateTime':'last', 'Value':'mean'})





Your original code, which grouped the full df by the index of the truncated df, raised ValueError: Grouper and axis must be same length in VSC. Slicing df itself before grouping works: new_df = df[:(len(df)//5)*5].groupby(df[:(len(df)//5)*5].index // 5).agg({'DateTime':'last', 'Value':'mean'}). Many thanks.
– kanpisek sasuk
Aug 7 at 3:20
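One caveat with the truncation approach: df.index // 5 only forms blocks of 5 when the index is the default 0, 1, 2, ... A possible sketch for other indexes groups by row position instead, using np.arange. The index labels 100..116 and the names n, trimmed and pos are assumptions made for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

values = [1.313, 3.089, 1.890, 3.136, 3.258,
          3.113, 2.651, 2.135, 1.555, 2.009,
          1.757, 1.808, 1.511, 2.265, 2.356,
          2.950, 3.300]
# Hypothetical non-default index (labels 100..116) to show labels are not used
df = pd.DataFrame({"Value": values}, index=range(100, 117))

n = 5
# Keep only as many rows as fit into complete blocks of n
trimmed = df.iloc[: (len(df) // n) * n]

# Positional group ids: 0,0,0,0,0,1,1,1,1,1,... regardless of index labels
pos = np.arange(len(trimmed)) // n
new_df = trimmed.groupby(pos)["Value"].mean()
```

Because the grouper is built from the truncated frame, it always has the same length as the data being grouped, which avoids the ValueError mentioned in the comment.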




Use:


i = df.index // 5
# flag the rows that belong to the last group
mask = i == i[-1]
# number of rows in the last group
no = mask.sum()

# drop the last group only if it has fewer than 5 rows
if no < 5:
    df = df[~mask]



Another idea:


s = pd.Series(df.index // 5)
df = df[s.groupby(s).transform('count') == 5]


new_df = df.groupby(df.index // 5).agg({'DateTime':'last', 'Value':'mean'})
print(new_df)
   DateTime    Value
0  12-07-2018  2.5372
1  14-07-2018  2.2926
2  15-07-2018  1.9394
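A third variant, not from the answer above, is to let groupby(...).filter drop incomplete groups before aggregating. This is a self-contained sketch that rebuilds the "DateTime" and "Value" columns from the question and assumes the default 0..n-1 index; n and complete are illustrative names.

```python
import pandas as pd

dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4 +
         ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258,
          3.113, 2.651, 2.135, 1.555, 2.009,
          1.757, 1.808, 1.511, 2.265, 2.356,
          2.950, 3.300]
df = pd.DataFrame({"DateTime": dates, "Value": values})

n = 5
# Keep only the rows whose group has exactly n members
complete = df.groupby(df.index // n).filter(lambda g: len(g) == n)

# Original row labels are preserved, so the same grouper still applies
new_df = complete.groupby(complete.index // n).agg(
    {"DateTime": "last", "Value": "mean"}
)
```

Since blocks are formed from the top, only the final group can be short, so this should give the same three rows as the output shown above.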





I forgot to mention that rows 15 and 16 should not be sent to new_df (only complete groups of 5 rows). I'll edit my post. Sorry, sir.
– kanpisek sasuk
Aug 6 at 9:16






@kanpiseksasuk - It is different ;) check edited answer.
– jezrael
Aug 6 at 9:24





Your solution worked for me! Thank you very much. I really appreciate your help.
– kanpisek sasuk
Aug 6 at 9:51






Can I use aggregate with something like {'DateTime':'last', 'Value':'mean', 'Product':'all'} to show all the products in each group of 5 rows (A,B,C,D,E // F,G,H,I,J)?
– kanpisek sasuk
Sep 3 at 3:53
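The thread does not answer this last comment. As far as I know, 'all' is not a string aggregation that collects values, but agg also accepts a callable per column, so one possible sketch joins the product labels with a lambda. The comma separator and the names trimmed and new_df are assumptions; the data is rebuilt from the question.

```python
import pandas as pd

dates = (["12-07-2018"] * 5 + ["13-07-2018"] * 4 +
         ["14-07-2018"] * 4 + ["15-07-2018"] * 4)
values = [1.313, 3.089, 1.890, 3.136, 3.258,
          3.113, 2.651, 2.135, 1.555, 2.009,
          1.757, 1.808, 1.511, 2.265, 2.356,
          2.950, 3.300]
df = pd.DataFrame({
    "DateTime": dates,
    "Product": list("ABCDEFGHIJKLMNOPQ"),
    "Value": values,
})

# Keep complete groups of 5 only, as in the answers above
trimmed = df.iloc[: (len(df) // 5) * 5]

new_df = trimmed.groupby(trimmed.index // 5).agg({
    "DateTime": "last",
    "Value": "mean",
    # Join the product labels of each group into one string
    "Product": lambda s: ",".join(s),
})
```

Replacing the lambda with list would keep the products of each group as a Python list instead of a joined string.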





