How to sort data by completeness rate in pandas







Here's my dataset:


id  feature_1  feature_2  feature_3  feature_4  feature_5
 1         10         15         10         15         20
 2         10        NaN         10        NaN         20
 3         10        NaN         10        NaN         20
 4         10         46        NaN         23         20
 5         10        NaN         10        NaN         20



Here's what I need: I want to sort the rows by completeness level (the higher the percentage of values that are not NaN, the higher the completeness level). The order should be ascending, so the least complete rows come first, which makes it easier for me to impute the missing values.


id  feature_1  feature_2  feature_3  feature_4  feature_5
 2         10        NaN         10        NaN         20
 3         10        NaN         10        NaN         20
 5         10        NaN         10        NaN         20
 4         10         46        NaN         23         20
 1         10         15         10         15         20



Best Regards,





Can you explain more about what you mean by completeness rate level?
– jezrael
Aug 6 at 6:37







Yes, I mean the percentage of the data that is not missing.
– Nabih Bawazir
Aug 6 at 6:50




2 Answers



Try this:


import pandas as pd
import numpy as np

# Sample data with missing values scattered across three columns
d = {
    'A': ['X', np.nan, np.nan, 'X', 'Y', np.nan, 'X', 'X', np.nan, 'X', 'X'],
    'B': ['Y', np.nan, 'X', 'Val', 'X', 'X', np.nan, 'X', 'X', 'X', 'X'],
    'C': ['Y', 'X', 'X', np.nan, 'X', 'X', 'Val', 'X', 'X', np.nan, np.nan],
}

df = pd.DataFrame(data=d)

# Count the missing values in each row
df.isnull().sum(axis=1)
Out[72]:
0 0
1 2
2 1
3 1
4 0
5 1
6 1
7 0
8 1
9 1
10 1
dtype: int64

# Store the per-row count of missing values as a sort key
df['is_null'] = df.isnull().sum(axis=1)

# Rows with the most missing values (the least complete) come first
df.sort_values('is_null', ascending=False)
Out[77]:
      A    B    C  is_null
1   NaN  NaN    X        2
2   NaN    X    X        1
3     X  Val  NaN        1
5   NaN    X    X        1
6     X  NaN  Val        1
8   NaN    X    X        1
9     X    X  NaN        1
10    X    X  NaN        1
0     X    Y    Y        0
4     Y    X    X        0
7     X    X    X        0
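
For the dataset in the question, the same idea can be written as a completeness rate instead of a null count. This is only a sketch: the names `feature_cols`, `completeness` and `result` are made up for illustration, and it assumes `id` should not count toward the rate. A stable sort (`kind="mergesort"`) is used so that rows with equal rates keep their original order.

```python
import numpy as np
import pandas as pd

# The dataset from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "feature_1": [10, 10, 10, 10, 10],
    "feature_2": [15, np.nan, np.nan, 46, np.nan],
    "feature_3": [10, 10, 10, np.nan, 10],
    "feature_4": [15, np.nan, np.nan, 23, np.nan],
    "feature_5": [20, 20, 20, 20, 20],
})

# Assumption: only the feature columns count toward completeness
feature_cols = df.columns.drop("id")

# Fraction of non-missing values per row (1.0 means fully complete)
completeness = df[feature_cols].notna().mean(axis=1)

# Ascending order: least complete rows first; mergesort keeps ties stable
order = completeness.sort_values(kind="mergesort").index
result = df.loc[order]
print(result)
```

With this data the ids come out as 2, 3, 5, 4, 1, which is the order asked for in the question.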



If you want to sort by the column with the maximal number of NaNs:




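# Name of the column with the most missing values
c = df.isnull().sum().idxmax()
print(c)
feature_2

# Rows missing that column come first, the rest follow in descending order
df = df.sort_values(c, na_position='first', ascending=False)
print(df)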
   id  feature_1  feature_2  feature_3  feature_4  feature_5
1   2         10        NaN       10.0        NaN         20
2   3         10        NaN       10.0        NaN         20
4   5         10        NaN       10.0        NaN         20
3   4         10       46.0        NaN       23.0         20
0   1         10       15.0       10.0       15.0         20
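
To see what `na_position` changes, here is a small sketch on a made-up three-row frame (the frame `s` and the column name `x` are just for illustration). Note that sorting on one column only looks at that column's values and NaNs, so it is not a row-wise completeness measure in general.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one value column with a single NaN
s = pd.DataFrame({"id": [1, 2, 3], "x": [15.0, np.nan, 46.0]})

# Default na_position='last': NaN rows go to the end, even when descending
default_order = s.sort_values("x", ascending=False)["id"].tolist()

# na_position='first': NaN rows go to the top, the rest stay descending
nan_first = s.sort_values("x", ascending=False, na_position="first")["id"].tolist()

print(default_order)  # [3, 1, 2]
print(nan_first)      # [2, 3, 1]
```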





Why did you use na_position='first'? You could use the sum of null values as the sort key instead.
– Cezary.Sz
Aug 6 at 6:41





Because na_position='last' is the default value ;)
– jezrael
Aug 6 at 6:42







Why does id 1 come before id 4?
– Nabih Bawazir
Aug 6 at 6:53





@NabihBawazir - I think ascending=False is missing.
– jezrael
Aug 6 at 6:56







I got that idea from @Cezary.Sz. Multilevel sorting helps as well.
– Nabih Bawazir
Aug 6 at 6:57
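
One way to read the multilevel sorting mentioned above, as a sketch only: sort by the number of missing values first and break ties with `id`. The column name `n_missing` and the variable `ranked` are made up for illustration, and only a subset of the question's columns is used.

```python
import numpy as np
import pandas as pd

# Subset of the question's dataset, enough to show the tie-breaking
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "feature_2": [15, np.nan, np.nan, 46, np.nan],
    "feature_3": [10, 10, 10, np.nan, 10],
    "feature_4": [15, np.nan, np.nan, 23, np.nan],
})

# Hypothetical helper column: count of missing values per row
ranked = df.assign(n_missing=df.isna().sum(axis=1))

# First key: most missing values first; second key: id ascending for ties
ranked = ranked.sort_values(["n_missing", "id"], ascending=[False, True])
print(ranked)
```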





