How to sort data by completeness rate level in pandas
Here's my dataset
id feature_1 feature_2 feature_3 feature_4 feature_5
1 10 15 10 15 20
2 10 NaN 10 NaN 20
3 10 NaN 10 NaN 20
4 10 46 NaN 23 20
5 10 NaN 10 NaN 20
Here's what I need: I want to sort the rows by completeness level (the higher the percentage of values in a row that are not NaN, the higher the completeness level). The sort should be ascending, which will make it easier for me to impute the missing values.
id feature_1 feature_2 feature_3 feature_4 feature_5
2 10 NaN 10 NaN 20
3 10 NaN 10 NaN 20
5 10 NaN 10 NaN 20
4 10 46 NaN 23 20
1 10 15 10 15 20
Best Regards,
Yes, I mean the percentage of data that is not missing.
– Nabih Bawazir
Aug 6 at 6:50
2 Answers
Try this:
import pandas as pd
import numpy as np

# Sample data with missing values (np.nan) scattered across three columns
d = {
    'A': ['X', np.nan, np.nan, 'X', 'Y', np.nan, 'X', 'X', np.nan, 'X', 'X'],
    'B': ['Y', np.nan, 'X', 'Val', 'X', 'X', np.nan, 'X', 'X', 'X', 'X'],
    'C': ['Y', 'X', 'X', np.nan, 'X', 'X', 'Val', 'X', 'X', np.nan, np.nan],
}
df = pd.DataFrame(data=d)

# Count the missing values in each row
df.T.isnull().sum()
Out[72]:
0 0
1 2
2 1
3 1
4 0
5 1
6 1
7 0
8 1
9 1
10 1
dtype: int64
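# Store the per-row NaN count in a helper column
df['is_null'] = df.T.isnull().sum()
# Sort so that the rows with the most NaNs come first
df.sort_values('is_null', ascending=False)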
Out[77]:
A B C is_null
1 NaN NaN X 2
2 NaN X X 1
3 X Val NaN 1
5 NaN X X 1
6 X NaN Val 1
8 NaN X X 1
9 X X NaN 1
10 X X NaN 1
0 X Y Y 0
4 Y X X 0
7 X X X 0
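The same idea can be applied to the dataset from the question. This is only a minimal sketch: the frame is rebuilt by hand from the question's table, and the column name completeness is a hypothetical helper name that does not appear in the original post. It takes the share of non-missing feature values per row with notna().mean(axis=1), then sorts ascending so the least complete rows come first.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'feature_1': [10, 10, 10, 10, 10],
    'feature_2': [15, np.nan, np.nan, 46, np.nan],
    'feature_3': [10, 10, 10, np.nan, 10],
    'feature_4': [15, np.nan, np.nan, 23, np.nan],
    'feature_5': [20, 20, 20, 20, 20],
})

# Leave the id column out of the completeness calculation
features = df.columns.drop('id')

# Fraction of non-missing feature values in each row
# ('completeness' is a hypothetical helper column name)
df['completeness'] = df[features].notna().mean(axis=1)

# Ascending sort; mergesort is stable, so tied rows keep their original order
result = df.sort_values('completeness', ascending=True, kind='mergesort')
print(result)  # ids come out in the order 2, 3, 5, 4, 1
```

The helper column can be dropped afterwards with result.drop(columns='completeness') if it is not needed for the imputation step.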
If you want to sort by the column with the maximal number of NaNs:
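# Name of the column with the most NaNs (the first one wins on ties)
c = df.isnull().sum().idxmax()
print(c)
feature_2
# Sort by that column, placing its NaN rows first
df = df.sort_values(c, na_position='first', ascending=False)
print(df)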
id feature_1 feature_2 feature_3 feature_4 feature_5
1 2 10 NaN 10.0 NaN 20
2 3 10 NaN 10.0 NaN 20
4 5 10 NaN 10.0 NaN 20
3 4 10 46.0 NaN 23.0 20
0 1 10 15.0 10.0 15.0 20
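One thing to keep in mind: sorting on a single column orders the remaining rows by that column's values (46 before 15 here), not by how many values each row is missing, so it matches the desired output for this particular data. If the order should depend on the row-wise count instead, a possible sketch is to sort the per-row non-null counts and reorder the frame with the resulting index, without adding a helper column. The variable names non_null_counts and order are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'feature_1': [10, 10, 10, 10, 10],
    'feature_2': [15, np.nan, np.nan, 46, np.nan],
    'feature_3': [10, 10, 10, np.nan, 10],
    'feature_4': [15, np.nan, np.nan, 23, np.nan],
    'feature_5': [20, 20, 20, 20, 20],
})

# Number of non-missing values in each row (id is never missing,
# so including it does not change the ordering)
non_null_counts = df.notna().sum(axis=1)

# Stable ascending sort of the counts: least complete rows first
order = non_null_counts.sort_values(kind='mergesort').index

# Reorder the original frame; no helper column is added
result = df.loc[order]
print(result)
```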
Why did you use na_position='first' if you use the sum of null values as the key?
– Cezary.Sz
Aug 6 at 6:41
Because na_position='last' is the default value ;)
– jezrael
Aug 6 at 6:42
Why does id 1 come before id 4?
– Nabih Bawazir
Aug 6 at 6:53
@NabihBawazir - I think ascending=False is missing.
– jezrael
Aug 6 at 6:56
I got the same idea from @Cezary.Sz. Multilevel sorting would help as well.
– Nabih Bawazir
Aug 6 at 6:57
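For reference, multilevel sorting could look roughly like the sketch below. It assumes the question's dataset, a hypothetical helper column named n_missing, and id as the tie-breaker; none of these choices come from the original posts.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'feature_1': [10, 10, 10, 10, 10],
    'feature_2': [15, np.nan, np.nan, 46, np.nan],
    'feature_3': [10, 10, 10, np.nan, 10],
    'feature_4': [15, np.nan, np.nan, 23, np.nan],
    'feature_5': [20, 20, 20, 20, 20],
})

# Missing values per row ('n_missing' is a hypothetical helper column name)
df['n_missing'] = df.isnull().sum(axis=1)

# First level: most missing values first; second level: id ascending for ties
result = df.sort_values(['n_missing', 'id'], ascending=[False, True])
print(result)  # ids come out in the order 2, 3, 5, 4, 1
```

Because id is unique, the second sort key makes the order of tied rows fully determined.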
Can you explain more about completeness rate level?
– jezrael
Aug 6 at 6:37