get unique combination values of a correlation matrix - pandas







Let's suppose I have a correlation matrix that looks like this:


import pandas as pd
df = pd.DataFrame(data={'a':[1,0.2,0.3,0.4],'b':[0.2,1,0.5,0.6],'c':[0.3,0.5,1,0.7],'d':[0.4,0.6,0.7,1]}, index=['a','b','c','d'])



What is the best way to extract the value of each unique pairwise combination (a-b, a-c, etc.), so that each pair appears only once and the diagonal is left out?


df2 =
a_b a_c a_d b_c b_d c_d
0.2 0.3 0.4 0.5 0.6 0.7



The only way I can see to do this is to write my own function, but I was wondering if anyone knows a shortcut for this.




2 Answers



IIUC:


df_out = df.stack()
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T



Output:


a_a a_b a_c a_d b_a b_b b_c b_d c_a c_b c_c c_d d_a d_b d_c d_d
0 1.0 0.2 0.3 0.4 0.2 1.0 0.5 0.6 0.3 0.5 1.0 0.7 0.4 0.6 0.7 1.0



And if you want to get rid of the diagonal entries (a_a, b_b, etc.):


df_out = df.stack()
df_out = df_out[df_out.index.get_level_values(0) != df_out.index.get_level_values(1)]
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T



Output:


a_b a_c a_d b_a b_c b_d c_a c_b c_d d_a d_b d_c
0 0.2 0.3 0.4 0.2 0.5 0.6 0.3 0.5 0.7 0.4 0.6 0.7



Or to get rid of b_a and keep a_b:


df_out = df.stack()
df_out = df_out[df_out.index.get_level_values(0) < df_out.index.get_level_values(1)]
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T



Or combining a few lines using a lambda function in .loc:


df_out = df.stack().loc[lambda x: x.index.get_level_values(0) < x.index.get_level_values(1)]
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T



Output:


a_b a_c a_d b_c b_d c_d
0 0.2 0.3 0.4 0.5 0.6 0.7
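
As an alternative sketch (not part of the answer above), the same upper-triangle selection can be done by position rather than by comparing labels, using NumPy's np.triu_indices. This assumes the matrix is square with rows and columns in the same order; the names rows, cols and labels are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 0.2, 0.3, 0.4],
     'b': [0.2, 1, 0.5, 0.6],
     'c': [0.3, 0.5, 1, 0.7],
     'd': [0.4, 0.6, 0.7, 1]},
    index=['a', 'b', 'c', 'd'],
)

# Positions strictly above the diagonal (k=1 skips the diagonal itself),
# returned in row-major order: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
rows, cols = np.triu_indices(len(df), k=1)

# Build "row_col" labels and pick the matching values by position.
labels = [f"{df.index[r]}_{df.columns[c]}" for r, c in zip(rows, cols)]
values = df.to_numpy()[rows, cols]

# One-row frame with one column per unique pair.
df_out = pd.DataFrame([values], columns=labels)
```

Because the selection is positional, it does not depend on how the labels sort, which may matter if the column names are not in alphabetical order.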





Thanks, that works great! I knew that there would be a more elegant way to do this than writing my own function. Thanks a lot!
– HappyPy
Aug 9 at 19:44





@HappyPy Thank you. You're welcome. Happy coding!
– Scott Boston
Aug 9 at 19:46



IIUC, you can play with the indexes:


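# Long format: one row per (column, row) pair; the value lands in column 0
df2 = df.unstack().reset_index()
# A frozenset ignores order, so a-b and b-a count as duplicates
s = df2[['level_0', 'level_1']].agg(frozenset, axis=1).drop_duplicates()
df2 = df2.loc[s.index]
ind = df2.agg(lambda k: (k['level_0']+'_'+k['level_1']), axis=1)
df2.set_index(ind)[0].to_frame().T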

a_a a_b a_c a_d b_b b_c b_d c_c c_d d_d
0 1.0 0.2 0.3 0.4 1.0 0.5 0.6 1.0 0.7 1.0
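
The output above still contains the self-pairs (a_a, b_b, ...), which the question wanted to leave out. A minimal sketch of one way to drop them while keeping the same unstack-based idea is below; it replaces the agg(frozenset) step with a plain Python loop, and the names seen and keep are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 0.2, 0.3, 0.4],
     'b': [0.2, 1, 0.5, 0.6],
     'c': [0.3, 0.5, 1, 0.7],
     'd': [0.4, 0.6, 0.7, 1]},
    index=['a', 'b', 'c', 'd'],
)

# Long format: columns are level_0 (column label), level_1 (row label), 0 (value).
df2 = df.unstack().reset_index()

# Keep a row only if it is not a self-pair and its unordered pair is new.
seen = set()
keep = []
for left, right in zip(df2['level_0'], df2['level_1']):
    key = frozenset((left, right))
    keep.append(left != right and key not in seen)
    seen.add(key)
df2 = df2[keep]

# Label each remaining pair and turn the values into a single row.
labels = df2['level_0'] + '_' + df2['level_1']
df_out = df2.set_index(labels)[0].to_frame().T
```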





