get unique combination values of a correlation matrix - pandas

Let's suppose I have a correlation matrix that looks like this:
import pandas as pd
df = pd.DataFrame(data={'a':[1,0.2,0.3,0.4],'b':[0.2,1,0.5,0.6],'c':[0.3,0.5,1,0.7],'d':[0.4,0.6,0.7,1]}, index=['a','b','c','d'])
What is the best way to extract the value of each unique pairwise combination (a-b, a-c, etc.) into a single row like this?
df2 =
a_b a_c a_d b_c b_d c_d
0.2 0.3 0.4 0.5 0.6 0.7
The only way I can see to do this is to write my own function, but I was wondering whether someone knows a shortcut.
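As one possible shortcut (not taken from the answers below), NumPy's `np.triu_indices` returns the row and column positions of the upper triangle, and `k=1` excludes the diagonal. The sketch below assumes a square matrix whose rows and columns are in the same order; `upper_triangle_pairs` is a hypothetical helper name used only for illustration.

```python
import numpy as np
import pandas as pd

# The example correlation matrix from the question.
df = pd.DataFrame(
    {"a": [1, 0.2, 0.3, 0.4], "b": [0.2, 1, 0.5, 0.6],
     "c": [0.3, 0.5, 1, 0.7], "d": [0.4, 0.6, 0.7, 1]},
    index=["a", "b", "c", "d"],
)

def upper_triangle_pairs(corr: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Hypothetical helper: strictly-upper-triangle entries of a square frame as one row."""
    # Positions above the diagonal, in row-major order; k=1 skips the diagonal itself.
    rows, cols = np.triu_indices(len(corr), k=1)
    values = corr.to_numpy()[rows, cols]
    # Build "row_col" labels from the positional indices.
    labels = [f"{corr.index[i]}{sep}{corr.columns[j]}" for i, j in zip(rows, cols)]
    return pd.DataFrame([values], columns=labels)

df2 = upper_triangle_pairs(df)
print(df2)
```

Because this works on positions rather than label comparisons, the kept pairs follow the matrix's own row and column order.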
2 Answers
IIUC:
df_out = df.stack()
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T
Output:
a_a a_b a_c a_d b_a b_b b_c b_d c_a c_b c_c c_d d_a d_b d_c d_d
0 1.0 0.2 0.3 0.4 0.2 1.0 0.5 0.6 0.3 0.5 1.0 0.7 0.4 0.6 0.7 1.0
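For reference, here is a self-contained version of the snippet above using the question's data, with comments on what each step does. It is a sketch for checking the shape of the result: every (row, column) pair appears, diagonal included, so a 4x4 matrix gives 16 columns ending in d_d.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.2, 0.3, 0.4], "b": [0.2, 1, 0.5, 0.6],
     "c": [0.3, 0.5, 1, 0.7], "d": [0.4, 0.6, 0.7, 1]},
    index=["a", "b", "c", "d"],
)

# stack() turns the 4x4 frame into a Series indexed by (row label, column label).
df_out = df.stack()
# Join each (row, column) tuple of the MultiIndex into one "row_col" string.
df_out.index = df_out.index.map("_".join)
# Transpose the single column into a one-row DataFrame.
df_out = df_out.to_frame().T
print(df_out.shape)
```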
And if you want to get rid of a_a, b_b, etc. (the diagonal):
df_out = df.stack()
df_out = df_out[df_out.index.get_level_values(0) != df_out.index.get_level_values(1)]
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T
Output:
a_b a_c a_d b_a b_c b_d c_a c_b c_d d_a d_b d_c
0 0.2 0.3 0.4 0.2 0.5 0.6 0.3 0.5 0.7 0.4 0.6 0.7
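A position-based variant (an alternative, not part of the original answer) masks the diagonal with `np.eye` instead of comparing index levels, so it does not rely on the row labels matching the column labels. The trailing `dropna()` is there as an assumption-free safeguard: depending on the pandas version, `stack()` may or may not drop NaN values on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.2, 0.3, 0.4], "b": [0.2, 1, 0.5, 0.6],
     "c": [0.3, 0.5, 1, 0.7], "d": [0.4, 0.6, 0.7, 1]},
    index=["a", "b", "c", "d"],
)

# Boolean array that is True only on the diagonal (a_a, b_b, ...).
diagonal = np.eye(len(df), dtype=bool)
# mask() replaces the diagonal cells with NaN; dropna() removes them after stacking.
off_diag = df.mask(diagonal).stack().dropna()
off_diag.index = off_diag.index.map("_".join)
df_out = off_diag.to_frame().T
print(df_out)
```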
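Or, to keep only one of each symmetric pair (keep a_b, drop b_a), keep only the entries whose row label sorts before the column label: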
df_out = df.stack()
df_out = df_out[df_out.index.get_level_values(0) < df_out.index.get_level_values(1)]
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T
Or combine a few of these lines by passing a lambda function to .loc:
df_out = df.stack().loc[lambda x: x.index.get_level_values(0) < x.index.get_level_values(1)]
df_out.index = df_out.index.map('_'.join)
df_out = df_out.to_frame().T
Output:
a_b a_c a_d b_c b_d c_d
0 0.2 0.3 0.4 0.5 0.6 0.7
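Note that the `<` comparison above uses lexicographic order of the labels, so which of a_b or b_a survives depends on how the labels sort, not on where they sit in the frame. An alternative sketch (not from the original answer) uses `itertools.combinations`, which yields each unordered pair once in column order and never pairs a label with itself. It assumes the index labels equal the column labels, since it looks values up with `df.loc[row, col]`.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.2, 0.3, 0.4], "b": [0.2, 1, 0.5, 0.6],
     "c": [0.3, 0.5, 1, 0.7], "d": [0.4, 0.6, 0.7, 1]},
    index=["a", "b", "c", "d"],
)

# combinations(df.columns, 2) yields ('a', 'b'), ('a', 'c'), ... in column order.
pairs = {f"{r}_{c}": df.loc[r, c] for r, c in combinations(df.columns, 2)}
# A list holding one dict becomes a one-row DataFrame; dict order sets column order.
df2 = pd.DataFrame([pairs])
print(df2)
```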
@HappyPy Thank you. You're welcome. Happy coding!
– Scott Boston
Aug 9 at 19:46
IIUC, you can work with the index levels: unstack the matrix into long form, then drop pairs that repeat regardless of order:
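df2 = df.unstack().reset_index()
# frozenset ignores order, so (a, b) and (b, a) become the same key
s = df2[['level_0', 'level_1']].agg(frozenset, axis=1).drop_duplicates()
df2 = df2.loc[s.index]
# build "level_0_level_1" labels such as a_b
ind = df2.agg(lambda k: (k['level_0']+'_'+k['level_1']), axis=1)
df2.set_index(ind)[0].to_frame().T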
a_a a_b a_c a_d b_b b_c b_d c_c c_d d_d
0 1.0 0.2 0.3 0.4 1.0 0.5 0.6 1.0 0.7 1.0
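This answer keeps the diagonal (a_a, b_b, ...). A self-contained sketch of the same frozenset idea that also drops the diagonal is below; it relies on the fact that a diagonal pair such as ('a', 'a') collapses to a one-element frozenset. Building the keys with a list comprehension instead of `agg` is a stylistic choice here, not a requirement.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.2, 0.3, 0.4], "b": [0.2, 1, 0.5, 0.6],
     "c": [0.3, 0.5, 1, 0.7], "d": [0.4, 0.6, 0.7, 1]},
    index=["a", "b", "c", "d"],
)

# unstack() gives a Series indexed by (column, row); reset_index() turns the two
# levels into columns level_0 and level_1 and the values into column 0.
long = df.unstack().reset_index()
# Order-insensitive key per pair: ('a', 'b') and ('b', 'a') map to the same frozenset.
keys = pd.Series(
    [frozenset(p) for p in zip(long["level_0"], long["level_1"])], index=long.index
)
# Keep the first occurrence of each pair and drop diagonal pairs (size-1 sets).
keep = ~keys.duplicated() & (keys.map(len) == 2)
unique = long.loc[keep]
labels = unique["level_0"] + "_" + unique["level_1"]
df2 = unique.set_index(labels)[0].to_frame().T
print(df2)
```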
Thanks, that works great! I knew that there would be a more elegant way to do this than writing my own function. Thanks a lot!
– HappyPy
Aug 9 at 19:44