Set 3 levels of column names in a pandas DataFrame




I'm trying to build a frame with the following column structure:


              h/a                            totals

     sub1            sub2             sub1            sub2
  a b ... f      g .... m          a b ... f      g .... m



That is, 2 labels for the first level, 2 labels again for the second level, and then a third level of column names, where sub1 and sub2 don't have the same column names.



In order to do so I did the following:


columnas = pd.MultiIndex.from_product(
    [['h/a', 'totals'],
     ['means', 'percentages'],
     [('means', 'a'), ('means', 'b'), ....('percentage', 'g'), ....]],
    names=['data level 1', 'data level 2', 'data level 3'])

data = [data, pata, ......]
newframe = pd.DataFrame(data, columns=columnas)



What I get is this error:


ValueError: Shape of passed values is (1, 21), indices imply (84, 21)



How can I fix this so that the frame has multi-level column names?



Thank you




1 Answer



I think you need MultiIndex.from_tuples, with the tuples built by list comprehensions:


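import numpy as np
import pandas as pd

# third-level names: L1 goes under 'means', L2 under 'percentage'
L1 = list('abc')
L2 = list('ghi')

# list only the (level 1, level 2, level 3) combinations that should exist
tups = ([('h/a', 'means', x) for x in L1] +
        [('h/a', 'percentage', x) for x in L2] +
        [('totals', 'means', x) for x in L1] +
        [('totals', 'percentage', x) for x in L2])

columnas = pd.MultiIndex.from_tuples(tups, names=['data level 1', 'data level 2', 'data level 3'])
print(columnas)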
MultiIndex(levels=[['h/a', 'totals'],
                   ['means', 'percentage'],
                   ['a', 'b', 'c', 'g', 'h', 'i']],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
                   [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
                   [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]],
           names=['data level 1', 'data level 2', 'data level 3'])

#some random data
np.random.seed(785)
data = np.random.randint(10, size=(3, 12))
print (data)
[[8 0 4 1 2 5 4 1 4 1 1 8]
[1 5 0 7 4 8 4 1 3 8 0 2]
[5 9 4 9 4 6 3 7 0 5 2 1]]

newframe=pd.DataFrame(data,columns=columnas)
print (newframe)
data level 1   h/a                        totals
data level 2 means       percentage        means       percentage
data level 3     a  b  c          g  h  i      a  b  c          g  h  i
0                8  0  4          1  2  5      4  1  4          1  1  8
1                1  5  0          7  4  8      4  1  3          8  0  2
2                5  9  4          9  4  6      3  7  0          5  2  1
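
As a side note on why the original attempt failed: from_product builds every combination of the lists it is given, so the third list is repeated under each pair from the first two levels. That would be consistent with the 84 in the error message if the third list had 21 entries (2 x 2 x 21), although I am only guessing at its length. The sketch below compares the two constructors and builds the tuples from a dict instead of separate lists. The names sub_columns, top_levels and crossed are made up for illustration, and the zero-filled data is only a placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical third-level names; 'means' and 'percentage' deliberately differ.
sub_columns = {
    'means': ['a', 'b', 'c'],
    'percentage': ['g', 'h', 'i'],
}
top_levels = ['h/a', 'totals']

# from_product crosses every level with every other level, so all six
# third-level names end up under both 'means' and 'percentage'.
all_subs = sub_columns['means'] + sub_columns['percentage']
crossed = pd.MultiIndex.from_product([top_levels, list(sub_columns), all_subs])
print(len(crossed))  # 2 * 2 * 6 = 24 columns

# from_tuples creates only the combinations that are listed explicitly.
tups = [(top, mid, leaf)
        for top in top_levels
        for mid, leaves in sub_columns.items()
        for leaf in leaves]
columnas = pd.MultiIndex.from_tuples(
    tups, names=['data level 1', 'data level 2', 'data level 3'])
print(len(columnas))  # 2 * (3 + 3) = 12 columns

# Each row of data needs exactly one value per column.
data = np.zeros((3, len(columnas)), dtype=int)
newframe = pd.DataFrame(data, columns=columnas)
print(newframe['h/a']['percentage'].columns.tolist())  # ['g', 'h', 'i']
```

With the dict, adding another second-level group or changing its column names only means editing sub_columns; the number of columns in data has to change to match.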





But if I do that, means and percentage will be parents of the same subset of column names. I used tuples because means has its own subset of column names, for example 'a', 'b', 'c', while percentage has a different one, such as 'd', 'e', 'f'. I will edit my post to clarify further. Thanks anyway.
– puppet
Aug 7 at 20:06





@puppet - Please check edited answer.
– jezrael
Aug 8 at 4:59





