Set 3 levels of column names in a pandas DataFrame
I'm trying to build a DataFrame with the following column structure:
h/a                                  totals
sub1              sub2               sub1              sub2
a b ... f         g ... m            a b ... f         g ... m
That is: 2 labels in the first level, again 2 labels in the second level, and then a third level of column names where sub1 and sub2 do not have the same column names.
To do so, I did the following:
columnas = pd.MultiIndex.from_product([['h/a', 'totals'], ['means', 'percentages'],
                                       [('means', 'a'), ('means', 'b'), ....('percentage', 'g'), ....]],
                                      names=['data level 1', 'data level 2', 'data level 3'])
data = [data, pata, ......]
newframe = pd.DataFrame(data, columns=columnas)
What I get is this error:
ValueError: Shape of passed values is (1, 21), indices imply (84, 21)
How can I fix this to get a DataFrame with multi-level column names?
Thank you
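For context, MultiIndex.from_product builds the Cartesian product of every level it is given, so the number of columns is the product of the level lengths (2 * 2 * 21 = 84 here), not 21. A minimal sketch that reproduces the mismatch, assuming 21 hypothetical single-letter third-level names in place of the question's elided tuples (the exact wording of the error differs between pandas versions):

```python
import numpy as np
import pandas as pd

level1 = ['h/a', 'totals']
level2 = ['means', 'percentages']
level3 = list('abcdefghijklmnopqrstu')  # 21 hypothetical leaf names

# from_product pairs every label with every other label: 2 * 2 * 21 columns
cols = pd.MultiIndex.from_product([level1, level2, level3],
                                  names=['data level 1', 'data level 2', 'data level 3'])
n_cols = len(cols)

row = np.zeros((1, 21))  # one row of 21 values, like the question's data
try:
    pd.DataFrame(row, columns=cols)
    mismatch_error = None
except ValueError as exc:  # 21 values cannot fill 84 columns
    mismatch_error = str(exc)

print(n_cols)          # 84
print(mismatch_error)
```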
1 Answer
I think you need MultiIndex.from_tuples, with the tuples built from list comprehensions:
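import numpy as np
import pandas as pd

# leaf names under 'means' and under 'percentage' (they differ)
L1 = list('abc')
L2 = list('ghi')
# one (level 1, level 2, level 3) tuple per final column
tups = ([('h/a', 'means', x) for x in L1] +
        [('h/a', 'percentage', x) for x in L2] +
        [('totals', 'means', x) for x in L1] +
        [('totals', 'percentage', x) for x in L2])
columnas = pd.MultiIndex.from_tuples(tups, names=['data level 1', 'data level 2', 'data level 3'])
print(columnas)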
MultiIndex(levels=[['h/a', 'totals'],
['means', 'percentage'],
['a', 'b', 'c', 'g', 'h', 'i']],
labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]],
names=['data level 1', 'data level 2', 'data level 3'])
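If there are many leaf names (the question mentions 21 columns), the same tuples can be generated from a mapping of each second-level label to its own leaf names, so nothing is paired with the wrong parent. A sketch, where the dict `leaves` and the list `top` are hypothetical names introduced here:

```python
import pandas as pd

top = ['h/a', 'totals']
# each second-level label owns its own, different set of leaf names
leaves = {'means': list('abc'), 'percentage': list('ghi')}

# produces the same tuples, in the same order, as the four comprehensions above
tups = [(t, sub, leaf) for t in top
        for sub, names in leaves.items()
        for leaf in names]
columnas = pd.MultiIndex.from_tuples(tups, names=['data level 1', 'data level 2', 'data level 3'])
print(columnas)
```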
# some random data
np.random.seed(785)
data = np.random.randint(10, size=(3, 12))
print (data)
[[8 0 4 1 2 5 4 1 4 1 1 8]
[1 5 0 7 4 8 4 1 3 8 0 2]
[5 9 4 9 4 6 3 7 0 5 2 1]]
newframe=pd.DataFrame(data,columns=columnas)
print (newframe)
data level 1 h/a totals
data level 2 means percentage means percentage
data level 3 a b c g h i a b c g h i
0 8 0 4 1 2 5 4 1 4 1 1 8
1 1 5 0 7 4 8 4 1 3 8 0 2
2 5 9 4 9 4 6 3 7 0 5 2 1
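Once the columns are built, each branch can be selected by label. A sketch using deterministic np.arange data instead of the random values above, so the selected values are predictable:

```python
import numpy as np
import pandas as pd

L1 = list('abc')
L2 = list('ghi')
tups = ([('h/a', 'means', x) for x in L1] +
        [('h/a', 'percentage', x) for x in L2] +
        [('totals', 'means', x) for x in L1] +
        [('totals', 'percentage', x) for x in L2])
columnas = pd.MultiIndex.from_tuples(tups, names=['data level 1', 'data level 2', 'data level 3'])
newframe = pd.DataFrame(np.arange(36).reshape(3, 12), columns=columnas)

# one top-level branch, then one second-level branch: columns a, b, c
ha_means = newframe['h/a']['means']
# every 'percentage' column under both top-level labels
all_pct = newframe.xs('percentage', axis=1, level='data level 2')
# a single column addressed by its full tuple (returns a Series)
one_col = newframe[('totals', 'percentage', 'h')]
print(ha_means, all_pct, one_col, sep='\n\n')
```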
@puppet - Please check edited answer.
– jezrael
Aug 8 at 4:59
But if I do that, means and percentage will be parents of the same subset of column names. I used tuples because means has its own subset of column names, for example 'a', 'b', 'c', while percentage has a different subset, such as 'd', 'e', 'f'. I will edit my post to clarify further. Thanks anyway
– puppet
Aug 7 at 20:06
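When each second-level label owns a different set of leaf names, as the comment above describes, an alternative to writing out tuples (swapped in here, not taken from the answer) is to build each (means, percentage) block as its own DataFrame and let pd.concat with keys add the outer levels. A sketch, where make_block is a hypothetical helper and the data are placeholder values:

```python
import numpy as np
import pandas as pd

def make_block(offset):
    # hypothetical helper: one block whose sub-frames have different leaf names
    means = pd.DataFrame(np.arange(9).reshape(3, 3) + offset, columns=list('abc'))
    pct = pd.DataFrame(np.arange(9).reshape(3, 3) + offset + 100, columns=list('ghi'))
    # keys become the new outer column level: means / percentage
    return pd.concat([means, pct], axis=1, keys=['means', 'percentage'])

# concatenating two 2-level blocks with keys adds a third, outermost level
newframe = pd.concat([make_block(0), make_block(10)], axis=1, keys=['h/a', 'totals'])
newframe.columns.names = ['data level 1', 'data level 2', 'data level 3']
print(newframe)
```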