numpy delete shape of passed value error







I'm trying to do a very simple delete on a numpy dataset using


import numpy as np
import pandas as pd

dataset = pd.read_csv('putty.log', sep=r'\s+', header=0)
badData = np.argwhere(np.isnan(dataset.loc[:,'Temp']))
np.delete(dataset, badData, 0)



but I get an error saying


ValueError: Shape of passed values is (8, 529292), indices imply (8, 536668)



Even if I simply do


np.delete(dataset, 14, 0)



I get


ValueError: Shape of passed values is (8, 536667), indices imply (8, 536668)



Of course 536667 should be the size of the new array, so what's the problem?


dataset.head(5)
   count           Fx          Fy  ...  AngX    AngY   Temp
0    151  -342818.906  -13860.325  ...  1040  1052.0  176.0
1    152  -342869.781  -13268.041  ...  1039  1051.0  176.0
2    153  -343521.312  -13044.709  ...  1043  1053.0  176.0
3    154  -343697.343  -13502.697  ...  1040  1052.0  176.0
4    155  -343553.468  -13164.850  ...  1040  1052.0  176.0

[5 rows x 8 columns]





Could you add an example of your dataset using dataset.head(5) ?
– klaus
Aug 11 at 12:17





dataset example added
– summershoe
Aug 11 at 22:12




1 Answer



The problem is that you are calling NumPy's np.delete on a pandas DataFrame.



You can either convert the DataFrame to a NumPy array, delete the rows there and put the result back into a DataFrame, or remove the rows with pandas' own tools.



Option 1: Converting to numpy and then back to dataframe



A simple example using random values, deleting the row at index 3:


>>> df
      count        Fx        Fy         A         B      AngX      AngY      Temp
0  0.835154  0.399818  0.813946  0.828186  0.418237  0.431655  0.114101  0.686881
1  0.882480  0.363054  0.298512  0.179800  0.689665  0.018929  0.477470  0.088163
2  0.217667  0.511877  0.283514  0.541611  0.748867  0.173256  0.738801  0.359404
3  0.820754  0.598249  0.361888  0.461686  0.027692  0.160760  0.322443  0.687293
4  0.666681  0.423966  0.613454  0.468823  0.171541  0.487825  0.825111  0.413490
>>> np_values = df.values
>>> np_new_values = np.delete(np_values, 3, 0)
>>> df = pd.DataFrame(np_new_values, columns=['count', 'Fx', 'Fy', 'A', 'B', 'AngX', 'AngY', 'Temp'])
>>> df
      count        Fx        Fy         A         B      AngX      AngY      Temp
0  0.835154  0.399818  0.813946  0.828186  0.418237  0.431655  0.114101  0.686881
1  0.882480  0.363054  0.298512  0.179800  0.689665  0.018929  0.477470  0.088163
2  0.217667  0.511877  0.283514  0.541611  0.748867  0.173256  0.738801  0.359404
3  0.666681  0.423966  0.613454  0.468823  0.171541  0.487825  0.825111  0.413490
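The same round trip can be applied to the NaN case from the question. This is only a sketch under assumptions: the small frame below is a made-up stand-in for putty.log, and the names bad_rows and cleaned are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data; only 'Temp' matters here.
df = pd.DataFrame({
    'count': [151, 152, 153, 154, 155],
    'Fx': [-342818.906, -342869.781, -343521.312, -343697.343, -343553.468],
    'Temp': [176.0, np.nan, 176.0, np.nan, 176.0],
})

# Positions (not labels) of the rows whose Temp is NaN.
bad_rows = np.flatnonzero(np.isnan(df['Temp'].to_numpy()))

# Delete on the plain array, then rebuild the frame with the original column names.
values = np.delete(df.to_numpy(), bad_rows, axis=0)
cleaned = pd.DataFrame(values, columns=df.columns)
print(cleaned)
```

One thing to keep in mind with this route: converting a frame with mixed dtypes to a single array upcasts everything to a common type, so the integer count column comes back as float here.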



Option 2: Filtering the dataframe



Assume you want to remove the rows where Temp is NaN. You can filter those rows out with a boolean mask, which returns a new DataFrame:


>>> df
      count        Fx        Fy         A         B      AngX      AngY      Temp
0  0.320627  0.757144  0.633840  0.481710  0.553908  0.439086  0.745160  0.022574
1  0.029232  0.285503  0.832308  0.269803  0.367305  0.558367  0.811343       NaN
2  0.311669  0.958565  0.159508  0.642381  0.930498  0.738135  0.255059  0.109702
3  0.576281  0.686696  0.419363  0.914394  0.825495  0.999091  0.126657  0.731871
4  0.323572  0.186353  0.149007  0.436962  0.699664  0.910051  0.118339  0.070458
>>> df[df['Temp'].notnull()]
      count        Fx        Fy         A         B      AngX      AngY      Temp
0  0.320627  0.757144  0.633840  0.481710  0.553908  0.439086  0.745160  0.022574
2  0.311669  0.958565  0.159508  0.642381  0.930498  0.738135  0.255059  0.109702
3  0.576281  0.686696  0.419363  0.914394  0.825495  0.999091  0.126657  0.731871
4  0.323572  0.186353  0.149007  0.436962  0.699664  0.910051  0.118339  0.070458
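For completeness, pandas also has dedicated methods for both deletions in the question. The sketch below uses a small made-up frame (the values and the names no_nan, renumbered and without_row_3 are illustrative assumptions) to show dropna for the NaN rows and drop for a single row.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; row 1 has a missing Temp.
df = pd.DataFrame({
    'Fx': [0.76, 0.29, 0.96, 0.69, 0.19],
    'Temp': [0.02, np.nan, 0.11, 0.73, 0.07],
})

# Same result as df[df['Temp'].notnull()]: keep rows where Temp is present.
no_nan = df.dropna(subset=['Temp'])

# Optional: renumber the remaining rows 0..n-1 after filtering.
renumbered = no_nan.reset_index(drop=True)

# Closest pandas counterpart of np.delete(values, 3, 0): drop the row labeled 3.
without_row_3 = df.drop(index=3)

print(no_nan)
print(without_row_3)
```

Note that drop works on index labels while np.delete works on positions. The two coincide only while the frame still has its default 0..n-1 index, so after filtering it may be worth calling reset_index(drop=True) first.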





Brilliant! Thanks @klaus. I didn't realize they weren't compatible. And thanks for giving me a couple of options. Using the pandas function was the way to go in this situation.
– summershoe
Aug 12 at 9:27





