numpy delete shape of passed value error

I'm trying to do a very simple delete on a numpy dataset using
dataset = pd.read_csv('putty.log', sep=r'\s+', header=0)
badData = np.argwhere(np.isnan(dataset.loc[:,'Temp']))
np.delete(dataset, badData, 0)
but I get an error saying
ValueError: Shape of passed values is (8, 529292), indices imply (8, 536668)
Even if I simply do
np.delete(dataset, 14, 0)
I get
'ValueError: Shape of passed values is (8, 536667), indices imply (8, 536668)'
Of course 536667 should be the size of the new array, so what's the problem?
dataset.head(5)
count Fx Fy ... AngX AngY Temp
0 151 -342818.906 -13860.325 ... 1040 1052.0 176.0
1 152 -342869.781 -13268.041 ... 1039 1051.0 176.0
2 153 -343521.312 -13044.709 ... 1043 1053.0 176.0
3 154 -343697.343 -13502.697 ... 1040 1052.0 176.0
4 155 -343553.468 -13164.850 ... 1040 1052.0 176.0
[5 rows x 8 columns]
dataset example added
– summershoe
Aug 11 at 22:12
1 Answer
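The problem is that you are calling np.delete on a pandas DataFrame rather than on a NumPy array. np.delete returns a new array with fewer rows, and pandas most likely then tries to wrap that result back into a DataFrame using the original index, which would explain why the two shapes in the error message disagree.
You can convert your dataset to a NumPy array, delete the rows and put the result back into a DataFrame, or remove the rows with a pandas function designed for that.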
Option 1: Converting to numpy and then back to dataframe
A simple example using random values, deleting the row at index 3:
>>> df
count Fx Fy A B AngX AngY Temp
0 0.835154 0.399818 0.813946 0.828186 0.418237 0.431655 0.114101 0.686881
1 0.882480 0.363054 0.298512 0.179800 0.689665 0.018929 0.477470 0.088163
2 0.217667 0.511877 0.283514 0.541611 0.748867 0.173256 0.738801 0.359404
3 0.820754 0.598249 0.361888 0.461686 0.027692 0.160760 0.322443 0.687293
4 0.666681 0.423966 0.613454 0.468823 0.171541 0.487825 0.825111 0.413490
>>> np_values = df.values
>>> np_new_values = np.delete(np_values, 3, 0)
>>> df = pd.DataFrame(np_new_values, columns=['count', 'Fx', 'Fy', 'A', 'B', 'AngX', 'AngY', 'Temp'])
>>> df
count Fx Fy A B AngX AngY Temp
0 0.835154 0.399818 0.813946 0.828186 0.418237 0.431655 0.114101 0.686881
1 0.882480 0.363054 0.298512 0.179800 0.689665 0.018929 0.477470 0.088163
2 0.217667 0.511877 0.283514 0.541611 0.748867 0.173256 0.738801 0.359404
3 0.666681 0.423966 0.613454 0.468823 0.171541 0.487825 0.825111 0.413490
>>>
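The same round trip can be written without retyping the column names. This is only a sketch: the data is made up, and drop_row_via_numpy is a hypothetical helper name, not something from pandas or NumPy.

```python
import numpy as np
import pandas as pd


def drop_row_via_numpy(df, position):
    """Delete one row by integer position with np.delete, then rebuild the frame.

    Hypothetical helper for illustration only.
    """
    values = df.to_numpy()                      # plain ndarray, no index attached
    new_values = np.delete(values, position, axis=0)
    # Reuse the original column labels; the row index is regenerated as 0..n-1.
    return pd.DataFrame(new_values, columns=df.columns)


# Made-up data standing in for the real log file.
df = pd.DataFrame({
    "count": [151, 152, 153, 154, 155],
    "Temp": [176.0, 175.0, 174.0, 173.0, 172.0],
})

result = drop_row_via_numpy(df, 3)
print(result)
```

Note that converting a frame with mixed column types to a single array upcasts everything to a common dtype (here the integer count column comes back as float), which is one more reason to prefer Option 2.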
Option 2: Filtering the dataframe
Assume you want to remove the rows where Temp is NaN. You can filter the rows with a boolean mask, which returns a new DataFrame and leaves the original unchanged:
>>> df
count Fx Fy A B AngX AngY Temp
0 0.320627 0.757144 0.633840 0.481710 0.553908 0.439086 0.745160 0.022574
1 0.029232 0.285503 0.832308 0.269803 0.367305 0.558367 0.811343 NaN
2 0.311669 0.958565 0.159508 0.642381 0.930498 0.738135 0.255059 0.109702
3 0.576281 0.686696 0.419363 0.914394 0.825495 0.999091 0.126657 0.731871
4 0.323572 0.186353 0.149007 0.436962 0.699664 0.910051 0.118339 0.070458
>>> df[df['Temp'].notnull()]
count Fx Fy A B AngX AngY Temp
0 0.320627 0.757144 0.633840 0.481710 0.553908 0.439086 0.745160 0.022574
2 0.311669 0.958565 0.159508 0.642381 0.930498 0.738135 0.255059 0.109702
3 0.576281 0.686696 0.419363 0.914394 0.825495 0.999091 0.126657 0.731871
4 0.323572 0.186353 0.149007 0.436962 0.699664 0.910051 0.118339 0.070458
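For reference, pandas also has dedicated methods that do the same job as the boolean mask above. The snippet below is a small sketch on made-up data, assuming the default 0..n-1 index so that row labels and row positions coincide.

```python
import numpy as np
import pandas as pd

# Made-up data with two missing Temp readings.
df = pd.DataFrame({
    "count": [151, 152, 153, 154, 155],
    "Temp": [176.0, np.nan, 176.0, 175.0, np.nan],
})

# Keep only rows whose Temp is not NaN; same result as df[df['Temp'].notnull()].
clean = df.dropna(subset=["Temp"])

# Delete a single row by its index label. With the default index this
# matches what np.delete(arr, 3, 0) does on a plain array.
without_row_3 = df.drop(index=3)

# Optionally renumber the remaining rows 0..n-1 after filtering.
clean_reset = clean.reset_index(drop=True)

print(clean)
print(without_row_3)
```

All three calls return new DataFrames, so assign the result back (for example dataset = dataset.dropna(subset=['Temp'])) if you want to keep it.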
Brilliant! Thanks @klaus. I didn't realize they weren't compatible. And thanks for giving me a couple of options. Using the pandas function was the way to go in this situation.
– summershoe
Aug 12 at 9:27
Could you add an example of your dataset using dataset.head(5) ?
– klaus
Aug 11 at 12:17