Cython: Why do NumPy arrays need to be type-cast to object?
I have seen something like this a few times in the Pandas source:
def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
    # ...
    N, K = (<object> mat).shape
This implies that a NumPy ndarray called mat is type-cast to a Python object.*
Upon further inspection, it seems like this is used because a compile error arises if it isn't. My question is: why is this type-cast required in the first place?
Here are a few examples. This answer suggests simply that tuple packing doesn't work in Cython like it does in Python, but it doesn't seem to be a tuple unpacking issue. (It is a fine answer regardless, and I don't mean to pick on it.)
Take the following script, shape.pyx. It will fail at compile time with "Cannot convert 'npy_intp *' to Python object."
from cython cimport Py_ssize_t
import numpy as np
from numpy cimport ndarray, float64_t
cimport numpy as cnp
cnp.import_array()

def test_castobj(ndarray[float64_t, ndim=2] arr):
    cdef:
        Py_ssize_t b1, b2
    # Tuple unpacking - this will fail at compile
    b1, b2 = arr.shape
    return b1, b2
But again, the issue does not seem to be tuple unpacking, per se. This will fail with the same error.
def test_castobj(ndarray[float64_t, ndim=2] arr):
    cdef:
        # Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros
Seemingly, no tuple unpacking is happening here. A tuple is the first arg to np.zeros.
def test_castobj(ndarray[float64_t, ndim=2] arr):
    """This works"""
    cdef:
        Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros
    b1, b2 = (<object> arr).shape
    zeros = np.zeros((<object> arr).shape, dtype=np.float64)
    return b1, b2, zeros
This also works (perhaps the most confusing of all):
def test_castobj(object[float64_t, ndim=2] arr):
    cdef:
        tuple shape = arr.shape
        ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(shape, dtype=np.float64)
    return zeros
Example:
>>> import numpy as np
>>> from shape import test_castobj
>>> arr = np.arange(6, dtype=np.float64).reshape(2, 3)
>>> test_castobj(arr)
(2, 3, array([[0., 0., 0.],
       [0., 0., 0.]]))
*Perhaps it has something to do with arr being a memoryview? But that's a shot in the dark.
Another example is in the Cython docs:
cpdef int sum3d(int[:, :, :] arr) nogil:
    cdef size_t i, j, k, I, J, K
    cdef int total = 0
    I = arr.shape[0]
    J = arr.shape[1]
    K = arr.shape[2]
In this case, simply indexing arr.shape[i] prevents the error, which I find strange.
This also works:
def test_castobj(object[float64_t, ndim=2] arr):
    cdef ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros
1 Answer
You are right: it has nothing to do with tuple unpacking in Cython.
The reason is that cnp.ndarray isn't a usual NumPy array (that is, a NumPy array with the interface known from Python), but a Cython wrapper of NumPy's C implementation, PyArrayObject (which is known as np.ndarray in Python):
ctypedef class numpy.ndarray [object PyArrayObject]:
    cdef __cythonbufferdefaults__ = {"mode": "strided"}
    cdef:
        # Only taking a few of the most commonly used and stable fields.
        # One should use PyArray_* macros instead to access the C fields.
        char *data
        int ndim "nd"
        npy_intp *shape "dimensions"
        npy_intp *strides
        dtype descr
        PyObject* base
shape maps in reality to the dimensions field (npy_intp *shape "dimensions" instead of simply npy_intp *dimensions) of the underlying C struct. It is a trick so that one can write
    mat.shape[0]
and it has the look (and to some degree the feel) of calling NumPy's Python property shape. In reality, a shortcut directly into the underlying C struct is taken.
By the way, calling the Python property shape is quite costly: a tuple must be created and filled with the values from dimensions, and only then is the 0th element accessed. Cython's way of doing it is much cheaper: it just accesses the right element.
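The difference between a bare C pointer and a Python tuple can be sketched in plain Python with ctypes. This is only an illustrative model, not NumPy's or Cython's actual code: dims_storage, dims_ptr and ndim are hypothetical names standing in for the dimensions and nd fields of the C struct.

```python
import ctypes

# Hypothetical stand-in for the C-level `npy_intp *dimensions` field of a 2x3 array.
ndim = 2
dims_storage = (ctypes.c_ssize_t * ndim)(2, 3)
dims_ptr = ctypes.cast(dims_storage, ctypes.POINTER(ctypes.c_ssize_t))

# Indexing a bare pointer works and is cheap, like mat.shape[0] in Cython.
first = dims_ptr[0]
second = dims_ptr[1]

# A bare pointer carries no length, so there is no generic way to
# convert it to a Python object such as a tuple.
try:
    len(dims_ptr)
    pointer_has_length = True
except TypeError:
    pointer_has_length = False

# Building the Python-level shape tuple needs ndim as extra information
# and creates one Python int per dimension.
shape_tuple = tuple(dims_ptr[i] for i in range(ndim))
```

In this model, unpacking dims_ptr directly has nothing to iterate over, which mirrors the "Cannot convert 'npy_intp *' to Python object" error from the question, while indexing a single element needs no conversion at all.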
However, if you still want to access the Python property of the array, you have to cast it to a normal Python object (i.e. forget that this is an ndarray), and then shape is resolved to the tuple-property call via the usual Python mechanism.
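What the "usual Python mechanism" amounts to can be modeled in pure Python. The class FakeArray below is a hypothetical stand-in written for this illustration; it is not how NumPy implements ndarray, it only mimics a property that builds a tuple from an internal list of dimensions.

```python
class FakeArray:
    """Hypothetical stand-in for an array object; not NumPy's implementation."""

    def __init__(self, *dims):
        # Plays the role of the C-level `dimensions` field.
        self._dimensions = list(dims)

    @property
    def shape(self):
        # The Python-level property builds a fresh tuple on every access.
        return tuple(self._dimensions)


arr = FakeArray(2, 3)

# Dynamic attribute lookup by name, which is roughly what
# `(<object> arr).shape` falls back to in Cython.
n_rows, n_cols = getattr(arr, "shape")

# Each access creates a new tuple object with equal contents.
first_access = arr.shape
second_access = arr.shape
```

Because the lookup goes through the property, the result is an ordinary tuple that can be unpacked or passed on to other Python functions, at the price of allocating that tuple on every access.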
So basically, even if this is convenient, you don't want to access the dimensions of the NumPy array in a tight loop the way it is done in the pandas code. For performance, you would use the more verbose variant instead:
    ...
    N = mat.shape[0]
    K = mat.shape[1]
    ...
Why you can write object[cnp.float64_t] or similar in the function signature strikes me as strange - the parameter is then obviously interpreted as a simple object. Maybe this is just a bug.
I don't know Cython, but at least in C, if arr.shape returns a pointer (it seems to), you (and NumPy) have no way to know its dimension. – apple apple, Aug 8 at 2:06