Cython: Why do NumPy arrays need to be type-cast to object?

I have seen something like this a few times in the Pandas source:

```cython
def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
    # ...
    N, K = (<object> mat).shape
```

This implies that a NumPy `ndarray` called `mat` is cast to a Python object.*

Upon further inspection, it seems like this is used because a compile error arises if it isn't. My question is: why is this type-cast required in the first place?

Here are a few examples. This answer suggests simply that tuple unpacking doesn't work in Cython like it does in Python, but it doesn't seem to be a tuple unpacking issue. (It is a fine answer regardless, and I don't mean to pick on it.)

Take the following script, `shape.pyx`. It fails at compile time with `Cannot convert 'npy_intp *' to Python object`.

```cython
from cython cimport Py_ssize_t
import numpy as np
from numpy cimport ndarray, float64_t
cimport numpy as cnp
cnp.import_array()

def test_castobj(ndarray[float64_t, ndim=2] arr):
    cdef:
        Py_ssize_t b1, b2
    # Tuple unpacking - this will fail at compile
    b1, b2 = arr.shape
    return b1, b2
```

But again, the issue does not seem to be tuple unpacking, per se. This will fail with the same error.

```cython
def test_castobj(ndarray[float64_t, ndim=2] arr):
    cdef:
        # Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros
```

Seemingly, no tuple unpacking is happening here: a tuple is simply the first argument to `np.zeros`.

```cython
def test_castobj(ndarray[float64_t, ndim=2] arr):
    """This works"""
    cdef:
        Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros
    b1, b2 = (<object> arr).shape
    zeros = np.zeros((<object> arr).shape, dtype=np.float64)
    return b1, b2, zeros
```

This also works (perhaps the most confusing of all):

```cython
def test_castobj(object[float64_t, ndim=2] arr):
    cdef:
        tuple shape = arr.shape
        ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(shape, dtype=np.float64)
    return zeros
```

Example:

```python
>>> import numpy as np
>>> from shape import test_castobj
>>> arr = np.arange(6, dtype=np.float64).reshape(2, 3)
>>> test_castobj(arr)
(2, 3, array([[0., 0., 0.],
       [0., 0., 0.]]))
```

*Perhaps it has something to do with `arr` being a memoryview? But that's a shot in the dark.

Another example is in the Cython docs:

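```cython
cpdef int sum3d(int[:, :, :] arr) nogil:
    cdef size_t i, j, k, I, J, K
    cdef int total = 0
    I = arr.shape[0]
    J = arr.shape[1]
    K = arr.shape[2]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                total += arr[i, j, k]
    return total
```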

In this case, simply indexing `arr.shape[i]` prevents the error, which I find strange.

This also works:

```cython
def test_castobj(object[float64_t, ndim=2] arr):
    cdef ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros
```

I don't know Cython, but at least in C, if `arr.shape` returns a pointer (it seems to), you (and NumPy) have no way to know its length.
– apple apple
Aug 8 at 2:06

1 Answer

You are right, it has nothing to do with tuple unpacking in Cython.

The reason is that `cnp.ndarray` isn't a usual NumPy array (that is, a NumPy array with the interface known from Python), but a Cython wrapper around NumPy's C implementation, `PyArrayObject` (known as `np.ndarray` in Python):

```cython
ctypedef class numpy.ndarray [object PyArrayObject]:
    cdef __cythonbufferdefaults__ = {"mode": "strided"}

    cdef:
        # Only taking a few of the most commonly used and stable fields.
        # One should use PyArray_* macros instead to access the C fields.
        char *data
        int ndim "nd"
        npy_intp *shape "dimensions"
        npy_intp *strides
        dtype descr
        PyObject* base
```

`shape` actually maps to the `dimensions` field of the underlying C struct (`npy_intp *shape "dimensions"` instead of simply `npy_intp *dimensions`). It is a trick, so that one can write `mat.shape[0]` and it looks (and to some degree feels) as if NumPy's Python property `shape` were called. In reality, a shortcut is taken directly to the underlying C struct. Consequently, a bare `arr.shape` is an `npy_intp *`, which Cython cannot convert to a Python object, hence the compile error.
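One way to see why a bare `arr.shape` cannot become a Python object while `arr.shape[0]` compiles is to mimic those fields with `ctypes`. The sketch below is a toy, not NumPy's real `PyArrayObject` layout: the class `ToyArrayFields` is hypothetical, and `ctypes.c_ssize_t` stands in for `npy_intp` (assuming both are pointer-sized signed integers, as on mainstream platforms). Indexing the `dimensions` pointer reads a single integer, but the pointer itself carries no length, so packing it into a tuple also needs `nd`.

```python
import ctypes

# Assumption: c_ssize_t stands in for npy_intp (both pointer-sized on common platforms).
npy_intp = ctypes.c_ssize_t


class ToyArrayFields(ctypes.Structure):
    """Hypothetical mirror of a few fields from the .pxd declaration.

    NOT the real PyArrayObject layout (no PyObject header, descr or base).
    """

    _fields_ = [
        ("data", ctypes.c_char_p),
        ("nd", ctypes.c_int),                      # int ndim "nd"
        ("dimensions", ctypes.POINTER(npy_intp)),  # npy_intp *shape "dimensions"
        ("strides", ctypes.POINTER(npy_intp)),     # npy_intp *strides
    ]


# Backing storage for a 2x3 float64 array: dims (2, 3), C-order strides (24, 8).
dims = (npy_intp * 2)(2, 3)
strides = (npy_intp * 2)(24, 8)
fields = ToyArrayFields(
    None,
    2,
    ctypes.cast(dims, ctypes.POINTER(npy_intp)),
    ctypes.cast(strides, ctypes.POINTER(npy_intp)),
)

# Like `mat.shape[0]` in Cython: index the pointer, get one integer back.
n_rows = fields.dimensions[0]
n_cols = fields.dimensions[1]

# The bare pointer has no length, so there is no obvious Python object to build.
try:
    len(fields.dimensions)
    pointer_has_len = True
except TypeError:
    pointer_has_len = False

# ctypes even refuses an open-ended slice of a pointer: the stop must be given.
try:
    fields.dimensions[:]
    open_slice_ok = True
except ValueError:
    open_slice_ok = False

# Only together with `nd` can the dimensions be packed into a tuple.
shape_tuple = tuple(fields.dimensions[: fields.nd])
print(n_rows, n_cols, pointer_has_len, open_slice_ok, shape_tuple)
```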


By the way, calling the Python-level `shape` is quite costly: a tuple must be created and filled with the values from `dimensions`, and then its 0th element is accessed. Cython's way is much cheaper: it just accesses the right element.
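The cost of the Python-level route can be observed without NumPy. This sketch uses CPython's built-in `memoryview` as a stand-in for `ndarray` (an analogy, not NumPy itself): a memoryview also keeps `ndim` and a `Py_ssize_t *shape` array in its `Py_buffer`, and in CPython its `shape` attribute packs them into a new tuple on every access. So `mv.shape[0]` means building a tuple and then indexing it. The printed timing is only indicative and varies by machine.

```python
import timeit

# A 2x3 view over 6 bytes; its C-level buffer holds ndim = 2 and shape = {2, 3}.
mv = memoryview(bytearray(6)).cast("B", (2, 3))

first = mv.shape
second = mv.shape
same_contents = first == second  # the same dimensions ...
same_object = first is second    # ... but a new tuple object per access

# `mv.shape[0]`: build the whole tuple, then take element 0.
n_rows = mv.shape[0]

# Rough per-access cost of that route (machine-dependent; printed, not asserted).
loops = 100_000
seconds = timeit.timeit("mv.shape[0]", globals={"mv": mv}, number=loops)
print(f"mv.shape[0]: about {seconds / loops * 1e9:.0f} ns per access")
```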


However, if you still want to access the Python property of the array, you have to cast it to a normal Python object (i.e. forget that it is an `ndarray`); then `shape` is resolved to the tuple-returning property via the usual Python mechanism.
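What "the usual Python mechanism" amounts to can be sketched with the built-in `memoryview` standing in for `ndarray` (as far as I know, `ndarray.shape` is likewise a C-level getter registered on its type, but this sketch does not use NumPy). Once the object is treated as a plain Python object, `shape` is looked up on its type, found as a getset descriptor, and its getter is called with the instance; `getattr` performs the same steps implicitly.

```python
import types

# Stand-in for an ndarray: a 2x3 memoryview whose `shape` getter is written in C.
mv = memoryview(bytearray(6)).cast("B", (2, 3))

# Step 1: the attribute is found on the type, as a C-level getset descriptor.
descriptor = type(mv).__dict__["shape"]
is_getset = isinstance(descriptor, types.GetSetDescriptorType)

# Step 2: the descriptor's getter is called with the instance and returns a tuple.
via_descriptor = descriptor.__get__(mv, type(mv))

# Plain attribute access performs both steps for you.
via_getattr = getattr(mv, "shape")
print(is_getset, via_descriptor, via_getattr)
```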


So, even though it is convenient, you don't want to access the dimensions of a NumPy array in a tight loop the way the pandas code does; for performance, use the more verbose variant instead:

```cython
...
N = mat.shape[0]
K = mat.shape[1]
...
```

Why you can write `object[cnp.float64_t]` or similar in the function signature strikes me as strange: the parameter is then evidently interpreted as a plain object. Maybe this is just a bug.
