Cython: Why do NumPy arrays need to be type-cast to object?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



Cython: Why do NumPy arrays need to be type-cast to object?



I have seen something like this a few times in the Pandas source:


def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
# ...
N, K = (<object> mat).shape



This implies that a NumPy ndarray called mat is type-casted to a Python object.*


ndarray


mat



Upon further inspection, it seems like this is used because a compile error arises if it isn't. My question is: why is this type-cast required in the first place?



Here are a few examples. This answer suggest simply that tuple packing doesn't work in Cython like it does in Python---but it doesn't seem to be a tuple unpacking issue. (It is a fine answer regardless, and I don't mean to pick on it.)



Take the following script, shape.pyx. It will fail at compile time with "Cannot convert 'npy_intp *' to Python object."


shape.pyx


from cython cimport Py_ssize_t
import numpy as np
from numpy cimport ndarray, float64_t
cimport numpy as cnp
cnp.import_array()

def test_castobj(ndarray[float64_t, ndim=2] arr):

cdef:
Py_ssize_t b1, b2

# Tuple unpacking - this will fail at compile
b1, b2 = arr.shape
return b1, b2



But again, the issue does not seem to be tuple unpacking, per se. This will fail with the same error.


def test_castobj(ndarray[float64_t, ndim=2] arr):

cdef:
# Py_ssize_t b1, b2
ndarray[float64_t, ndim=2] zeros

zeros = np.zeros(arr.shape, dtype=np.float64)
return zeros



Seemingly, no tuple unpacking is happening here. A tuple is the first arg to np.zeros.


np.zeros


def test_castobj(ndarray[float64_t, ndim=2] arr):
"""This works"""
cdef:
Py_ssize_t b1, b2
ndarray[float64_t, ndim=2] zeros

b1, b2 = (<object> arr).shape
zeros = np.zeros((<object> arr).shape, dtype=np.float64)
return b1, b2, zeros



This also works (perhaps the most confusing of all):


def test_castobj(object[float64_t, ndim=2] arr):
cdef:
tuple shape = arr.shape
ndarray[float64_t, ndim=2] zeros
zeros = np.zeros(shape, dtype=np.float64)
return zeros



Example:


>>> from shape import test_castobj
>>> arr = np.arange(6, dtype=np.float64).reshape(2, 3)

>>> test_castobj(arr)
(2, 3, array([[0., 0., 0.],
[0., 0., 0.]]))



*Perhaps it has something to do with arr being a memoryview? But that's a shot in the dark.


arr



Another example is in the Cython docs:


cpdef int sum3d(int[:, :, :] arr) nogil:
cdef size_t i, j, k
cdef int total = 0
I = arr.shape[0]
J = arr.shape[1]
K = arr.shape[2]



In this case, simply indexing arr.shape[i] prevents the error, which I find strange.


arr.shape[i]



This also works:


def test_castobj(object[float64_t, ndim=2] arr):
cdef ndarray[float64_t, ndim=2] zeros
zeros = np.zeros(arr.shape, dtype=np.float64)
return zeros





I don't know Cpython, but at least in c, if arr.shape returns a pointer (it seems to), you (and numpy) have no method to know it's dimension.
– apple apple
Aug 8 at 2:06



c


arr.shape




1 Answer
1



You are right, it has nothing to do with the tuple-unpacking under Cython.



The reason is, that cnp.ndarray isn't an usual numpy-array (that means a numpy-array with interface known from python), but a Cython wrapper of the numpy's C-implementation for PyArrayObject (which is known as np.array in Python):


cnp.ndarray


PyArrayObject


np.array


ctypedef class numpy.ndarray [object PyArrayObject]:
cdef __cythonbufferdefaults__ = "mode": "strided"

cdef:
# Only taking a few of the most commonly used and stable fields.
# One should use PyArray_* macros instead to access the C fields.
char *data
int ndim "nd"
npy_intp *shape "dimensions"
npy_intp *strides
dtype descr
PyObject* base



shape maps in reality to dimensions-field (npy_intp *shape "dimensions" instead of simply npy_intp *dimensions) of the underlying C-stuct. It is a trick, so one can write


shape


dimensions


npy_intp *shape "dimensions"


npy_intp *dimensions


mat.shape[0]



and it has the looks (and to some degree the feel) as if numpy's python-property shape is called. But in reality a shortcut directly to the underlying C-stuct is taken.


shape



Btw calling python-shape is quite costly: a tuple must be created and filled with values from dimensions, then the 0-th element is accessed. On the other hand, Cython's way of doing it is much cheaper - just access the right element.


shape


dimensions



However, if you yet want access the python-property of the array, you have to cast it to a normal python-object (i.e. forget that this is a ndarray) and then shape is resolved to the tuple-property call via the usual Python-mechanism.


ndarray


shape



So basically, even if this is convenient, you don't want to access the dimensions of the numpy array in a tight loop the way it is done in the pandas-code, instead you would do the more verbose variant for performance:


...
N=mat.shape[0]
K=mat.shape[1]
...



Why you can write object[cnp.float64_t] or similar in the function signature strikes me as strange - the parameter is then obviously interpreted as a simple object. Maybe this is just a bug.


object[cnp.float64_t]






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard