I am dumping @gigony's offline rants as trackable issues, since I ran into several of them a few times myself.
Today we say that "there's no object model in CUDA Python" and that "cuda.py (#70) comes to the rescue". But technically that's not true. We already have low-level object wrappers such as cudaStream_t and CUcontext. They are full-fledged Python objects, in the sense that the type bindings are implemented as Cython cdef classes. So what we really mean is that they are only meant to be used in a non-Pythonic fashion, together with the low-level bindings to the C APIs.
One of the challenges with the low-level type bindings is getPtr() (example), a method that all low-level objects have. However, it does not behave as users would expect! For example, to get the underlying cudaStream_t pointer, one should call int(cudaStream_t) instead of cudaStream_t.getPtr(). I believe this was designed/implemented to shorten the time needed to prepare a kernel launch buffer: instead of returning the address of the actual struct/pointer, getPtr() returns the address of the pointer (to the struct/pointer), which can be copied directly into the void** kernel argument array.
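To make the distinction concrete, here is a minimal sketch that uses only ctypes. FakeStream is a hypothetical stand-in I made up for illustration; it is not the actual Cython implementation, and it only mimics the behavior described above (int() gives the handle, getPtr() gives the address of the storage holding the handle).

```python
import ctypes


class FakeStream:
    """Hypothetical stand-in for a low-level wrapper such as cudaStream_t."""

    def __init__(self, handle):
        # Storage that holds the handle (assumed layout, for illustration only).
        self._storage = ctypes.c_void_p(handle)

    def __int__(self):
        # The handle itself: what a C API taking cudaStream_t would receive.
        return self._storage.value

    def getPtr(self):
        # Address of the storage holding the handle: a pointer to the pointer.
        return ctypes.addressof(self._storage)


stream = FakeStream(0xDEADBEEF)
handle = int(stream)
ptr_to_handle = stream.getPtr()

# Dereferencing getPtr() once recovers the handle.
recovered = ctypes.c_void_p.from_address(ptr_to_handle).value

# A kernel argument array (void**) holds the address of each argument,
# so the value returned by getPtr() can be stored in it directly.
kernel_args = (ctypes.c_void_p * 1)(ptr_to_handle)
```

If the real wrappers behave as described, passing getPtr() where the handle is expected hands the callee an address one level of indirection too deep, which is the surprise users run into.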
We should at least document it, and in a parallel universe find a way to deprecate it in favor of a better name.