#37797 is adding official dunder methods to expose the Arrow C Data/Stream Interface in Python using PyCapsules (#34031 / #35531).
In addition to the official dunders that expose this to other libraries, we also need public APIs in pyarrow to import/consume such PyCapsules (or rather, the objects implementing the dunders that give you the PyCapsule).
#37797 already added this to the `pa.array(..)`, `pa.record_batch(..)` and `pa.schema(..)` constructors, such that you can, for example, create a pyarrow array with `pa.array(obj)` given any object `obj` that supports the interface by defining `__arrow_c_array__`.
But that's not fully complete: we certainly need a way to construct a `RecordBatchReader` as well, for which we don't have such a factory function available. For this, we could add a `from_` function (similar to the existing `from_batches`), like `RecordBatchReader.from_stream`?
(In addition, there are also the Table, Field and DataType constructors, but those all have factory functions that could support this, similar to `pa.array(..)` et al.)
Secondly, I am also wondering if we want to provide APIs that accept PyCapsules directly, instead of an object that implements the dunders. For example, if you are a library that has data in Arrow-compatible memory and you want to convert it to pyarrow through the C Data Interface, you might want to use a PyCapsule directly if your library doesn't expose a Python class that represents that data (this avoids having to create a small wrapper class with just the dunder to pass to the pyarrow constructor, although that is of course not difficult).