This is related to
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It is common to apply successive filtering operations that progressively narrow a bitmap with rows that pass a predicate -- for example a AND b AND c AND d AND ...
This would be evaluated typically like
t1 = a AND b
t2 = t1 AND c
t3 = t2 AND d
Where t1, t2, and t3 are newly allocated arrays; t3 holds the final output
For a typical array size of 8192 elements, t1, t2 and t3 are eack 1KB allocations
The current boolean kernels in arrow pretty much require allocating a new buffer as they take a reference
Now that @rluvaton has added optimized code to apply mutable bitwise operations in the following PR
I think we are in the position to also provide the same operations on BooleanArray
Describe the solution you'd like
I would like a way to update the contents of a BooleanArray in place (assuming there is no other reference to the backing array)
Describe alternatives you've considered
I propose following the API of PrimitiveArray, such as PrimitiveArray::unary and PrimitiveArray::unary_mut
So that would mean adding new functions to BooleanArray, like
impl BooleanArray {
...
// apply a bitwise operation to this BooleanArray, using u64 operations
// returning a new BooleanArray
fn bitwise_unary(&self, mut op:F) -> Self
where
F: FnMut(u64) -> u64 {.. }
// Try to apply a bitwise operation to this BooleanArray in place, using u64 operations
// If the underlying buffer is shared, returns Err<self>
fn bitwise_unary_mut(self, mut op:F) -> Result<Self, Self>
where
F: FnMut(u64) -> u64 {.. }
// Try to apply a bitwise operation to this BooleanArray in place, using u64 operations
// If the underlying buffer is shared, returns a new
// BooleanArray
fn bitwise_unary_mut_or_clone(self, mut op:F) -> Self
where
F: FnMut(u64) -> u64 {.. }
And we would also add similar functions for binary_bitwise functions (that takes a second array argument and right_offset_in_bits)
These APIs are likely fairly straightforward if we implement the plan in
As then the new methods on BooleanArray would just call the relevant underlying buffer methods
Additional context
This is related to
PrimitiveArray::unary_mut,PrimitiveArray:try_unary_mut, etc #8808Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It is common to apply successive filtering operations that progressively narrow a bitmap with rows that pass a predicate -- for example
a AND b AND c AND d AND ...This would be evaluated typically like
Where
t1,t2, andt3are newly allocated arrays;t3holds the final outputFor a typical array size of 8192 elements,
t1,t2andt3are eack 1KB allocationsThe current boolean kernels in arrow pretty much require allocating a new buffer as they take a reference
Now that @rluvaton has added optimized code to apply mutable bitwise operations in the following PR
apply_unary_opandapply_binary_opbitwise operations #8619I think we are in the position to also provide the same operations on BooleanArray
Describe the solution you'd like
I would like a way to update the contents of a
BooleanArrayin place (assuming there is no other reference to the backing array)Describe alternatives you've considered
I propose following the API of
PrimitiveArray, such asPrimitiveArray::unaryandPrimitiveArray::unary_mutSo that would mean adding new functions to
BooleanArray, likeAnd we would also add similar functions for binary_bitwise functions (that takes a second array argument and right_offset_in_bits)
These APIs are likely fairly straightforward if we implement the plan in
As then the new methods on
BooleanArraywould just call the relevant underlyingbuffermethodsAdditional context