feat: Optimize from_bitwise_binary_op with 64-bit alignment#9441
Conversation
Signed-off-by: Kunal Singh Dadhwal <kunalsinghdadhwal@gmail.com>
|
@Dandandan kindly review |
|
run benchmark boolean_kernels |
and time: [129.08 ns 129.76 ns 130.46 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
or time: [134.48 ns 135.29 ns 136.17 ns]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
not time: [91.808 ns 92.431 ns 93.130 ns]
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
and_sliced_1 time: [596.55 ns 600.04 ns 604.23 ns]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
or_sliced_1 time: [599.21 ns 601.99 ns 604.87 ns]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
not_sliced_1 time: [90.421 ns 90.955 ns 91.544 ns]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
and_sliced_24 time: [116.06 ns 116.83 ns 117.75 ns]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
or_sliced_24 time: [116.09 ns 116.94 ns 117.91 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild
not_slice_24 time: [90.518 ns 91.550 ns 92.754 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severehere is the comparsion
|
|
kindly review @Dandandan |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
kindly review and merge @Dandandan |
|
run benchmark boolean_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Details
Resource Usagebase (merge-base)
branch
|
|
Looks like a solid performance improvement. I will review this shortly |
alamb
left a comment
There was a problem hiding this comment.
Thank you very much @kunalsinghdadhwal
I went through this code carefully and it makes sense. I also spent quite a while ensuring the coverage is good and the comments make sense
I believe the change to the offset invariants should be treated as an API change and thus we should wait for the next major release
| /// * `op` may be called with input bits outside the requested range. | ||
| /// * The returned `BooleanBuffer` always has zero offset. | ||
| /// * Returned `BooleanBuffer` may have non zero offset | ||
| /// * Returned `BooleanBuffer` may have bits set outside the requested range |
There was a problem hiding this comment.
this may be treated as an API change 🤔
|
I took the liberty of pushing commits to this PR |
|
FYI @jhorstmann and @Dandandan you may be interested in this PR |
|
Thanks @alamb for reviewing this waiting for the next release |
|
I would have thought and/or 24 to improve more, perhaps it's still generating suboptimal code for those... |
|
Well, since it went in to main it will be part of 58.1.0. I'll test in DataFusion to make sure |
Which issue does this PR close?
from_bitwise_binary_op#9378Rationale for this change
the optimizations as listed in the issue description
What changes are included in this PR?
When both inputs share the same sub-64-bit alignment (left_offset % 64 == right_offset % 64), the optimized path is used. This covers the common cases (both offset 0, both sliced equally, etc.). The BitChunks fallback is retained only when the two offsets have different sub-64-bit alignment.
Are these changes tested?
Yes the tests are changed and they are included
Are there any user-facing changes?
Yes, this is a minor breaking change to from_bitwise_binary_op: