feat: Optimize from_bitwise_binary_op with 64-bit alignment #9441

Merged
Dandandan merged 6 commits into apache:main from kunalsinghdadhwal:kunal/optimize-bitwise-binary-op-9378
Mar 19, 2026

Conversation

@kunalsinghdadhwal
Contributor

Which issue does this PR close?

Rationale for this change

This PR implements the optimizations listed in the issue description:

  • Align to 8 bytes
  • Don't force the returned buffer to have bit_offset 0; round it to a multiple of 64 instead
  • Use chunks_exact for the fallback path
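
As an illustration of the last bullet, here is a minimal sketch of reading 64-bit words from a byte slice with the standard library's `chunks_exact`. The function name `words_from_bytes` is hypothetical and this is not the code in the PR; it only shows why `chunks_exact(8)` suits this kind of loop: every chunk is guaranteed to be exactly 8 bytes, and the leftover bytes are available separately through `remainder()`.

```rust
/// Hypothetical helper (not part of arrow-rs): reads little-endian u64 words
/// from `bytes` and returns them together with the trailing bytes (fewer
/// than 8) that did not fill a whole word.
fn words_from_bytes(bytes: &[u8]) -> (Vec<u64>, &[u8]) {
    let chunks = bytes.chunks_exact(8);
    // `remainder()` borrows from the original slice, so it can be taken
    // before the iterator is consumed.
    let remainder = chunks.remainder();
    let words = chunks
        .map(|chunk| {
            // Each chunk is exactly 8 bytes long, so this copy cannot panic.
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            u64::from_le_bytes(buf)
        })
        .collect();
    (words, remainder)
}
```

Because the chunk length is fixed, the compiler can usually drop per-iteration bounds checks, which is presumably the motivation for using it in the fallback path.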

What changes are included in this PR?

When both inputs share the same sub-64-bit alignment (left_offset % 64 == right_offset % 64), the optimized path is used. This covers the common cases (both offset 0, both sliced equally, etc.). The BitChunks fallback is retained only when the two offsets have different sub-64-bit alignment.
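
The path selection described above can be sketched as follows. This is a simplified illustration on plain `u64` slices, not the actual implementation; `KernelPath`, `choose_path`, and `apply_aligned` are hypothetical names. The key point is that when both offsets have the same remainder modulo 64, bit `i` of each left word lines up with bit `i` of the corresponding right word, so the operation can be applied word by word without shifting.

```rust
/// Hypothetical description of the two strategies.
#[derive(Debug, PartialEq)]
enum KernelPath {
    /// Both inputs start at the same bit position within a 64-bit word.
    Aligned { start_bit: usize },
    /// The in-word positions differ, so bits would have to be shifted.
    Fallback,
}

/// Picks a strategy from the two bit offsets, using the condition
/// `left_offset % 64 == right_offset % 64`.
fn choose_path(left_offset: usize, right_offset: usize) -> KernelPath {
    if left_offset % 64 == right_offset % 64 {
        KernelPath::Aligned { start_bit: left_offset % 64 }
    } else {
        KernelPath::Fallback
    }
}

/// Applies `op` word by word. Only meaningful on the aligned path, where
/// the bits of the two inputs already line up.
fn apply_aligned<F>(left: &[u64], right: &[u64], op: F) -> Vec<u64>
where
    F: Fn(u64, u64) -> u64,
{
    left.iter().zip(right.iter()).map(|(&l, &r)| op(l, r)).collect()
}
```

For example, offsets 24 and 88 both leave a remainder of 24, so they would take the aligned path even though the absolute offsets differ, while offsets 1 and 2 would fall back.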

Are these changes tested?

Yes, the existing tests are updated and included in this PR.

Are there any user-facing changes?

Yes, this is a minor breaking change to from_bitwise_binary_op:

  • The returned BooleanBuffer may now have a non-zero offset (previously always 0)
  • The returned BooleanBuffer may have padding bits set outside the logical range in values()
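
To illustrate what these two points mean for code that reads the raw bits directly, here is a minimal sketch on a plain `u64` slice. The function `count_set_bits_in_range` is a hypothetical helper, not the arrow-rs API; it assumes LSB-first bit numbering within each word. It shows that a reader must honour both the offset and the length rather than, say, summing `count_ones()` over every word.

```rust
/// Hypothetical helper (not part of arrow-rs): counts the set bits in the
/// logical range `[offset, offset + len)` of an LSB-first bitmap stored as
/// 64-bit words. Bits outside that range (padding) are ignored.
fn count_set_bits_in_range(words: &[u64], offset: usize, len: usize) -> usize {
    (offset..offset + len)
        .filter(|&i| (words[i / 64] >> (i % 64)) & 1 == 1)
        .count()
}
```

With a single all-ones word, offset 3, and length 5, this counts 5 bits, whereas a naive `count_ones()` over the whole word would report 64 because it includes the padding.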

Signed-off-by: Kunal Singh Dadhwal <kunalsinghdadhwal@gmail.com>
@github-actions github-actions Bot added the arrow Changes to the arrow crate label Feb 19, 2026
@kunalsinghdadhwal
Contributor Author

@Dandandan kindly review

@Dandandan
Contributor

run benchmark boolean_kernels

@kunalsinghdadhwal
Contributor Author

kunalsinghdadhwal commented Feb 19, 2026

and                     time:   [129.08 ns 129.76 ns 130.46 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

or                      time:   [134.48 ns 135.29 ns 136.17 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

not                     time:   [91.808 ns 92.431 ns 93.130 ns]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

and_sliced_1            time:   [596.55 ns 600.04 ns 604.23 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

or_sliced_1             time:   [599.21 ns 601.99 ns 604.87 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

not_sliced_1            time:   [90.421 ns 90.955 ns 91.544 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

and_sliced_24           time:   [116.06 ns 116.83 ns 117.75 ns]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

or_sliced_24            time:   [116.09 ns 116.94 ns 117.91 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

not_slice_24            time:   [90.518 ns 91.550 ns 92.754 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Here is the comparison:

Benchmark        main         optimized    speedup
---------        ----         ---------    -------
and              128.33 ns    130.22 ns    0.98x
or               132.71 ns    134.03 ns    0.99x
not               91.78 ns     91.78 ns    1.00x
and_sliced_1     656.07 ns    650.42 ns    1.01x
or_sliced_1      669.51 ns    662.51 ns    1.01x
not_sliced_1     114.27 ns    112.00 ns    1.02x
and_sliced_24    141.51 ns    139.42 ns    1.01x
or_sliced_24     138.28 ns    114.78 ns    1.20x
not_slice_24      90.24 ns    113.18 ns    0.80x
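
The speedup column appears to be the main timing divided by the optimized timing, so values above 1.00x mean the branch is faster. That reading is an assumption, though it agrees with the rows above. A minimal sketch, with `speedup` as a hypothetical helper:

```rust
/// Hypothetical helper: ratio of the baseline time to the optimized time.
/// Values above 1.0 mean the optimized build is faster.
fn speedup(main_ns: f64, optimized_ns: f64) -> f64 {
    main_ns / optimized_ns
}
```

For or_sliced_24, `speedup(138.28, 114.78)` comes out at roughly 1.20, and for not_slice_24, `speedup(90.24, 113.18)` is below 1.0, which marks a regression.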

@kunalsinghdadhwal
Contributor Author

kunalsinghdadhwal commented Feb 20, 2026

@Dandandan @alamb

@kunalsinghdadhwal
Contributor Author

kindly review @Dandandan

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing kunal/optimize-bitwise-binary-op-9378 (ecf51b4) to ab9c062 diff
BENCH_NAME=boolean_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=kunal_optimize-bitwise-binary-op-9378
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group            kunal_optimize-bitwise-binary-op-9378    main
-----            -------------------------------------    ----
and              1.02    214.1±2.60ns        ? ?/sec      1.00    210.8±1.34ns        ? ?/sec
and_sliced_1     1.00  1090.9±11.59ns        ? ?/sec      1.01   1096.5±5.50ns        ? ?/sec
and_sliced_24    1.00    225.4±0.42ns        ? ?/sec      1.09    246.4±0.83ns        ? ?/sec
not              1.00    144.1±1.59ns        ? ?/sec      1.00    144.4±0.17ns        ? ?/sec
not_slice_24     1.20    174.3±0.28ns        ? ?/sec      1.00    145.5±6.71ns        ? ?/sec
not_sliced_1     1.21    174.5±1.33ns        ? ?/sec      1.00    144.6±1.09ns        ? ?/sec
or               1.02    202.4±1.24ns        ? ?/sec      1.00    198.8±0.40ns        ? ?/sec
or_sliced_1      1.00  1094.9±10.78ns        ? ?/sec      1.01   1110.5±8.90ns        ? ?/sec
or_sliced_24     1.00    227.3±0.34ns        ? ?/sec      1.09    247.0±4.87ns        ? ?/sec

@kunalsinghdadhwal
Contributor Author

kindly review and merge @Dandandan

@alamb
Contributor

alamb commented Mar 18, 2026

run benchmark boolean_kernels

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4082907033-418-r4mmn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing kunal/optimize-bitwise-binary-op-9378 (4c4f205) to 66313ae (merge-base) diff
BENCH_NAME=boolean_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_kernels
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group            kunal_optimize-bitwise-binary-op-9378    main
-----            -------------------------------------    ----
and              1.04    152.2±0.71ns        ? ?/sec      1.00    146.4±0.67ns        ? ?/sec
and_sliced_1     1.00    559.5±1.33ns        ? ?/sec      1.13    631.5±0.74ns        ? ?/sec
and_sliced_24    1.00    174.8±3.22ns        ? ?/sec      1.57    273.6±0.91ns        ? ?/sec
not              1.01    107.3±0.95ns        ? ?/sec      1.00    106.1±0.46ns        ? ?/sec
not_slice_24     1.16    123.1±0.66ns        ? ?/sec      1.00    106.1±0.45ns        ? ?/sec
not_sliced_1     1.17    123.7±0.33ns        ? ?/sec      1.00    105.9±0.43ns        ? ?/sec
or               1.04    151.3±0.87ns        ? ?/sec      1.00    146.0±0.75ns        ? ?/sec
or_sliced_1      1.00    596.0±1.23ns        ? ?/sec      1.01    601.7±0.73ns        ? ?/sec
or_sliced_24     1.00    174.5±3.29ns        ? ?/sec      1.58    275.6±0.85ns        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 88.8s
Peak memory 1.7 GiB
Avg memory 1.7 GiB
CPU user 87.8s
CPU sys 0.7s
Disk read 0 B
Disk write 596.1 MiB

branch

Metric Value
Wall time 90.8s
Peak memory 1.7 GiB
Avg memory 1.7 GiB
CPU user 90.7s
CPU sys 0.1s
Disk read 0 B
Disk write 1004.0 KiB

@alamb
Contributor

alamb commented Mar 18, 2026

Looks like a solid performance improvement. I will review this shortly

Contributor

@alamb alamb left a comment

Thank you very much @kunalsinghdadhwal

I went through this code carefully and it makes sense. I also spent quite a while ensuring the coverage is good and the comments make sense

I believe the change to the offset invariants should be treated as an API change and thus we should wait for the next major release

  /// * `op` may be called with input bits outside the requested range.
- /// * The returned `BooleanBuffer` always has zero offset.
+ /// * Returned `BooleanBuffer` may have non zero offset
+ /// * Returned `BooleanBuffer` may have bits set outside the requested range
Contributor

this may be treated as an API change 🤔

@alamb
Contributor

alamb commented Mar 18, 2026

I took the liberty of pushing commits to this PR

@alamb
Contributor

alamb commented Mar 18, 2026

FYI @jhorstmann and @Dandandan you may be interested in this PR

@alamb alamb added the next-major-release the PR has API changes and it waiting on the next major version label Mar 18, 2026
@kunalsinghdadhwal
Contributor Author

Thanks @alamb for reviewing this. Waiting for the next release.

@Dandandan
Contributor

I would have thought and/or 24 to improve more, perhaps it's still generating suboptimal code for those...

@Dandandan Dandandan merged commit d53df60 into apache:main Mar 19, 2026
26 checks passed
@alamb alamb removed the next-major-release the PR has API changes and it waiting on the next major version label Mar 20, 2026
@alamb
Contributor

alamb commented Mar 20, 2026

Well, since it went into main it will be part of 58.1.0. I'll test in DataFusion to make sure.

Development

Successfully merging this pull request may close these issues.

Optimize from_bitwise_binary_op

5 participants