
Add append_value_n to GenericByteBuilder #9426

Merged
scovich merged 7 commits into apache:main from Fokko:fd-append on Mar 2, 2026

Conversation


@Fokko Fokko commented Feb 18, 2026

Which issue does this PR close?

Rationale for this change

I noticed that this method is available on PrimitiveTypeBuilder, but missing on GenericByteBuilder, which makes sense since the gain is smaller there. But after benchmarking, it shows a solid 8-19% speedup, mostly because of the more efficient allocation of the null mask.

```
┌───────────────────┬────────────────┬───────────────────┬─────────┐
│     Benchmark     │ append_value_n │ append_value loop │ Speedup │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100/len=5       │ 371 ns         │ 408 ns            │ 10%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100/len=30      │ 456 ns         │ 507 ns            │ 10%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100/len=1024    │ 1.81 µs        │ 1.95 µs           │ 8%      │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=1000/len=5      │ 2.39 µs        │ 2.87 µs           │ 17%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=1000/len=30     │ 3.41 µs        │ 3.89 µs           │ 12%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=1000/len=1024   │ 12.3 µs        │ 14.4 µs           │ 15%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=10000/len=5     │ 23.8 µs        │ 29.3 µs           │ 19%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=10000/len=30    │ 33.7 µs        │ 39.0 µs           │ 14%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=10000/len=1024  │ 115.9 µs       │ 135.0 µs          │ 14%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100000/len=5    │ 227.5 µs       │ 278.6 µs          │ 18%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100000/len=30   │ 328.1 µs       │ 377.9 µs          │ 13%     │
├───────────────────┼────────────────┼───────────────────┼─────────┤
│ n=100000/len=1024 │ 1.16 ms        │ 1.34 ms           │ 14%     │
└───────────────────┴────────────────┴───────────────────┴─────────┘
```

I think this is still worthwhile to add. Let me know what the community thinks!

What changes are included in this PR?

A new public API.

Are these changes tested?

Yes!

Are there any user-facing changes?

A new public API.

```rust
pub fn append_value_n(&mut self, value: impl AsRef<T::Native>, n: usize) {
    let bytes = value.as_ref().as_ref();
    for _ in 0..n {
        self.value_builder.extend_from_slice(bytes);
```
Contributor

Are there methods we can use to reserve the capacity for `value_builder` and `offsets_builder` ahead of time?

Contributor

(Reserve is potentially dangerous performance-wise if it doesn't do amortized allocations; otherwise we would need to do the amortization here.)

Contributor

Is that directly related to this PR? Or a more general observation?

AFAIK, Rust container allocations are amortized; e.g. the Vec docs state:

> Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized push.

So hopefully that's enough to avoid quadratic silliness even if we don't reserve, leaving only constant factors on the table?

Contributor

I didn't look closely at Rust, but in C++ there is a potential footgun (reference copied below). So calling append_n with n=1 in a loop was my concern if we ended up using reserve. It sounds like Rust might make stronger guarantees (but there is wiggle room):

> Correctly using reserve() can prevent unnecessary reallocations, but inappropriate uses of reserve() (for instance, calling it before every push_back() call) may actually increase the number of reallocations (by causing the capacity to grow linearly rather than exponentially) and result in increased computational complexity and decreased performance. For example, a function that receives an arbitrary vector by reference and appends elements to it should usually not call reserve() on the vector, since it does not know of the vector's usage characteristics.
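To check how Rust behaves here, a small self-contained experiment (hypothetical, not part of this PR) can count reallocations when `reserve(1)` is called before every push, the exact pattern the C++ note warns about. The small reallocation count below reflects the current standard-library growth strategy, which the docs explicitly do not guarantee:

```rust
/// Returns (final_len, final_capacity, realloc_count) after pushing `n`
/// items, with a `reserve(1)` call before every push (the C++ "footgun"
/// pattern). A reallocation is detected as a change in capacity.
fn push_with_reserve_each(n: usize) -> (usize, usize, usize) {
    let mut v: Vec<usize> = Vec::new();
    let mut reallocs = 0;
    let mut last_cap = v.capacity();
    for i in 0..n {
        v.reserve(1); // per-element reserve, the pattern under discussion
        v.push(i);
        if v.capacity() != last_cap {
            reallocs += 1;
            last_cap = v.capacity();
        }
    }
    (v.len(), v.capacity(), reallocs)
}

fn main() {
    let (len, cap, reallocs) = push_with_reserve_each(1_000_000);
    println!("len={len} capacity={cap} reallocations={reallocs}");
    // With geometric growth this stays around 20; a linear growth
    // strategy (the C++ footgun) would make it enormous.
    assert!(reallocs < 64);
}
```

In other words, `Vec::reserve` in today's standard library still over-allocates geometrically, so even the worst-case usage pattern stays cheap, though only amortized `push` carries a documented guarantee.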

Contributor

Or put another way: callers that care about it can probably call reserve themselves, externally.

Contributor Author

You can reserve capacity by calling `with_capacity`:

```rust
/// Creates a new [`GenericByteBuilder`].
///
/// - `item_capacity` is the number of items to pre-allocate.
///   The size of the preallocated buffer of offsets is the number of items plus one.
/// - `data_capacity` is the total number of bytes of data to pre-allocate
///   (for all items, not per item).
pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
    let mut offsets_builder = Vec::with_capacity(item_capacity + 1);
    offsets_builder.push(T::Offset::from_usize(0).unwrap());
    Self {
        value_builder: Vec::with_capacity(data_capacity),
        offsets_builder,
        null_buffer_builder: NullBufferBuilder::new(item_capacity),
    }
}
```

We already do this in our code.

Contributor

OK, so let's avoid adding it here. It seems better placed in caller code.

Dandandan (Contributor) commented Feb 20, 2026

As this is `append_value_n`, I believe it makes sense to add a reserve. While the builder provides `with_capacity`, the primary use case for a builder is when you don't know the capacity upfront (in most other cases you can build the array from an iterator or slice directly and skip the builder overhead).

Fokko (Contributor, Author) commented Feb 20, 2026

My friend Claude created a benchmark:

```rust
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

use arrow_array::builder::StringBuilder;
use criterion::*;
use std::hint;

fn bench_append_value(c: &mut Criterion) {
    let mut group = c.benchmark_group("append_value");

    for &str_len in &[5, 30, 1024] {
        let value = "x".repeat(str_len);

        for &n in &[100, 1000, 10000, 100000] {
            group.throughput(Throughput::Elements(n as u64));
            group.bench_with_input(
                BenchmarkId::new(format!("n={n}"), format!("len={str_len}")),
                &(&value, n),
                |b, &(value, n)| {
                    b.iter(|| {
                        let mut builder = StringBuilder::new();
                        for _ in 0..n {
                            builder.append_value(value);
                        }
                        hint::black_box(builder.finish());
                    })
                },
            );
        }
    }

    group.finish();
}

fn bench_append_value_n(c: &mut Criterion) {
    let mut group = c.benchmark_group("append_value_n");

    for &str_len in &[5, 30, 1024] {
        let value = "x".repeat(str_len);

        for &n in &[100, 1000, 10000, 100000] {
            group.throughput(Throughput::Elements(n as u64));
            group.bench_with_input(
                BenchmarkId::new(format!("n={n}"), format!("len={str_len}")),
                &(&value, n),
                |b, &(value, n)| {
                    b.iter(|| {
                        let mut builder = StringBuilder::new();
                        builder.append_value_n(value, n);
                        hint::black_box(builder.finish());
                    })
                },
            );
        }
    }

    group.finish();
}

criterion_group!(benches, bench_append_value, bench_append_value_n);
criterion_main!(benches);
```

Let me know if you find this valuable to add to the repository.

This resulted in:

`append_value_n` improvements with `reserve()`:

```
┌────────────────────┬─────────┬─────────┬─────────────┐
│       Config       │ Before  │  After  │ Improvement │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=1000, len=5      │ 2.96 µs │ 2.68 µs │ ~9%         │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=10000, len=5     │ 29.3 µs │ 26.4 µs │ ~10%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=100000, len=5    │ 268 µs  │ 249 µs  │ ~7%         │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=100, len=30      │ 628 ns  │ 548 ns  │ ~13%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=1000, len=30     │ 4.22 µs │ 3.68 µs │ ~13%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=10000, len=30    │ 43.9 µs │ 36.3 µs │ ~17%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=100000, len=30   │ 404 µs  │ 350 µs  │ ~13%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=100, len=1024    │ 3.16 µs │ 1.93 µs │ ~39%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=1000, len=1024   │ 13.7 µs │ 11.1 µs │ ~19%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=10000, len=1024  │ 729 µs  │ 103 µs  │ ~86%        │
├────────────────────┼─────────┼─────────┼─────────────┤
│ n=100000, len=1024 │ 8.68 ms │ 1.02 ms │ ~88%        │
└────────────────────┴─────────┴─────────┴─────────────┘
```

Keep in mind that the benchmark uses `::new` rather than `::with_capacity`. We can see that `.reserve` offers some benefits if you don't pre-allocate.
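For illustration, the overall shape of `append_value_n` with up-front reserves can be sketched with plain `Vec`s. This is a hypothetical simplification: the real `GenericByteBuilder` uses arrow's generic offset type, `next_offset()`, and a `NullBufferBuilder`, all of which are elided or faked here:

```rust
/// Simplified stand-in for GenericByteBuilder: values are raw bytes,
/// offsets are i32, and the null mask is omitted for brevity.
struct ByteBuilder {
    value_builder: Vec<u8>,
    offsets_builder: Vec<i32>,
}

impl ByteBuilder {
    fn new() -> Self {
        Self { value_builder: Vec::new(), offsets_builder: vec![0] }
    }

    /// Append `value` `n` times. Reserving up front avoids repeated
    /// grow-and-copy cycles when the builder was created with `new()`
    /// rather than `with_capacity()`.
    fn append_value_n(&mut self, value: &[u8], n: usize) {
        self.value_builder.reserve(n * value.len());
        self.offsets_builder.reserve(n);
        for _ in 0..n {
            self.value_builder.extend_from_slice(value);
            self.offsets_builder.push(self.value_builder.len() as i32);
        }
        // In the real builder, the null mask is also appended in bulk here:
        // self.null_buffer_builder.append_n_non_nulls(n);
    }
}

fn main() {
    let mut b = ByteBuilder::new();
    b.append_value_n(b"hello", 3);
    assert_eq!(b.value_builder, b"hellohellohello");
    assert_eq!(b.offsets_builder, vec![0, 5, 10, 15]);
}
```

The point of the benchmark above is that the two `reserve` calls mostly matter when the caller did not pre-size the builder; with `with_capacity` they are effectively no-ops.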

scovich (Contributor) left a review comment

LGTM



scovich commented Feb 20, 2026

Out of curiosity, are there any benchmarks in the repo that show these improvements?


Fokko commented Feb 20, 2026

> Out of curiosity, are there any benchmarks in the repo that show these improvements?

I've asked my friend Claude to generate a microbenchmark; the results can be found in the PR description. I can commit the benchmark as well, but I figured that would generate a lot of very specific benchmarks which won't be used that much, I guess.

```rust
        self.value_builder.extend_from_slice(bytes);
        self.offsets_builder.push(self.next_offset());
    }
    self.null_buffer_builder.append_n_non_nulls(n);
```
Contributor Author

Pulling this out of the loop gives the 10-20% speedup.
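The win from the bulk null-mask append can be illustrated with a toy validity-bitmap builder (hypothetical, far simpler than arrow's `NullBufferBuilder`): appending one value at a time does a read-modify-write on the last byte for every bit, while an n-at-a-time append can write whole `0xFF` bytes for the byte-aligned middle section:

```rust
/// Toy validity bitmap: bit i set means value i is non-null (LSB-first).
struct ToyNullBuilder {
    bits: Vec<u8>,
    len: usize,
}

impl ToyNullBuilder {
    fn new() -> Self {
        Self { bits: Vec::new(), len: 0 }
    }

    /// One value at a time: a read-modify-write per appended bit.
    fn append_non_null(&mut self) {
        if self.len % 8 == 0 {
            self.bits.push(0);
        }
        let last = self.bits.last_mut().unwrap();
        *last |= 1 << (self.len % 8);
        self.len += 1;
    }

    /// Bulk variant: set leading bits up to byte alignment, then write
    /// whole 0xFF bytes, then the trailing partial byte.
    fn append_n_non_nulls(&mut self, mut n: usize) {
        while n > 0 && self.len % 8 != 0 {
            self.append_non_null();
            n -= 1;
        }
        self.bits.extend(std::iter::repeat(0xFFu8).take(n / 8));
        self.len += (n / 8) * 8;
        for _ in 0..(n % 8) {
            self.append_non_null();
        }
    }
}

fn main() {
    let mut a = ToyNullBuilder::new();
    let mut b = ToyNullBuilder::new();
    for _ in 0..13 {
        a.append_non_null();
    }
    b.append_n_non_nulls(13);
    // Both approaches produce the same bitmap; the bulk path just
    // touches the buffer far fewer times.
    assert_eq!(a.bits, b.bits);
    assert_eq!(a.bits, vec![0xFF, 0x1F]);
}
```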

```rust
let bytes = value.as_ref().as_ref();
for _ in 0..n {
    self.value_builder.extend_from_slice(bytes);
    self.offsets_builder.push(self.next_offset());
```
Contributor

If you want to make it faster: `self.offsets_builder.append_trusted_len_iter(...)` could be used outside of the loop with an offset iterator.

IMO a reserve should also be added here, since this is a builder API: in most cases it makes little sense to use a builder if you know the capacity upfront (you can build the array directly, which will be faster anyway).
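The offset-iterator idea can be sketched without arrow types (hypothetical; `append_trusted_len_iter` is the arrow API, and `Vec::extend` stands in for it here): for n copies of a fixed-length value the offsets form an arithmetic sequence, so they can be generated in one pass instead of calling `next_offset()` inside the loop:

```rust
/// Offsets produced by appending a value of byte-length `len`, `n` times,
/// starting from the current end offset `start` (exclusive of `start`).
fn repeated_offsets(start: i32, len: i32, n: i32) -> Vec<i32> {
    (1..=n).map(|i| start + i * len).collect()
}

fn main() {
    // A builder's offsets buffer always starts with a leading 0.
    let mut offsets: Vec<i32> = vec![0];
    // e.g. a 5-byte value ("hello") appended 4 times:
    offsets.extend(repeated_offsets(0, 5, 4));
    assert_eq!(offsets, vec![0, 5, 10, 15, 20]);
}
```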


scovich commented Feb 27, 2026

@Dandandan -- any reason not to merge this now that the reserve calls were added?

@scovich scovich merged commit 01d34a8 into apache:main Mar 2, 2026
26 checks passed

Labels

arrow (Changes to the arrow crate)

Development

Successfully merging this pull request may close these issues: Add append_value_n to GenericByteBuilder

4 participants