[Parquet] Writing in 57.0.0 seems 10% slower than 56.0.0 #8783

@alamb

Describe the bug
While testing the tpchgen-rs upgrade to arrow 57, @clflushopt, @kevinjqliu and I found that arrow-rs 57.0.0 seems to write data around 10% slower than arrow-rs 56.0.0: clflushopt/tpchgen-rs#200 (review)

Specifically, running this command is around 10% slower:

tpchgen-cli --scale-factor=100 --tables=lineitem --parts=10 --format=parquet

56.0.0 takes 0m27.122s
57.0.0 takes 0m28.776s
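
As a quick sanity check on the two timings above, the relative slowdown can be computed from the wall-clock seconds. This is only a sketch: `slowdown_pct` is a hypothetical helper (not part of tpchgen-cli or arrow-rs), and it assumes the `0m27.122s` style values have already been converted to plain seconds.

```shell
# Hypothetical helper: percentage slowdown of NEW relative to OLD.
# Both arguments are wall-clock times in seconds (e.g. 0m27.122s -> 27.122).
slowdown_pct() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# Timings reported above: 56.0.0 = 27.122s, 57.0.0 = 28.776s
slowdown_pct 27.122 28.776   # prints 6.1
```

Taken on their own, these two single runs show roughly a 6% slowdown, so repeating each measurement a few times would help pin down how large the regression really is.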

To Reproduce

rm -rf lineitem && cargo build --release && time ./target/release/tpchgen-cli --scale-factor=100 --tables=lineitem --parts=10 --format=parquet

Expected behavior

57.0.0 should perform the same as or better than 56.0.0

Additional context
I am doing some git bisecting to see if I can find some more data

Metadata

Labels

bug, next-major-release (the PR has API changes and is waiting on the next major version), parquet (changes to the parquet crate), performance
