Intermediate result blocked approach to aggregation memory management #15591
Rachelint wants to merge 86 commits into apache:main
Hi @Rachelint, I think I have an alternative proposal that seems relatively easy to implement.
Thanks a lot. The design in this PR does still introduce quite a few code changes... I tried not to modify anything about
But I found that this way introduces too much extra cost... Maybe we place the
Finished development (and testing) of all the needed common structs!
It is very close; I just need to add more tests!
🤖 Benchmark running (GKE): comparing intermeidate-result-blocked-approach (a84d71c) to bbf67d9 (merge-base) using tpch.
Benchmark for this request failed.
Some build problems were introduced by the count distinct PR; fixing them now.
I think it may be ready now.
run benchmarks
🤖 Benchmark running (GKE): comparing intermeidate-result-blocked-approach (a818cd8) to 22bb4e6 (merge-base) using clickbench_partitioned.
🤖 Benchmark running (GKE): comparing intermeidate-result-blocked-approach (a818cd8) to 22bb4e6 (merge-base) using tpch.
🤖 Benchmark running (GKE): comparing intermeidate-result-blocked-approach (a818cd8) to 22bb4e6 (merge-base) using tpcds.
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch.
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch.
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch.
@Dandandan hi, could you help trigger the benchmarks again?
run benchmarks
🤖 Benchmark running (GKE): comparing intermeidate-result-blocked-approach (47b6976) to 8f033e4 (merge-base) using tpcds.
🤖 Benchmark running (GKE): comparing intermeidate-result-blocked-approach (47b6976) to 8f033e4 (merge-base) using clickbench_partitioned.
🤖 Benchmark running (GKE): comparing intermeidate-result-blocked-approach (47b6976) to 8f033e4 (merge-base) using tpch.
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch.
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch.
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch.
Refactoring for zero cost when disabling the blocked approach
    if let Some(emit_to) = self.group_ordering.oom_emit_to(n)
        && let Some(batch) = self.emit(emit_to, false)?
    {
        return Ok(Some(ExecutionState::ProducingOutput(batch)));
Should there also be a transition to ProducingBlocks here instead of ProducingOutput if enable_blocked_groups is set?
This could be outside of this PR's scope, but another question I have is why there isn't a transition to ProducingBlocks as soon as we have a batch ready. Instead, all the input is consumed first before producing the first batch (assuming there's no group ordering).
Which issue does this PR close?
Rationale for this change
As mentioned in #7065, we use a single `Vec` to manage aggregation intermediate results in both `GroupAccumulator` and `GroupValues`.
It is simple but not efficient enough for high-cardinality aggregation, because when the `Vec` is not large enough, we need to allocate a new `Vec` and copy all data over from the old one.
So this PR introduces a blocked approach to manage the aggregation intermediate results. We never resize the `Vec` in this approach. Instead, we split the data into blocks, and when the capacity is not enough, we just allocate a new block. See #7065 for details.
What changes are included in this PR?
`PrimitiveGroupsAccumulator` and `GroupValuesPrimitive` as the example
Are these changes tested?
Tested by existing tests, plus new unit tests and new fuzz tests.
Are there any user-facing changes?
Two functions are added to the `GroupValues` and `GroupAccumulator` traits.
But as you can see, there are default implementations for them, and users can choose to support the blocked approach when they want better performance for their `udafs`.
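The blocked approach described in the rationale can be illustrated with a small standalone sketch. This is not the code in this PR: `BlockedVec`, `block_size`, `num_blocks`, and `pop_first_block` are hypothetical names chosen for illustration, and the sketch only assumes what the rationale states, namely that values live in fixed-capacity blocks and that growing allocates a new block instead of resizing and copying an existing `Vec`.

```rust
use std::collections::VecDeque;

/// Hypothetical blocked storage (illustration only, not the PR's types).
/// Values are stored in fixed-capacity blocks, so growing the container
/// never copies the values that were already stored.
pub struct BlockedVec<T> {
    blocks: VecDeque<Vec<T>>,
    block_size: usize,
    len: usize,
}

impl<T> BlockedVec<T> {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { blocks: VecDeque::new(), block_size, len: 0 }
    }

    /// Append a value, allocating a fresh block only when the last
    /// block is full (or when no block exists yet).
    pub fn push(&mut self, value: T) {
        let needs_block = self
            .blocks
            .back()
            .map_or(true, |block| block.len() == self.block_size);
        if needs_block {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks.back_mut().unwrap().push(value);
        self.len += 1;
    }

    /// Map a flat index to (block index, offset within the block).
    pub fn get(&self, index: usize) -> Option<&T> {
        self.blocks
            .get(index / self.block_size)?
            .get(index % self.block_size)
    }

    pub fn get_mut(&mut self, index: usize) -> Option<&mut T> {
        self.blocks
            .get_mut(index / self.block_size)?
            .get_mut(index % self.block_size)
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn num_blocks(&self) -> usize {
        self.blocks.len()
    }

    /// Hand out the oldest block as a whole, without touching the others.
    pub fn pop_first_block(&mut self) -> Option<Vec<T>> {
        let block = self.blocks.pop_front()?;
        self.len -= block.len();
        Some(block)
    }
}
```

With a block size of 4, pushing ten values allocates three blocks and never moves the earlier values. The trade-off in this sketch is that each lookup pays a division and a remainder to locate its block, in exchange for avoiding the reallocate-and-copy step of a single growing `Vec`.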