[pick](branch-4.0)[fix](inverted index) Make select_best_reader deterministic for multi-index columns #61596 #61692

Merged
yiguolei merged 2 commits into apache:branch-4.0 from airborne12:pick/branch-4.0/61596
Mar 25, 2026

Conversation

@airborne12
Member

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #61596

Problem Summary:

Pick #61596 to branch-4.0.

When a column has multiple inverted indexes (e.g., different analyzers), select_best_reader could return different readers depending on std::unordered_map iteration order, causing non-deterministic behavior. This fix sorts candidate readers by index_id to ensure deterministic selection.
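As a rough illustration of the idea (not the code in this PR), the sketch below collects candidates from a hash map, sorts them by `index_id`, and returns the first one. `ReaderStub` and `select_sorted` are hypothetical names introduced here; the real reader types and container layout in Doris are assumed, not shown.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an inverted index reader; not the Doris type.
struct ReaderStub {
    int64_t index_id;
    std::string analyzer;
};

// Gather every reader, sort by index_id, and return the smallest.
// Iteration order of std::unordered_map is unspecified, so picking the
// first element seen would not be stable; sorting removes that dependence.
const ReaderStub* select_sorted(
        const std::unordered_map<std::string, ReaderStub>& readers) {
    std::vector<const ReaderStub*> candidates;
    candidates.reserve(readers.size());
    for (const auto& kv : readers) {
        candidates.push_back(&kv.second);
    }
    if (candidates.empty()) {
        return nullptr;  // no index on this column
    }
    std::sort(candidates.begin(), candidates.end(),
              [](const ReaderStub* a, const ReaderStub* b) {
                  return a->index_id < b->index_id;
              });
    return candidates.front();
}
```

With two maps holding the same readers inserted in opposite order, `select_sorted` returns the reader with the smallest `index_id` in both cases.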

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…-index columns (apache#61596)

### What problem does this PR solve?

Issue Number: close #DORIS-24685

Related PR: N/A

Problem Summary:

When multiple inverted indexes with different analyzers exist on the
same column, `select_best_reader()` returns the first matching candidate
based on iteration order of `_reader_entries`. Since `_reader_entries`
ordering depends on the rowset schema's index ordering, and different
segments can have different orderings (e.g. after sequential `BUILD
INDEX` operations), the same query selects different indexes for
different segments, producing inconsistent results.

Fix: Replace all order-dependent candidate selection in
`select_for_text()`, `select_for_numeric()`, and `select_best_reader()`
with deterministic selection by smallest `index_id` via
`pick_preferred()` and `pick_smallest_index_id()` helpers. This ensures
consistent index selection regardless of schema ordering across
segments.
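
The difference between first-match and smallest-`index_id` selection across segments can be sketched as follows. This is a minimal illustration under assumptions: `Entry`, `first_match`, and `smallest_id_match` are hypothetical names, and the actual `pick_preferred()` and `pick_smallest_index_id()` helpers may apply additional rules that are not reproduced here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical entry type; field names are illustrative, not Doris's.
struct Entry {
    int64_t index_id;
    bool supports_query;  // whether this index can serve the query
};

// Order-dependent: returns the first usable entry, so the result follows
// whatever order the segment's schema happens to list its indexes in.
const Entry* first_match(const std::vector<Entry>& entries) {
    for (const auto& e : entries) {
        if (e.supports_query) return &e;
    }
    return nullptr;
}

// Order-independent: among usable entries, return the smallest index_id.
const Entry* smallest_id_match(const std::vector<Entry>& entries) {
    const Entry* best = nullptr;
    for (const auto& e : entries) {
        if (!e.supports_query) continue;
        if (best == nullptr || e.index_id < best->index_id) best = &e;
    }
    return best;
}
```

For two segments that list the same indexes as `{101, 102}` and `{102, 101}`, `first_match` picks 101 for one and 102 for the other, while `smallest_id_match` picks 101 for both.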

airborne12 requested a review from yiguolei as a code owner on March 25, 2026 02:55
@Thearas
Contributor

Thearas commented Mar 25, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

@airborne12
Member Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 67.65% (23/34) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.97% (19221/36284) |
| Line Coverage | 36.13% (179061/495535) |
| Region Coverage | 32.77% (138857/423731) |
| Branch Coverage | 33.73% (60312/178794) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 70.59% (24/34) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 69.83% (24806/35521) |
| Line Coverage | 52.34% (258895/494652) |
| Region Coverage | 49.90% (213555/428007) |
| Branch Coverage | 51.16% (91795/179437) |

github-actions bot added the `approved` label (indicates a PR has been approved by one committer) on Mar 25, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

yiguolei merged commit 4b78331 into apache:branch-4.0 on Mar 25, 2026 (25 of 28 checks passed)
airborne12 deleted the pick/branch-4.0/61596 branch on April 2, 2026 10:01
Labels

approved (indicates a PR has been approved by one committer), reviewed

4 participants