[feature](search) support MATCH projection as virtual column for inverted index evaluation #61092
TPC-H: Total hot run time: 27635 ms
TPC-DS: Total hot run time: 153480 ms
TPC-H: Total hot run time: 27766 ms
TPC-DS: Total hot run time: 152928 ms
zhiqiang-hhhh left a comment:
/**
* MATCH expressions in FULL OUTER JOIN projections optimization:
*
* Before optimization (brute-force approach):
* - Execute FULL OUTER JOIN ON A.k1 = B.k1 first
* └── Get complete join result set (all rows)
* - Perform projections on each row of join result:
* ├── Projection 1: A.k1 (simple column read)
* └── Projection 2: A.content MATCH_ANY 'hello' (complex expression)
* └── Evaluate MATCH for every row (no index, brute-force computation)
*
* After optimization (virtual column approach):
* - Pre-compute MATCH at OlapScan layer as virtual column
* └── Leverage inverted index for fast evaluation
* └── Cache result in IndexExecContext
* - Execute FULL OUTER JOIN ON A.k1 = B.k1
* └── A side already has pre-computed virtual column result
* - Perform projections
* ├── Projection 1: A.k1
* └── Projection 2: Read cached virtual column result directly
*
* Key insight: JOIN semantics forbid filtering, but don't prevent
* pre-computing with index and caching results for downstream use.
*/
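The before/after flow in the comment above can be modelled in a few lines of Python. This is only a toy sketch, not Doris code: `tokenize`, `build_inverted_index`, `full_outer_join`, `match_after_join`, and `match_at_scan` are hypothetical names, whitespace tokenization stands in for the real analyzer, and table B is reduced to a list of join keys.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def tokenize(text: str) -> List[str]:
    # Assumption: lowercased whitespace tokens stand in for the real analyzer.
    return text.lower().split()


def build_inverted_index(contents: Sequence[str]) -> Dict[str, Set[int]]:
    # Map each term to the set of row ids whose content contains it.
    index: Dict[str, Set[int]] = {}
    for row_id, content in enumerate(contents):
        for term in tokenize(content):
            index.setdefault(term, set()).add(row_id)
    return index


def full_outer_join(
    left: Sequence[Tuple[int, T]], right_keys: Sequence[int]
) -> List[Tuple[Optional[int], Optional[T]]]:
    # Emit (A.k1, payload) for every joined row; right-only rows get NULLs.
    out: List[Tuple[Optional[int], Optional[T]]] = []
    matched_right: Set[int] = set()
    for k1, payload in left:
        partners = [i for i, rk in enumerate(right_keys) if rk == k1]
        matched_right.update(partners)
        # A left row with no partner still appears once (outer semantics).
        for _ in partners or [None]:
            out.append((k1, payload))
    for i in range(len(right_keys)):
        if i not in matched_right:
            out.append((None, None))
    return out


def match_after_join(rows_a, keys_b, term):
    # "Before": join first, then evaluate MATCH row by row without an index.
    joined = full_outer_join(rows_a, keys_b)
    return [
        (k1, None if content is None else term in tokenize(content))
        for k1, content in joined
    ]


def match_at_scan(rows_a, keys_b, term):
    # "After": compute MATCH once at scan time from the index, as an extra
    # (virtual) column, then join. No rows are filtered out.
    hits = build_inverted_index([c for _, c in rows_a]).get(term, set())
    scanned = [(k1, row_id in hits) for row_id, (k1, _) in enumerate(rows_a)]
    return full_outer_join(scanned, keys_b)
```

Both functions return the same rows in this model; the difference is only where the MATCH result is computed, which is the point of the "key insight" in the comment.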
PR approved by anyone and no changes requested.
…column projections

Wire up virtual column MATCH expressions with inverted index evaluation in segment_iterator so that MATCH projections (pushed down via PreferPushDownProject) can leverage the fast index path instead of slow-path expression evaluation.

Changes:
- Match.java: add PreferPushDownProject interface
- segment_iterator.cpp: set IndexExecContext on virtual column exprs, evaluate inverted index for virtual column MATCH, and convert result bitmaps to UInt8 columns for fast_execute() cache

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests MATCH expressions as projections (not filters) pushed down as virtual columns on OlapScan, evaluated via inverted index.

Covers:
- Simple MATCH projection
- MATCH projection with FULL OUTER JOIN
- Multiple MATCH projections
- MATCH projection with additional filter
- MATCH_PHRASE projection
- Regression check that MATCH filter still works
- MATCH filter with INNER JOIN
- EXPLAIN output verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n projections

The critical bug: bitmap→column conversion in _output_index_result_column was only called from _process_common_expr, gated by _is_need_expr_eval. In FULL OUTER JOIN + projection-only (no WHERE), _is_need_expr_eval is false, so the conversion never ran and fast_execute() fell back to the slow path.

Fix: refactor _output_index_result_column_for_expr into a generic _output_index_result_column(vector<VExprContext*>, ...) and call it in step5 of _next_batch_internal for virtual column exprs, before _materialization_of_virtual_column, independent of _is_need_expr_eval.

Also:
- FE: add Project→Filter→OlapScan pattern to the rewrite rule
- FE: add unit tests for PushDownMatchProjectionAsVirtualColumn
- Regression: enhance tests with no-index, MOW UNIQUE, compound MATCH, and direct filter edge cases (13 total test cases)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
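The gating problem described in this commit can be illustrated with a toy model in Python. It is a sketch under loud assumptions: `ToyIndexContext`, `convert_bitmaps`, and `next_batch` are hypothetical stand-ins, and the `fixed` flag simply switches the extra, ungated call site on or off; none of this mirrors the real signatures in segment_iterator.cpp.

```python
from typing import Dict, List, Set


class ToyIndexContext:
    """Hypothetical stand-in for the per-segment index result cache."""

    def __init__(self) -> None:
        self.bitmaps: Dict[str, Set[int]] = {}   # expr -> matching row ids
        self.columns: Dict[str, List[int]] = {}  # expr -> 0/1 column


def convert_bitmaps(ctx: ToyIndexContext, exprs: List[str], row_ids: List[int]) -> None:
    # Turn each cached bitmap into a 0/1 column aligned with the batch rows.
    for expr in exprs:
        if expr in ctx.bitmaps:
            hits = ctx.bitmaps[expr]
            ctx.columns[expr] = [1 if r in hits else 0 for r in row_ids]


def next_batch(
    ctx: ToyIndexContext,
    virtual_exprs: List[str],
    row_ids: List[int],
    need_expr_eval: bool,
    fixed: bool,
) -> Dict[str, str]:
    if need_expr_eval:
        # Old call site: only reached when common expressions are evaluated.
        convert_bitmaps(ctx, virtual_exprs, row_ids)
    if fixed:
        # New call site: runs before materialization, regardless of the flag.
        convert_bitmaps(ctx, virtual_exprs, row_ids)
    # Materialization takes the fast path only if a cached column exists.
    return {e: "fast" if e in ctx.columns else "slow" for e in virtual_exprs}
```

With `need_expr_eval=False` (the projection-only case), the model reports the slow path unless the ungated call is present, which matches the behavior the commit message describes.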
…rojection

1. Add null check for getTableProperty() to prevent NPE when table property is not set.
2. Remove the SlotReference child check: Match expressions can have non-SlotReference children and should still be pushed down.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ite rules

1. BE: Clone virtual column exprs before set_index_context() to avoid cross-segment context corruption. IndexExecContext holds segment-specific index iterator references that would be overwritten on a shared VExprContext.
2. FE: Add appendVirtualColumns/appendVirtualColumnsAndTopN to LogicalOlapScan. Multiple rewrite rules (CSE, MATCH, Score, Vector) can now coexist by appending virtual columns instead of replacing them. Remove the virtualColumns.isEmpty() guard from PushDownMatchProjectionAsVirtualColumn.
3. Tests: Strengthen PushDownMatchProjectionAsVirtualColumnTest (3→6 tests) with fine-grained assertions: alias name preservation, slot replacement correctness, duplicate MATCH dedup, and append-to-existing verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
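Item 1 above is a classic shared-mutable-state hazard, and a small Python model may make it concrete. This is an illustrative sketch only: `ToyExprContext` and `open_segment` are hypothetical names, and a plain dict stands in for the segment-specific index context.

```python
import copy
from typing import Optional


class ToyExprContext:
    """Hypothetical stand-in for an expression context shared across segments."""

    def __init__(self, expr: str) -> None:
        self.expr = expr
        self.index_context: Optional[dict] = None

    def set_index_context(self, index_context: dict) -> None:
        self.index_context = index_context


def open_segment(shared: ToyExprContext, segment: str, clone: bool) -> ToyExprContext:
    # With clone=True each segment gets its own copy, so attaching
    # segment-specific state never touches the shared object.
    ctx = copy.copy(shared) if clone else shared
    ctx.set_index_context({"segment": segment})
    return ctx
```

Without cloning, opening a second segment overwrites the context the first segment is still holding; with cloning, each segment keeps its own and the shared object stays untouched.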
score() pushed by PushDownScoreTopNIntoOlapScan has no children but reaches evaluate_inverted_index(). Replace DCHECK_GE with a graceful early return to avoid a crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
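The guard described here amounts to returning early instead of asserting on the child count. A minimal Python sketch of that pattern follows, assuming a hypothetical `ToyExpr` whose children are plain search terms and an index given as a dict; the real function's signature and return type are not shown in this PR page.

```python
from typing import Dict, List, Optional, Set


class ToyExpr:
    """Hypothetical expression node; children are search terms in this model."""

    def __init__(self, name: str, children: Optional[List[str]] = None) -> None:
        self.name = name
        self.children = children or []


def evaluate_inverted_index(expr: ToyExpr, index: Dict[str, Set[int]]) -> Optional[Set[int]]:
    # Early return instead of an assertion: an expression without children
    # (such as score() in the commit above) has nothing to look up.
    if not expr.children:
        return None
    return index.get(expr.children[0], set())
```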
…r handling in _apply_index_expr

Add unit tests covering error handling branches for virtual column MATCH expression evaluation in SegmentIterator::_apply_index_expr().

Tests cover:
- evaluate_inverted_index returning OK (happy path)
- null index context being skipped
- downgrade errors (BYPASS, FILE_NOT_FOUND, EVALUATE_SKIPPED, FILE_CORRUPTED)
- NOT_IMPLEMENTED_ERROR being continued
- unhandled errors being propagated
- multiple virtual column exprs with mixed results
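The branches listed above can be summarised as a small decision table. The Python sketch below is a toy model, not the real control flow: `Status`, `DOWNGRADE`, and `apply_index_exprs` are hypothetical, `INTERNAL_ERROR` is an invented placeholder for "any unhandled error", and treating a downgrade as "skip the index for this expr and carry on" is an assumption drawn from the word "downgrade".

```python
from enum import Enum, auto
from typing import List, Tuple


class Status(Enum):
    OK = auto()
    BYPASS = auto()
    FILE_NOT_FOUND = auto()
    EVALUATE_SKIPPED = auto()
    FILE_CORRUPTED = auto()
    NOT_IMPLEMENTED_ERROR = auto()
    INTERNAL_ERROR = auto()  # hypothetical example of an unhandled error


# Errors after which the index is skipped for that expr (assumed behavior).
DOWNGRADE = {
    Status.BYPASS,
    Status.FILE_NOT_FOUND,
    Status.EVALUATE_SKIPPED,
    Status.FILE_CORRUPTED,
}


def apply_index_exprs(exprs: List[Tuple[str, bool, Status]]) -> List[str]:
    """Return the names of exprs whose index evaluation succeeded.

    Each entry is (name, has_index_context, status_of_evaluation).
    """
    evaluated: List[str] = []
    for name, has_index_context, status in exprs:
        if not has_index_context:
            continue  # null index context: skip
        if status is Status.OK:
            evaluated.append(name)
        elif status in DOWNGRADE or status is Status.NOT_IMPLEMENTED_ERROR:
            continue  # leave this expr to the slow path, keep going
        else:
            raise RuntimeError(f"index evaluation failed for {name}: {status.name}")
    return evaluated
```

A mixed batch therefore yields only the successfully evaluated exprs, while a single unhandled status aborts the whole call.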
TPC-H: Total hot run time: 27692 ms
TPC-DS: Total hot run time: 153213 ms
PR approved by at least one committer and no changes requested.
…lumn for inverted index evaluation (apache#61092)

Issue Number: close #xxx

Problem Summary:

In FULL OUTER JOIN queries, MATCH expressions in the SELECT list cannot be pushed down as filters (this would violate join semantics by incorrectly filtering rows). This means the inverted index cannot be used for MATCH evaluation, resulting in slow-path expression evaluation.

This PR enables MATCH expressions used as **projections** to be pushed down as virtual columns on OlapScan, allowing the BE to evaluate them via inverted index using the existing `fast_execute()` caching mechanism.

**Example:**
```sql
-- Before: MATCH evaluated via slow path (no index)
SELECT A.k1, A.content MATCH_ANY 'hello' as match_result
FROM A FULL OUTER JOIN B ON A.k1 = B.k1;
-- After: MATCH pushed as virtual column, evaluated via inverted index
```

**FE changes:**
- `Match.java`: Add `PreferPushDownProject` interface so `PushDownProject` rule moves MATCH from join output into scan projections
- `PushDownMatchProjectionAsVirtualColumn.java`: New rewrite rule converting MATCH projections to virtual columns on OlapScan
- `RuleType.java` + `Rewriter.java`: Rule registration

**BE changes (segment_iterator.cpp):**
- `_construct_compound_expr_context()`: Set shared `IndexExecContext` on virtual column exprs
- `_apply_index_expr()`: Evaluate inverted index for virtual column MATCH (bitmap only, no row filtering)
- `_output_index_result_column_for_expr()`: Convert bitmap to UInt8 column for all index contexts (common exprs + virtual column exprs)

The bitmap result is cached in `IndexExecContext`, and when `_materialization_of_virtual_column()` calls `VirtualSlotRef::execute_column()` → MATCH's `fast_execute()`, it returns the pre-computed column directly.
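The bitmap-to-column step and the cache lookup described in the BE changes can be sketched as follows. This is a simplified Python model under stated assumptions: `bitmap_to_uint8_column`, `ToyIndexExecContext`, and this `fast_execute` are hypothetical, a Python set stands in for the bitmap, and `bytes` stands in for a UInt8 column.

```python
from typing import Dict, List, Optional, Set


def bitmap_to_uint8_column(bitmap: Set[int], row_ids: List[int]) -> bytes:
    # One byte per row in the batch: 1 if the row id is in the bitmap, else 0.
    # The output always has len(row_ids) entries, so no row is filtered out.
    return bytes(1 if r in bitmap else 0 for r in row_ids)


class ToyIndexExecContext:
    """Hypothetical cache of index results, keyed by expression text."""

    def __init__(self) -> None:
        self._bitmaps: Dict[str, Set[int]] = {}
        self._columns: Dict[str, bytes] = {}

    def set_bitmap(self, expr: str, bitmap: Set[int]) -> None:
        self._bitmaps[expr] = bitmap

    def materialize(self, row_ids: List[int]) -> None:
        # Convert every cached bitmap into a column aligned with this batch.
        for expr, bitmap in self._bitmaps.items():
            self._columns[expr] = bitmap_to_uint8_column(bitmap, row_ids)

    def cached_column(self, expr: str) -> Optional[bytes]:
        return self._columns.get(expr)


def fast_execute(ctx: ToyIndexExecContext, expr: str) -> Optional[bytes]:
    # Return the pre-computed column if present; None means "use the slow path".
    return ctx.cached_column(expr)
```

In this model a projection reads the cached column directly, and an expression with no cached result falls through to ordinary evaluation.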
…umn for inverted index evaluation (#61244)

### What problem does this PR solve?

Related PR: #61092

Problem Summary:

Cherry-pick #61092 to branch-4.0.

In FULL OUTER JOIN queries, MATCH expressions in the SELECT list cannot be pushed down as filters (this would violate join semantics by incorrectly filtering rows). This PR enables MATCH expressions used as **projections** to be pushed down as virtual columns on OlapScan, allowing the BE to evaluate them via inverted index using the existing `fast_execute()` caching mechanism.

### Release note

Support MATCH expressions as projections pushed down to OlapScan as virtual columns, enabling inverted index evaluation for MATCH in contexts where it cannot be pushed as a filter (e.g., FULL OUTER JOIN).

### Check List (For Author)

- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- Behavior changed:
    - [x] No.
- Does this need documentation?
    - [x] No.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
In FULL OUTER JOIN queries, MATCH expressions in the SELECT list cannot be pushed down as filters (this would violate join semantics by incorrectly filtering rows). This means the inverted index cannot be used for MATCH evaluation, resulting in slow-path expression evaluation.
This PR enables MATCH expressions used as projections to be pushed down as virtual columns on OlapScan, allowing the BE to evaluate them via inverted index using the existing `fast_execute()` caching mechanism.
Example: in `SELECT A.k1, A.content MATCH_ANY 'hello' as match_result FROM A FULL OUTER JOIN B ON A.k1 = B.k1;`, MATCH was previously evaluated via the slow path (no index); it is now pushed down as a virtual column and evaluated via the inverted index.
FE changes:
- `Match.java`: Add `PreferPushDownProject` interface so `PushDownProject` rule moves MATCH from join output into scan projections
- `PushDownMatchProjectionAsVirtualColumn.java`: New rewrite rule converting MATCH projections to virtual columns on OlapScan
- `RuleType.java` + `Rewriter.java`: Rule registration
BE changes (segment_iterator.cpp):
- `_construct_compound_expr_context()`: Set shared `IndexExecContext` on virtual column exprs
- `_apply_index_expr()`: Evaluate inverted index for virtual column MATCH (bitmap only, no row filtering)
- `_output_index_result_column_for_expr()`: Convert bitmap to UInt8 column for all index contexts (common exprs + virtual column exprs)
The bitmap result is cached in `IndexExecContext`, and when `_materialization_of_virtual_column()` calls `VirtualSlotRef::execute_column()` → MATCH's `fast_execute()`, it returns the pre-computed column directly.
Release note
Support MATCH expressions as projections pushed down to OlapScan as virtual columns, enabling inverted index evaluation for MATCH in contexts where it cannot be pushed as a filter (e.g., FULL OUTER JOIN).
Check List (For Author)
Test
virtualColumn=id MATCH_ANY 'hello'
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)