skip trying to generate min max if buffer is EmptyIndexBuffer#18031
skip trying to generate min max if buffer is EmptyIndexBuffer#18031rhodo wants to merge 1 commit intoapache:masterfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18031 +/- ##
============================================
+ Coverage 63.30% 63.39% +0.09%
Complexity 1543 1543
============================================
Files 3200 3200
Lines 194169 194171 +2
Branches 29915 29916 +1
============================================
+ Hits 122914 123099 +185
+ Misses 61610 61395 -215
- Partials 9645 9677 +32
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR updates segment min/max generation to skip attempting min/max computation for no-dictionary columns when the forward index buffer is an EmptyIndexBuffer (e.g., a zero-size/remote index entry), avoiding unsupported reads.
Changes:
- Import
EmptyIndexBufferand add an early-return guard when the raw forward index buffer is empty. - Prevents
ForwardIndexReadercreation/usage on anEmptyIndexBufferin the no-dictionary min/max generation path.
| DataType storedType = dataType.getStoredType(); | ||
| boolean isSingleValue = columnMetadata.isSingleValue(); | ||
| PinotDataBuffer rawIndexBuffer = _segmentWriter.getIndexFor(columnName, StandardIndexes.forward()); | ||
| if (rawIndexBuffer instanceof EmptyIndexBuffer) { |
There was a problem hiding this comment.
The early return for EmptyIndexBuffer means addColumnMinMaxValueForColumn() will still set _minMaxValueAdded = true even though no min/max was generated or persisted for this column. This can trigger an unnecessary metadata save and makes _minMaxValueAdded semantically incorrect. Consider returning a boolean from addColumnMinMaxValueWithoutDictionary() (or throwing a specific exception) so the caller only marks _minMaxValueAdded when min/max (or the MIN_MAX_VALUE_INVALID flag) is actually written to _segmentProperties.
| if (rawIndexBuffer instanceof EmptyIndexBuffer) { | |
| if (rawIndexBuffer instanceof EmptyIndexBuffer) { | |
| // Persist an invalid min/max marker so callers can safely assume metadata was updated. | |
| SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(_segmentProperties, columnName, null, null, storedType); |
| if (rawIndexBuffer instanceof EmptyIndexBuffer) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
This introduces new behavior for segments where the forward index maps to an EmptyIndexBuffer (e.g., zero-size/remote entries). There are existing preprocessor tests around min/max generation, but none cover this new skip path; adding a test that exercises an EmptyIndexBuffer-backed forward index would help prevent regressions (e.g., ensuring we don’t persist min/max and/or set the invalid flag consistently).
This pull request introduces a safeguard in the
ColumnMinMaxValueGeneratorto ensure that the min/max value calculation is skipped for columns whose forward index is anEmptyIndexBuffer. This helps prevent unnecessary processing and potential errors for columns that do not have a forward index.Robustness improvement:
addColumnMinMaxValueWithoutDictionaryto return early if the forward index buffer is an instance ofEmptyIndexBuffer, preventing further processing on empty indexes.EmptyIndexBufferinColumnMinMaxValueGenerator.javato support the new check.