skip trying to generate min max if buffer is EmptyIndexBuffer #18031

Draft
rhodo wants to merge 1 commit into apache:master from rhodo:skip_gen_metadata_if_column_is_empty

Conversation

rhodo (Collaborator) commented Mar 30, 2026

This pull request introduces a safeguard in the ColumnMinMaxValueGenerator to ensure that the min/max value calculation is skipped for columns whose forward index is an EmptyIndexBuffer. This helps prevent unnecessary processing and potential errors for columns that do not have a forward index.

Robustness improvement:

  • Added a check in addColumnMinMaxValueWithoutDictionary to return early if the forward index buffer is an instance of EmptyIndexBuffer, preventing further processing on empty indexes.
  • Imported EmptyIndexBuffer in ColumnMinMaxValueGenerator.java to support the new check.
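
The guard described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `SketchBuffer`, `SketchEmptyBuffer`, `SketchArrayBuffer`, and the `.min` / `.max` property keys are hypothetical stand-ins for Pinot's buffer types and segment metadata, chosen only to show the shape of an `instanceof` early return before any read is attempted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a forward index buffer; not Pinot's PinotDataBuffer.
interface SketchBuffer {
  int[] readValues();
}

// Hypothetical stand-in for EmptyIndexBuffer: a placeholder that cannot be read.
final class SketchEmptyBuffer implements SketchBuffer {
  @Override
  public int[] readValues() {
    throw new UnsupportedOperationException("empty index buffer has no readable data");
  }
}

// Hypothetical buffer backed by real values.
final class SketchArrayBuffer implements SketchBuffer {
  private final int[] _values;

  SketchArrayBuffer(int... values) {
    _values = values;
  }

  @Override
  public int[] readValues() {
    return _values;
  }
}

final class MinMaxGuardSketch {
  private MinMaxGuardSketch() {
  }

  // Writes min/max for the column unless the buffer is the empty placeholder.
  static void addMinMaxWithoutDictionary(String column, SketchBuffer forwardIndex,
      Map<String, String> properties) {
    if (forwardIndex instanceof SketchEmptyBuffer) {
      // Early return: reading the placeholder would throw, so skip the column.
      return;
    }
    int[] values = forwardIndex.readValues();
    if (values.length == 0) {
      return;
    }
    int min = values[0];
    int max = values[0];
    for (int value : values) {
      min = Math.min(min, value);
      max = Math.max(max, value);
    }
    // Hypothetical key names, used only for this sketch.
    properties.put(column + ".min", Integer.toString(min));
    properties.put(column + ".max", Integer.toString(max));
  }

  public static void main(String[] args) {
    Map<String, String> properties = new HashMap<>();
    addMinMaxWithoutDictionary("emptyCol", new SketchEmptyBuffer(), properties);
    addMinMaxWithoutDictionary("rawCol", new SketchArrayBuffer(3, 1, 2), properties);
    System.out.println(properties);
  }
}
```

In this sketch the empty column leaves the properties map untouched, while the column with real values gets both entries.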

codecov-commenter commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.39%. Comparing base (ecaf0f8) to head (67bb503).
⚠️ Report is 3 commits behind head on master.

Files with missing lines                                 Patch %   Lines
.../columnminmaxvalue/ColumnMinMaxValueGenerator.java    0.00%     1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18031      +/-   ##
============================================
+ Coverage     63.30%   63.39%   +0.09%     
  Complexity     1543     1543              
============================================
  Files          3200     3200              
  Lines        194169   194171       +2     
  Branches      29915    29916       +1     
============================================
+ Hits         122914   123099     +185     
+ Misses        61610    61395     -215     
- Partials       9645     9677      +32     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2          0.00% <ø> (ø)
java-11               63.32% <0.00%> (+0.05%) ⬆️
java-21               63.33% <0.00%> (+0.06%) ⬆️
temurin               63.39% <0.00%> (+0.09%) ⬆️
unittests             63.39% <0.00%> (+0.09%) ⬆️
unittests1            55.55% <0.00%> (+<0.01%) ⬆️
unittests2            34.29% <0.00%> (+0.09%) ⬆️

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates segment min/max generation to skip attempting min/max computation for no-dictionary columns when the forward index buffer is an EmptyIndexBuffer (e.g., a zero-size/remote index entry), avoiding unsupported reads.

Changes:

  • Import EmptyIndexBuffer and add an early-return guard when the raw forward index buffer is empty.
  • Prevents ForwardIndexReader creation/usage on an EmptyIndexBuffer in the no-dictionary min/max generation path.

DataType storedType = dataType.getStoredType();
boolean isSingleValue = columnMetadata.isSingleValue();
PinotDataBuffer rawIndexBuffer = _segmentWriter.getIndexFor(columnName, StandardIndexes.forward());
if (rawIndexBuffer instanceof EmptyIndexBuffer) {

Copilot AI Mar 31, 2026

The early return for EmptyIndexBuffer means addColumnMinMaxValueForColumn() will still set _minMaxValueAdded = true even though no min/max was generated or persisted for this column. This can trigger an unnecessary metadata save and makes _minMaxValueAdded semantically incorrect. Consider returning a boolean from addColumnMinMaxValueWithoutDictionary() (or throwing a specific exception) so the caller only marks _minMaxValueAdded when min/max (or the MIN_MAX_VALUE_INVALID flag) is actually written to _segmentProperties.

Suggested change
- if (rawIndexBuffer instanceof EmptyIndexBuffer) {
+ if (rawIndexBuffer instanceof EmptyIndexBuffer) {
+ // Persist an invalid min/max marker so callers can safely assume metadata was updated.
+ SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(_segmentProperties, columnName, null, null, storedType);
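
The other option raised in the comment, returning a boolean so the caller only marks `_minMaxValueAdded` when something was written, might look roughly like the sketch below. It is an assumption-laden illustration rather than the generator's real code: `MinMaxFlagSketch`, `needsSave()`, and the property keys are hypothetical, and a `null` array stands in for an `EmptyIndexBuffer`-backed forward index.

```java
import java.util.HashMap;
import java.util.Map;

final class MinMaxFlagSketch {
  private final Map<String, String> _properties = new HashMap<>();
  private boolean _minMaxValueAdded;

  // Returns true only when min/max was actually written.
  // Assumption for this sketch: a null array represents an empty forward index.
  private boolean addMinMaxWithoutDictionary(String column, int[] values) {
    if (values == null || values.length == 0) {
      return false;
    }
    int min = values[0];
    int max = values[0];
    for (int value : values) {
      min = Math.min(min, value);
      max = Math.max(max, value);
    }
    // Hypothetical key names, used only for this sketch.
    _properties.put(column + ".min", Integer.toString(min));
    _properties.put(column + ".max", Integer.toString(max));
    return true;
  }

  // The caller flips the flag only if the helper reports a write.
  void addMinMaxForColumn(String column, int[] values) {
    if (addMinMaxWithoutDictionary(column, values)) {
      _minMaxValueAdded = true;
    }
  }

  // Hypothetical accessor: whether a metadata save would be needed.
  boolean needsSave() {
    return _minMaxValueAdded;
  }

  Map<String, String> properties() {
    return _properties;
  }

  public static void main(String[] args) {
    MinMaxFlagSketch generator = new MinMaxFlagSketch();
    generator.addMinMaxForColumn("emptyCol", null);
    System.out.println("needsSave after empty column: " + generator.needsSave());
    generator.addMinMaxForColumn("rawCol", new int[] {5, 2, 9});
    System.out.println("needsSave after raw column: " + generator.needsSave());
  }
}
```

Compared with persisting an invalid marker, this keeps the flag tied to actual writes, at the cost of leaving no record in the metadata that the column was skipped.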

Comment on lines +208 to +210
if (rawIndexBuffer instanceof EmptyIndexBuffer) {
return;
}

Copilot AI Mar 31, 2026

This introduces new behavior for segments where the forward index maps to an EmptyIndexBuffer (e.g., zero-size/remote entries). There are existing preprocessor tests around min/max generation, but none cover this new skip path; adding a test that exercises an EmptyIndexBuffer-backed forward index would help prevent regressions (e.g., ensuring we don’t persist min/max and/or set the invalid flag consistently).
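
One possible shape for the assertion such a test would make is sketched below in plain Java, without a test framework. The class name `SkipPathCheckSketch` is hypothetical, and the `column.<name>.minValue` / `column.<name>.maxValue` key layout is an assumption made for illustration; a real test would build a segment whose forward index resolves to an `EmptyIndexBuffer`, run the preprocessor, and then inspect the actual segment metadata.

```java
import java.util.HashMap;
import java.util.Map;

final class SkipPathCheckSketch {
  private SkipPathCheckSketch() {
  }

  // Assumed key layout for this sketch; check the real metadata keys before reuse.
  static String key(String column, String suffix) {
    return "column." + column + "." + suffix;
  }

  // Fails if either min or max was persisted for a column that should be skipped.
  static void assertNoMinMaxPersisted(Map<String, String> properties, String column) {
    if (properties.containsKey(key(column, "minValue"))
        || properties.containsKey(key(column, "maxValue"))) {
      throw new AssertionError("min/max was persisted for skipped column " + column);
    }
  }

  public static void main(String[] args) {
    // Placeholder for the metadata a real test would load after preprocessing.
    Map<String, String> properties = new HashMap<>();
    assertNoMinMaxPersisted(properties, "emptyCol");
    System.out.println("skip path left metadata untouched");
  }
}
```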

Copilot generated this review using guidance from repository custom instructions.