[Tech Request]: normalize per-column IN/NOT IN/=/<> domains in scan filters #24227

@ck89119

Description

Is there an existing issue for the same tech request?

  • I have checked the existing issues.

Does this tech request affect user experience?

  • This tech request doesn't affect user experience.

What would you like to be added?

Generalize the single-table scan filter rewrite so that per-column constant-domain
predicates (IN, NOT IN, =, <>/!=, IS NULL) are merged into one canonical
IN (or = / FALSE) per column.
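A minimal sketch of the canonical output form, assuming the merged domain has already been reduced to a set of constant literals. The helper renderDomain below is hypothetical (it is not part of pkg/sql/plan) and works on SQL text only to stay self-contained. The real rewrite operates on plan expressions. It picks FALSE, = or IN depending on how many values survive.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderDomain turns the final allowed-value set for one column into its
// canonical predicate: FALSE for an empty set, "col = v" for one value, and
// "col IN (...)" otherwise. Hypothetical helper: values are already-quoted
// SQL literals, sorted here only to make the output deterministic.
func renderDomain(col string, values []string) string {
	switch len(values) {
	case 0:
		return "FALSE"
	case 1:
		return fmt.Sprintf("%s = %s", col, values[0])
	}
	sorted := append([]string(nil), values...) // copy; don't mutate the caller's slice
	sort.Strings(sorted)
	return fmt.Sprintf("%s IN (%s)", col, strings.Join(sorted, ", "))
}

func main() {
	fmt.Println(renderDomain("c", nil))                    // FALSE
	fmt.Println(renderDomain("c", []string{"'a'"}))        // c = 'a'
	fmt.Println(renderDomain("c", []string{"'b'", "'a'"})) // c IN ('a', 'b')
}
```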

Rules

  • col IN S1 AND col IN S2 → col IN (S1 ∩ S2); empty intersection → FALSE.
  • col = v AND col IN S → col = v when v ∈ S; else FALSE.
  • col IN S1 AND col NOT IN S2 → col IN (S1 \ S2); empty difference → FALSE.
  • col IN S1 AND col <> v1 AND col <> v2 ... → col IN (S1 \ {v1, v2, ...}).
  • col IN S AND col IS NULL → FALSE.
  • The outer domain is also propagated into OR branches and nested NOT(IN), so that,
    for example, in col IN S1 AND (col = 'x' OR col NOT IN S2) the inner NOT IN is
    rewritten to IN (S1 \ S2). This is the main bigsql scenario.
  • BETWEEN is left untouched in this pass (range algebra is deferred).
  • Lists containing NULL and non-constant lists are skipped, so SQL three-valued NULL logic is preserved.
  • Cross-column predicates are not combined; only predicates on the same column (same RelPos and ColPos) within one TABLE_SCAN are merged.
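The rules above can be sketched as a single fold over the conjuncts on one column. This is a hypothetical Go model, not MatrixOne's plan types: Kind, Pred and foldDomain are illustrative names. It treats = as a one-value IN and <> as a one-value NOT IN, and it aborts on any list containing NULL so the original predicates are left alone. It returns ok=false when no IN anchors the domain, since the issue only targets canonical IN output.

```go
package main

import (
	"fmt"
	"sort"
)

// Kind classifies a constant-domain conjunct on a single column (hypothetical).
type Kind int

const (
	In     Kind = iota // col IN (...); col = v is In with one value
	NotIn              // col NOT IN (...); col <> v is NotIn with one value
	IsNull             // col IS NULL
)

// Pred is one conjunct on the column. HasNull marks a literal list that
// contains NULL, which the rewrite must not touch.
type Pred struct {
	Kind    Kind
	Values  []string
	HasNull bool
}

// foldDomain merges all conjuncts for one column. ok=false means "leave the
// filters unchanged". When ok is true, an empty result stands for FALSE.
func foldDomain(preds []Pred) (result []string, ok bool) {
	var allowed map[string]bool // nil until the first IN is seen
	excluded := map[string]bool{}
	sawIsNull := false
	for _, p := range preds {
		if p.HasNull {
			return nil, false // keep three-valued NULL semantics intact
		}
		switch p.Kind {
		case In:
			next := map[string]bool{}
			for _, v := range p.Values {
				if allowed == nil || allowed[v] {
					next[v] = true // intersect with the domain so far
				}
			}
			allowed = next
		case NotIn:
			for _, v := range p.Values {
				excluded[v] = true
			}
		case IsNull:
			sawIsNull = true
		}
	}
	if allowed == nil {
		return nil, false // no IN to anchor the domain on
	}
	result = []string{}
	if sawIsNull {
		return result, true // col IN S AND col IS NULL is never true
	}
	for v := range allowed {
		if !excluded[v] {
			result = append(result, v) // S1 \ (all NOT IN and <> values)
		}
	}
	sort.Strings(result)
	return result, true
}

func main() {
	// col IN ('a','b','c') AND col IN ('b','c','d') AND col <> 'c'
	r, ok := foldDomain([]Pred{
		{Kind: In, Values: []string{"a", "b", "c"}},
		{Kind: In, Values: []string{"b", "c", "d"}},
		{Kind: NotIn, Values: []string{"c"}},
	})
	fmt.Println(r, ok) // [b] true
}
```

Treating = and <> as one-element IN / NOT IN lists keeps the fold to three cases, and each rule in the list falls out as a combination of intersection and difference.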

Why is this needed?

Real BI-generated SQL often combines a large outer col IN (...) with a nested
col NOT IN (subset) inside an OR. Without a domain-based rewrite, the runtime checks
each row against the full negative list. By projecting the negative set onto the
outer domain, we shrink NOT IN S2 into IN (S1 \ S2), which is often much shorter and
benefits from normal IN runtime filters, zone map pruning, and shorter list
evaluation. In the bigsql case, a 390-item NOT IN collapses to a 74-item IN.
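The projection step for the OR case can be sketched as follows. The sketch assumes S2 contains no NULL and the outer conjunct col IN S1 is on the same column. Rows outside S1 are already rejected by the outer conjunct, so inside the OR a NOT IN S2 branch is equivalent to IN (S1 \ S2), and an IN branch can be narrowed to its intersection with S1. Branch and projectOntoDomain are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Branch is one disjunct of an OR on the same column (hypothetical model).
// col = v is represented as Op "IN" with a single value.
type Branch struct {
	Op     string // "IN" or "NOT IN"
	Values []string
}

// projectOntoDomain narrows every branch to the outer domain: an IN branch
// becomes IN (values ∩ outer) and a NOT IN branch becomes IN (outer \ values).
// An empty Values slice in the result means the branch is always FALSE.
func projectOntoDomain(outer []string, branches []Branch) []Branch {
	out := make([]Branch, 0, len(branches))
	for _, b := range branches {
		listed := map[string]bool{}
		for _, v := range b.Values {
			listed[v] = true
		}
		isIn := b.Op == "IN"
		kept := []string{}
		for _, v := range outer {
			// IN keeps outer values that are listed; NOT IN keeps the rest.
			if listed[v] == isIn {
				kept = append(kept, v)
			}
		}
		sort.Strings(kept)
		out = append(out, Branch{Op: "IN", Values: kept})
	}
	return out
}

func main() {
	outer := []string{"a", "b", "c", "d", "e"} // col IN S1
	// (col = 'x' OR col = 'b' OR col NOT IN ('a', 'b', 'z'))
	got := projectOntoDomain(outer, []Branch{
		{Op: "IN", Values: []string{"x"}},
		{Op: "IN", Values: []string{"b"}},
		{Op: "NOT IN", Values: []string{"a", "b", "z"}},
	})
	fmt.Println(got) // [{IN []} {IN [b]} {IN [c d e]}]
}
```

The rewritten list always has |S1 \ S2| entries, no matter how long S2 is. In the bigsql case, that difference is the 74-item IN. An empty list marks a branch that is always FALSE and could be dropped from the OR.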

The same framework also catches col IN S1 AND col IN S2, col = v AND col IN S,
chains of <> conjuncts, and NULL contradictions, all of which are surprisingly
common in generated SQL.

Additional information

Scope of this issue: plan-time rewrite only. Execution-side changes (Bloom filters for
very large lists, anti-semi join for NOT IN (subquery), OR factoring, low-cardinality
complement) are explicitly out of scope.

Implementation lives in pkg/sql/plan/expr_opt.go as normalizeColumnDomain,
invoked from opt_misc.go next to mergeFiltersOnCompositeKey.
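For the wiring next to mergeFiltersOnCompositeKey, one plausible approach is to bucket a TABLE_SCAN's filter conjuncts by (RelPos, ColPos) first. Each bucket is then folded independently, and cross-column and non-constant predicates pass through untouched. The ColKey and Conjunct types and the groupByColumn function below are hypothetical, and the real normalizeColumnDomain may be organized differently.

```go
package main

import "fmt"

// ColKey identifies a column by RelPos + ColPos, as the scoping rule describes.
type ColKey struct {
	RelPos int32
	ColPos int32
}

// Conjunct is a simplified filter entry (hypothetical). Foldable is false for
// cross-column predicates and for lists with non-constant operands.
type Conjunct struct {
	Col      ColKey
	Foldable bool
	SQL      string
}

// groupByColumn buckets foldable single-column conjuncts by ColKey so each
// bucket can be normalized on its own; everything else is returned in rest.
func groupByColumn(filters []Conjunct) (groups map[ColKey][]Conjunct, rest []Conjunct) {
	groups = map[ColKey][]Conjunct{}
	for _, f := range filters {
		if !f.Foldable {
			rest = append(rest, f)
			continue
		}
		groups[f.Col] = append(groups[f.Col], f)
	}
	return groups, rest
}

func main() {
	a := ColKey{RelPos: 0, ColPos: 1}
	b := ColKey{RelPos: 0, ColPos: 2}
	groups, rest := groupByColumn([]Conjunct{
		{Col: a, Foldable: true, SQL: "a IN (1, 2, 3)"},
		{Col: b, Foldable: true, SQL: "b <> 7"},
		{Col: a, Foldable: true, SQL: "a <> 2"},
		{Col: a, Foldable: false, SQL: "a IN (b, 5)"}, // non-constant list
	})
	fmt.Println(len(groups[a]), len(groups[b]), len(rest)) // 2 1 1
}
```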
