Is there an existing issue for the same tech request?
Does this tech request not affect user experience?
What would you like to be added ?
Generalize the single-table scan filter rewrite so that per-column constant-domain
predicates (IN, NOT IN, =, <>/!=, IS NULL) are merged into one canonical
IN (or = / FALSE) per column.
Rules
col IN S1 AND col IN S2 → col IN (S1 ∩ S2); empty intersection → FALSE.
col = v AND col IN S → col = v when v ∈ S; else FALSE.
col IN S1 AND col NOT IN S2 → col IN (S1 \ S2); empty diff → FALSE.
col IN S1 AND col <> v1 AND col <> v2 ... → col IN (S1 \ {v1, v2, ...}).
col IN S AND col IS NULL → FALSE.
- The outer domain is also propagated into OR branches / nested
NOT(IN) so that
e.g. col IN S1 AND (col = 'x' OR col NOT IN S2) rewrites the inner
NOT IN to IN (S1 \ S2). This is the main bigsql scenario.
BETWEEN is left untouched in this pass (range algebra is deferred).
- List-with-NULL and non-constant lists are skipped (NULL 3-valued logic is preserved).
- Cross-column predicates are not combined; only same
RelPos+ColPos on one TABLE_SCAN.
Why is this needed ?
Real BI-generated SQL often combines a large outer col IN (...) with a nested
col NOT IN (subset) inside OR. Without domain-based rewrite, the runtime scans
each row against the full negative list. By projecting the negative set onto the
outer domain we shrink NOT IN S2 into a much smaller IN (S1 \ S2), which then
benefits from normal IN runtime filters, zone map pruning, and shorter list
evaluation. For the bigsql case, 390-item NOT IN collapses to a 74-item IN.
The same framework also catches col IN S1 AND col IN S2, col = v AND col IN S,
multiple <> conjunctions, and null-contradictions — all surprisingly common in
generated SQL.
Additional information
Scope of this issue: plan-time rewrite only. Execution-side changes (bloom for
very large lists, anti-semi for NOT IN (subquery), OR factoring, low-cardinality
complement) are explicitly out of scope.
Implementation lives in pkg/sql/plan/expr_opt.go as normalizeColumnDomain,
invoked from opt_misc.go next to mergeFiltersOnCompositeKey.
Is there an existing issue for the same tech request?
Does this tech request not affect user experience?
What would you like to be added ?
Generalize the single-table scan filter rewrite so that per-column constant-domain
predicates (
IN,NOT IN,=,<>/!=,IS NULL) are merged into one canonicalIN(or=/FALSE) per column.Rules
col IN S1 AND col IN S2→col IN (S1 ∩ S2); empty intersection →FALSE.col = v AND col IN S→col = vwhenv ∈ S; elseFALSE.col IN S1 AND col NOT IN S2→col IN (S1 \ S2); empty diff →FALSE.col IN S1 AND col <> v1 AND col <> v2 ...→col IN (S1 \ {v1, v2, ...}).col IN S AND col IS NULL→FALSE.NOT(IN)so thate.g.
col IN S1 AND (col = 'x' OR col NOT IN S2)rewrites the innerNOT INtoIN (S1 \ S2). This is the main bigsql scenario.BETWEENis left untouched in this pass (range algebra is deferred).RelPos+ColPoson oneTABLE_SCAN.Why is this needed ?
Real BI-generated SQL often combines a large outer
col IN (...)with a nestedcol NOT IN (subset)inside OR. Without domain-based rewrite, the runtime scanseach row against the full negative list. By projecting the negative set onto the
outer domain we shrink
NOT IN S2into a much smallerIN (S1 \ S2), which thenbenefits from normal
INruntime filters, zone map pruning, and shorter listevaluation. For the bigsql case, 390-item
NOT INcollapses to a 74-itemIN.The same framework also catches
col IN S1 AND col IN S2,col = v AND col IN S,multiple
<>conjunctions, and null-contradictions — all surprisingly common ingenerated SQL.
Additional information
Scope of this issue: plan-time rewrite only. Execution-side changes (bloom for
very large lists, anti-semi for
NOT IN (subquery), OR factoring, low-cardinalitycomplement) are explicitly out of scope.
Implementation lives in
pkg/sql/plan/expr_opt.goasnormalizeColumnDomain,invoked from
opt_misc.gonext tomergeFiltersOnCompositeKey.