[Tech Request]: normalize per-column IN/NOT IN/=/<> domains in scan filters

### Is there an existing issue for the same tech request?

- [x] I have checked the existing issues.

### Does this tech request not affect user experience?

- [x] This tech request doesn't affect user experience.

### What would you like to be added ?

Generalize the single-table scan filter rewrite so that per-column constant-domain
predicates (`IN`, `NOT IN`, `=`, `<>`/`!=`, `IS NULL`) are merged into one canonical
`IN` (or `=` / `FALSE`) per column.

**Rules**

- `col IN S1 AND col IN S2` → `col IN (S1 ∩ S2)`; empty intersection → `FALSE`.
- `col = v AND col IN S` → `col = v` when `v ∈ S`; else `FALSE`.
- `col IN S1 AND col NOT IN S2` → `col IN (S1 \ S2)`; empty diff → `FALSE`.
- `col IN S1 AND col <> v1 AND col <> v2 ...` → `col IN (S1 \ {v1, v2, ...})`.
- `col IN S AND col IS NULL` → `FALSE`.
- The outer domain is also propagated into OR branches / nested `NOT(IN)` so that
  e.g. `col IN S1 AND (col = 'x' OR col NOT IN S2)` rewrites the inner
  `NOT IN` to `IN (S1 \ S2)`. This is the main bigsql scenario.
- `BETWEEN` is left untouched in this pass (range algebra is deferred).
- List-with-NULL and non-constant lists are skipped (NULL 3-valued logic is preserved).
- Cross-column predicates are not combined; only same `RelPos+ColPos` on one `TABLE_SCAN`.

### Why is this needed ?

Real BI-generated SQL often combines a large outer `col IN (...)` with a nested
`col NOT IN (subset)` inside OR. Without domain-based rewrite, the runtime scans
each row against the full negative list. By projecting the negative set onto the
outer domain we shrink `NOT IN S2` into a much smaller `IN (S1 \ S2)`, which then
benefits from normal `IN` runtime filters, zone map pruning, and shorter list
evaluation. For the bigsql case, 390-item `NOT IN` collapses to a 74-item `IN`.

The same framework also catches `col IN S1 AND col IN S2`, `col = v AND col IN S`,
multiple `<>` conjunctions, and null-contradictions — all surprisingly common in
generated SQL.

### Additional information

Scope of this issue: plan-time rewrite only. Execution-side changes (bloom for
very large lists, anti-semi for `NOT IN (subquery)`, OR factoring, low-cardinality
complement) are explicitly out of scope.

Implementation lives in `pkg/sql/plan/expr_opt.go` as `normalizeColumnDomain`,
invoked from `opt_misc.go` next to `mergeFiltersOnCompositeKey`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Tech Request]: normalize per-column IN/NOT IN/=/<> domains in scan filters #24227

Is there an existing issue for the same tech request?

Does this tech request not affect user experience?

What would you like to be added ?

Why is this needed ?

Additional information

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Tech Request]: normalize per-column IN/NOT IN/=/<> domains in scan filters #24227

Description

Is there an existing issue for the same tech request?

Does this tech request not affect user experience?

What would you like to be added ?

Why is this needed ?

Additional information

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions