DISTINCT on indexed column scans full index table, prefix_in block filter ineffective

## Bug Description

When running `SELECT DISTINCT col FROM table WHERE col IN (...)` on a column that is the leading key of a composite secondary index, the query scans the **entire index table** instead of pruning via `prefix_in` block filter. The equivalent `SELECT COUNT(*) ... WHERE col IN (...)` correctly prunes and is ~90x faster.

## Environment

- MatrixOne version: latest main
- Table: ~45M rows, composite secondary index on `(b28_s_kgd353d, b28_s_kgdtvnx, b28_s_kgdp984, b28_s_kgd4kbn, b28_s_kgd4b76)`

## Steps to Reproduce

```sql
-- Table DDL (simplified)
CREATE TABLE bpc_consolidated_report (
  b28_s_kgd353d varchar(255) DEFAULT NULL,
  -- ... other columns ...
  KEY idx_bpc_consolidated_report_q1 (b28_s_kgd353d, b28_s_kgdtvnx, b28_s_kgdp984, b28_s_kgd4kbn, b28_s_kgd4b76)
);

-- Fast query (0.4s): block filter works, scans 886 blocks / 7.2M rows
SELECT COUNT(*) FROM bpc_consolidated_report
WHERE b28_s_kgd353d IN ('2022.12', '2023.12', '2024.12', '2025.12');

-- Slow query (38s): block filter ineffective, scans 5503 blocks / 45M rows
SELECT DISTINCT b28_s_kgd353d FROM bpc_consolidated_report
WHERE b28_s_kgd353d IN ('2022.12', '2023.12', '2024.12', '2025.12');
```

## Explain Analyze Output

### COUNT(*) — 0.433 sec ✅

```
Aggregate
  inputRows=7200773 outputRows=1
  ->  Index Table Scan on idx_bpc_consolidated_report_q1
        inputBlocks=886 inputRows=7200773 ReadSize=156mb
        Filter Cond: prefix_in(__mo_index_idx_col)
        Block Filter Cond: prefix_in(__mo_index_idx_col)
```

### DISTINCT — 38.008 sec ❌

```
Aggregate
  inputRows=45006351 outputRows=4 timeConsumed=35708ms
  Group Key: serial_extract(__mo_index_idx_col, 0, VARCHAR))
  ->  Index Table Scan on idx_bpc_consolidated_report_q1
        inputBlocks=5503 inputRows=45006351 ReadSize=932mb
        Filter Cond: prefix_in(__mo_index_idx_col)
        Block Filter Cond: prefix_in(__mo_index_idx_col)
```

## Analysis

Both queries use `prefix_in` as the Block Filter Cond, but:

| Metric | COUNT(\*) | DISTINCT |
|---|---|---|
| inputBlocks | 886 | 5,503 |
| inputRows | 7,200,773 | 45,006,351 |
| ReadSize | 156 MB | 932 MB |
| Total time | 0.4s | 38s |

The matching rows are ~7.2M out of ~45M total (~16%). The COUNT query correctly prunes to only those rows, but the DISTINCT query reads the entire index table (45M rows = all rows in the index). The `prefix_in` block filter appears in the plan but has no actual pruning effect for the DISTINCT path.

Additionally, the DISTINCT aggregation uses `serial_extract(__mo_index_idx_col, 0, VARCHAR)` to extract the leading column from the serialized composite index key, which adds ~35.7s of CPU overhead on the unnecessarily large 45M row input.

## Expected Behavior

The DISTINCT query should achieve similar block filter pruning as the COUNT query, scanning ~886 blocks / 7.2M rows instead of the full 5,503 blocks / 45M rows.

## Workaround

Rewrite as UNION ALL of per-value queries to force the efficient prefix_in path:

```sql
SELECT '2022.12' AS b28_s_kgd353d FROM bpc_consolidated_report WHERE b28_s_kgd353d = '2022.12' LIMIT 1
UNION ALL
SELECT '2023.12' FROM bpc_consolidated_report WHERE b28_s_kgd353d = '2023.12' LIMIT 1
UNION ALL
SELECT '2024.12' FROM bpc_consolidated_report WHERE b28_s_kgd353d = '2024.12' LIMIT 1
UNION ALL
SELECT '2025.12' FROM bpc_consolidated_report WHERE b28_s_kgd353d = '2025.12' LIMIT 1;
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DISTINCT on indexed column scans full index table, prefix_in block filter ineffective #24241

Bug Description

Environment

Steps to Reproduce

Explain Analyze Output

COUNT(*) — 0.433 sec ✅

DISTINCT — 38.008 sec ❌

Analysis

Expected Behavior

Workaround

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Metric	COUNT(*)	DISTINCT
inputBlocks	886	5,503
inputRows	7,200,773	45,006,351
ReadSize	156 MB	932 MB
Total time	0.4s	38s

DISTINCT on indexed column scans full index table, prefix_in block filter ineffective #24241

Description

Bug Description

Environment

Steps to Reproduce

Explain Analyze Output

COUNT(*) — 0.433 sec ✅

DISTINCT — 38.008 sec ❌

Analysis

Expected Behavior

Workaround

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions