fix(top): spill to disk and streaming eval to prevent OOM on large LIMIT (Fixes #24243) #24244

jiangxinmeng1 wants to merge 5 commits into matrixorigin:main
Conversation
When LIMIT exceeds 16384 rows, the Top operator now keeps only sort-key columns in the heap and spills full rows to a temp file. During eval, needed rows are read back from disk and assembled into the output batch. This reduces heap memory from O(limit * row_width) to O(limit * key_width). Also fixes a memory leak in mergetop where defer bat.Clean inside a loop kept all intermediate batches alive until function return. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous spill fix only moved memory from build to eval phase. Now evalSpill streams output in 8192-row chunks instead of materializing all limit rows at once. Peak memory during eval drops from O(limit * row_width) to O(chunk_size * row_width), e.g. ~10 MB per chunk instead of ~7 GiB. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge Queue Status

This pull request spent 8 seconds in the queue, with no time running CI.

Reason: The pull request can't be updated.
Hint: You should update or rebase your pull request manually. If you do, this pull request will automatically be requeued once the queue conditions match again.
What type of PR is this?
Which issue(s) this PR fixes:
fixes #24243
What this PR does / why we need it:
`INSERT INTO ... SELECT ... ORDER BY col LIMIT 5000000` on a 100M-row table causes OOM in the CI nightly regression. The Top operator holds all LIMIT rows with all columns in the heap, consuming O(limit × row_width) memory; for 5M rows of wide data this reaches tens of GiB.

This PR makes three targeted changes:
1. Top operator: spill to disk for large LIMIT (`top/top.go`, `top/types.go`)

When LIMIT > 16384, the Top operator now:

- keeps only the sort-key columns in the heap
- spills full rows to a temp file via `batch.MarshalBinary`
- stores a `rowRef{batchIdx, rowIdx}` per heap entry to locate spilled rows during eval

Heap memory drops from O(limit × row_width) to O(limit × key_width).
2. Top operator: streaming eval in spill mode (`top/top.go`)

Instead of materializing all LIMIT rows into one giant batch during eval, spill mode now:

- walks the spilled rows in output order via `orderedRefs`
- streams one 8192-row chunk per `Call()` invocation

Eval peak memory drops from O(limit × row_width) to O(chunk_size × row_width) (~10 MiB per chunk).
3. MergeTop: fix memory leak from `defer` in loop (`mergetop/top.go`)

`defer bat.Clean(proc.Mp())` was placed inside a `for` loop in `build()`. Since `defer` only fires on function return, every duplicated batch from each iteration accumulated in memory. Replaced with an explicit `bat.Clean()` after each `processBatch` call and on error paths.

How this PR impacts memory (conceptual):