Skip to content

fix(top): spill to disk and streaming eval to prevent OOM on large LIMIT (Fixes #24243)#24244

Open
jiangxinmeng1 wants to merge 5 commits intomatrixorigin:mainfrom
jiangxinmeng1:test-main-order-limit
Open

fix(top): spill to disk and streaming eval to prevent OOM on large LIMIT (Fixes #24243)#24244
jiangxinmeng1 wants to merge 5 commits intomatrixorigin:mainfrom
jiangxinmeng1:test-main-order-limit

Conversation

@jiangxinmeng1
Copy link
Copy Markdown
Contributor

@jiangxinmeng1 jiangxinmeng1 commented Apr 29, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

fixes #24243

What this PR does / why we need it:

INSERT INTO ... SELECT ... ORDER BY col LIMIT 5000000 on a 100M row table causes OOM in CI nightly regression. The Top operator holds all LIMIT rows with ALL columns in the heap, consuming O(limit × row_width) memory — for 5M rows of wide data this reaches tens of GiB.

This PR makes three targeted changes:

1. Top operator: spill to disk for large LIMIT (top/top.go, top/types.go)

When LIMIT > 16384, the Top operator now:

  • Keeps only sort-key columns in the in-memory heap (not all columns)
  • Spills full rows to a temporary file via batch.MarshalBinary
  • Tracks rowRef{batchIdx, rowIdx} per heap entry to locate spilled rows during eval

Heap memory drops from O(limit × row_width) to O(limit × key_width).

2. Top operator: streaming eval in spill mode (top/top.go)

Instead of materializing all LIMIT rows into one giant batch during eval, spill mode now:

  • Pops heap into sorted orderedRefs
  • Frees the heap batch immediately
  • Reads spilled batches and outputs 8192-row chunks per Call() invocation

Eval peak memory drops from O(limit × row_width) to O(chunk_size × row_width) (~10 MiB per chunk).

3. MergeTop: fix memory leak from defer in loop (mergetop/top.go)

defer bat.Clean(proc.Mp()) was placed inside a for loop in build(). Since defer only fires on function return, every duplicated batch from each iteration accumulated in memory. Replaced with explicit bat.Clean() after each processBatch call and on error paths.

How this PR impacts memory (conceptual):

Scenario Before After
Top heap (5M rows, 100 bytes/row) ~500 MiB (all cols) ~40 MiB (key cols only)
Top eval peak ~500 MiB (materialize all) ~10 MiB (8192-row chunks)
MergeTop intermediate batches All held until return Freed per iteration

jiangxinmeng1 and others added 2 commits April 28, 2026 16:57
When LIMIT exceeds 16384 rows, the Top operator now keeps only sort-key
columns in the heap and spills full rows to a temp file. During eval,
needed rows are read back from disk and assembled into the output batch.
This reduces heap memory from O(limit * row_width) to O(limit * key_width).

Also fixes a memory leak in mergetop where defer bat.Clean inside a loop
kept all intermediate batches alive until function return.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous spill fix only moved memory from build to eval phase.
Now evalSpill streams output in 8192-row chunks instead of materializing
all limit rows at once. Peak memory during eval drops from O(limit * row_width)
to O(chunk_size * row_width), e.g. ~10 MB per chunk instead of ~7 GiB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 29, 2026

Merge Queue Status

  • Entered queue2026-04-29 10:37 UTC · Rule: main
  • 🚫 Left the queue2026-04-29 10:37 UTC · at 9a18cc76e25f162a0c70bcaff2ed80149012da1d

This pull request spent 8 seconds in the queue, with no time running CI.

Reason

The pull request can't be updated

For security reasons, Mergify can't update this pull request. Try updating locally.
GitHub response: refusing to allow a GitHub App to create or update workflow .github/workflows/test-coverage-bot-bridge.yml without workflows permission

Hint

You should update or rebase your pull request manually. If you do, this pull request will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue, you can requeue the pull request, without updating it, by posting a @mergifyio queue comment.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 30, 2026

Merge Queue Status

  • Entered queue2026-04-30 03:03 UTC · Rule: main
  • 🚫 Left the queue2026-04-30 03:03 UTC · at 9a18cc76e25f162a0c70bcaff2ed80149012da1d

This pull request spent 6 seconds in the queue, with no time running CI.

Reason

The pull request can't be updated

For security reasons, Mergify can't update this pull request. Try updating locally.
GitHub response: refusing to allow a GitHub App to create or update workflow .github/workflows/test-coverage-bot-bridge.yml without workflows permission

Hint

You should update or rebase your pull request manually. If you do, this pull request will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue, you can requeue the pull request, without updating it, by posting a @mergifyio queue comment.

@mergify mergify Bot removed the queued label Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/L Denotes a PR that changes [500,999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Top operator OOM on large LIMIT with ORDER BY due to holding all rows in heap memory

4 participants