[Bug]: Top operator OOM on large LIMIT with ORDER BY due to holding all rows in heap memory #24243

@jiangxinmeng1

Description

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

7ca9d6c

Other Environment Information

- Hardware parameters: CI environment, 16 core
- OS type: Linux
- Others: nightly regression test with 100M row dataset

Actual Behavior

INSERT INTO ... SELECT ... ORDER BY col4 LIMIT 5000000 is killed by OOM during the nightly regression test on a 100M-row table.

Root cause analysis:

  1. Top operator heap memory: When LIMIT is large (e.g. 5M rows), the Top operator keeps every one of the LIMIT rows, with all of its columns, in an in-memory heap. For wide rows this costs O(limit × row_width) memory, which can reach tens of GiB.
  2. MergeTop memory leak: defer bat.Clean(proc.Mp()) is placed inside a for loop in build() in mergetop/top.go. Because a deferred call runs only when the enclosing function returns, the duplicated batch from every loop iteration stays allocated until the entire build phase completes. For a 100M-row input split into thousands of batches, this retains far more memory than the operator needs.
  3. Eval phase peak memory: The eval function materializes all LIMIT rows at once before sending them, which creates a second O(limit × row_width) memory spike.
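
The defer-in-loop behavior in point 2 can be reproduced in isolation. The following is a minimal sketch, not the MatrixOne code: deferInLoop, explicitRelease and the live counter are hypothetical stand-ins for the loop in build() and for bat.Clean(). It only shows how many "batches" are alive at the same time under each pattern.

```go
package main

import "fmt"

// deferInLoop mimics the pattern described above: each iteration registers a
// deferred release, so nothing is released until the function returns.
// It returns the largest number of simulated batches alive at the same time.
func deferInLoop(n int) (peak int) {
	live := 0
	for i := 0; i < n; i++ {
		live++ // simulated batch duplication
		if live > peak {
			peak = live
		}
		defer func() { live-- }() // runs only after the loop has finished
	}
	return peak
}

// explicitRelease releases each simulated batch before the next iteration,
// so at most one batch is alive at any point.
func explicitRelease(n int) (peak int) {
	live := 0
	for i := 0; i < n; i++ {
		live++ // simulated batch duplication
		if live > peak {
			peak = live
		}
		live-- // explicit release at the end of the iteration
	}
	return peak
}

func main() {
	fmt.Println("peak live batches with defer in loop:", deferInLoop(1000))
	fmt.Println("peak live batches with explicit release:", explicitRelease(1000))
}
```

With the deferred release the peak grows with the number of iterations, while the explicit release keeps it constant.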

CI link: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/25022702692/job/73291550102

Failed SQL:

insert into big_data_test.insert_into_table_limit 
select * from big_data_test.table_basic_for_load_100M 
order by col4 limit 5000000

Expected Behavior

The query should complete without OOM: peak memory should be bounded by O(limit × key_width) during the build phase, and the eval phase should stream its output in chunks.
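
To make the two bounds concrete, here is a small back-of-the-envelope calculation. The LIMIT of 5,000,000 comes from the failed SQL; the 4 KiB row width and the 8-byte sort key are assumptions chosen for illustration only, not measurements of table_basic_for_load_100M, and heapBytes is a hypothetical helper.

```go
package main

import "fmt"

// heapBytes estimates the payload held by the Top heap: one entry per LIMIT
// row, each entry widthBytes wide. Per-entry overhead is ignored.
func heapBytes(limit, widthBytes int64) int64 {
	return limit * widthBytes
}

func main() {
	const limit = 5_000_000 // from the failed SQL
	const rowWidth = 4096   // assumed average full-row width in bytes
	const keyWidth = 8      // assumed sort-key width in bytes

	const gib = float64(1 << 30)
	const mib = float64(1 << 20)

	full := heapBytes(limit, rowWidth)
	keys := heapBytes(limit, keyWidth)

	fmt.Printf("full rows in heap: %d bytes (%.1f GiB)\n", full, float64(full)/gib)
	fmt.Printf("keys only in heap: %d bytes (%.1f MiB)\n", keys, float64(keys)/mib)
}
```

Under these assumed widths the full-row heap needs about 20 GB, while the key-only heap needs about 40 MB.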

Steps to Reproduce

1. Create a table with 100M rows and multiple columns (int, varchar, double, etc.)
2. Run: INSERT INTO target SELECT * FROM source ORDER BY col LIMIT 5000000
3. Observe OOM kill

Additional information

The fix involves three changes:

  1. Spill to disk: For LIMIT > 16384, the Top operator keeps only sort-key columns in the heap and spills full rows to a temp file. Heap memory drops from O(limit × row_width) to O(limit × key_width).
  2. Fix MergeTop memory leak: Replace the defer bat.Clean() inside the loop with an explicit bat.Clean() call after each processBatch call.
  3. Streaming eval: In spill mode, eval outputs rows in 8192-row chunks instead of materializing all limit rows at once, keeping eval peak memory at O(chunk_size × row_width).
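
Changes 1 and 3 can be sketched together as a bounded heap that holds only sort keys plus a handle to the full row, followed by chunked output. This is a simplified illustration under stated assumptions, not the actual patch: entry, topKeys and emitChunks are hypothetical names, the key is a single int64, rowID stands in for an offset into the spill file, and no real file I/O is performed.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// entry is all that stays in memory per row: the sort key and a handle
// (hypothetical) that would locate the full row in the spill file.
type entry struct {
	key   int64
	rowID int
}

// maxHeap keeps the largest key at the root so it can be evicted first.
type maxHeap []entry

func (h maxHeap) Len() int           { return len(h) }
func (h maxHeap) Less(i, j int) bool { return h[i].key > h[j].key }
func (h maxHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *maxHeap) Pop() any {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// topKeys returns the limit smallest keys in ascending order, holding at
// most limit entries in memory at any time.
func topKeys(keys []int64, limit int) []entry {
	h := &maxHeap{}
	for id, k := range keys {
		if h.Len() < limit {
			heap.Push(h, entry{key: k, rowID: id})
		} else if limit > 0 && k < (*h)[0].key {
			(*h)[0] = entry{key: k, rowID: id} // replace the current maximum
			heap.Fix(h, 0)
		}
	}
	out := []entry(*h)
	sort.Slice(out, func(i, j int) bool { return out[i].key < out[j].key })
	return out
}

// emitChunks passes the result to send in slices of at most chunkSize
// entries, so a consumer never has to hold more than one chunk of full rows.
func emitChunks(rows []entry, chunkSize int, send func([]entry)) {
	if chunkSize <= 0 {
		return
	}
	for start := 0; start < len(rows); start += chunkSize {
		end := start + chunkSize
		if end > len(rows) {
			end = len(rows)
		}
		send(rows[start:end])
	}
}

func main() {
	keys := []int64{9, 3, 7, 1, 8, 2}
	top := topKeys(keys, 3)
	emitChunks(top, 2, func(chunk []entry) {
		// A real operator would read the full rows for this chunk from the
		// spill file here before sending them downstream.
		fmt.Println(chunk)
	})
}
```

In this sketch the heap size depends only on the key width and the limit, and the chunk size (8192 rows in the proposed fix) bounds how many full rows are resident during output.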
