Is there an existing issue for the same bug?
Branch Name
main
Commit ID
7ca9d6c
Other Environment Information
- Hardware parameters: CI environment, 16 core
- OS type: Linux
- Others: nightly regression test with 100M row dataset
Actual Behavior
INSERT INTO ... SELECT ... ORDER BY col4 LIMIT 5000000 causes an OOM during the nightly regression test on a 100M-row table.
Root cause analysis:
- Top operator heap memory: When LIMIT is large (e.g. 5M rows), the Top operator keeps all limit rows with all columns in its in-memory heap. For wide rows this causes O(limit × row_width) memory usage, which can reach tens of GiB.
- MergeTop memory leak: defer bat.Clean(proc.Mp()) is placed inside a for loop in mergetop/top.go:build(). Since defer only runs when the function returns, all duplicated batches from every loop iteration accumulate in memory until the entire build phase completes. For a 100M-row input split into thousands of batches, this causes massive memory waste.
- Eval phase peak memory: The eval function materializes all limit rows at once before sending, creating another O(limit × row_width) memory spike.
CI link: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/25022702692/job/73291550102
Failed SQL:
insert into big_data_test.insert_into_table_limit
select * from big_data_test.table_basic_for_load_100M
order by col4 limit 5000000
Expected Behavior
The query should complete without OOM, bounding peak memory to O(limit × key_width) during the build phase and streaming output in chunks during the eval phase.
Steps to Reproduce
1. Create a table with 100M rows and multiple columns (int, varchar, double, etc.)
2. Run: INSERT INTO target SELECT * FROM source ORDER BY col LIMIT 5000000
3. Observe OOM kill
Additional information
The fix involves three changes:
- Spill to disk: For LIMIT > 16384, the Top operator keeps only sort-key columns in the heap and spills full rows to a temp file. Heap memory drops from O(limit × row_width) to O(limit × key_width).
- Fix MergeTop memory leak: Replace defer bat.Clean() inside the loop with an explicit bat.Clean() after each processBatch call.
- Streaming eval: In spill mode, eval outputs rows in 8192-row chunks instead of materializing all limit rows at once, keeping eval peak memory at O(chunk_size × row_width).