[CELEBORN-2307] Support accurate disk usage accounting to HARD_SPLIT accurately. #3644
Open
saurabhd336 wants to merge 15 commits into apache:main
Contributor
Please create a JIRA to tag this PR @saurabhd336
Contributor
Author
@zaynt4606 I've created and attached the JIRA. Could you please help review / assign reviewers? Just for context, we have seen issues where the async nature of the disk usage update can delay HARD_SPLITs to the point where we completely run out of disk space. For our setup, this can lead to rather serious degradation. This config-based feature will help us be more accurate with our disk usage accounting. cc: @s0nskar
…ker heartbeat cycle for usable space updation
f957fdc to 2f0bc53
What changes were proposed in this pull request?
Often, Celeborn is too late in detecting disk-full issues simply because the DiskInfo's usableSpace is updated asynchronously in the worker heartbeat flow. In such cases, if heartbeats are missed and/or multiple very large writers end up pushing too much data to memory buffers (bypassing the disk-full-based HARD_SPLIT checks), it can cause severe degradation.
In some cases we've noticed that we easily breach the configured disk usage limit, causing job degradation and cleanup failures (due to RocksDB sharing the disk with shuffle data), which makes the situation even worse.
This change proposes a more real-time, coordinated acquisition of disk space during flush, making disk-full detection foolproof and preventing any spillage beyond the configured limits.
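To illustrate the idea of claiming space at flush time rather than waiting for the next heartbeat, here is a minimal sketch. It is not the code in this PR: the class `StrictDiskReservation`, its methods, and the `reservedBytes` floor are hypothetical names chosen for the example, and it assumes a single shared counter per disk that every flusher consults.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of per-disk strict reservation; not Celeborn's actual API. */
final class StrictDiskReservation {
    // Bytes still believed to be free on this disk.
    private final AtomicLong usableBytes;
    // Floor that must always remain free (assumed to come from the configured limit).
    private final long reservedBytes;

    StrictDiskReservation(long initialUsableBytes, long reservedBytes) {
        this.usableBytes = new AtomicLong(initialUsableBytes);
        this.reservedBytes = reservedBytes;
    }

    /**
     * Atomically claims space before a flush. Returns false when the claim
     * would dip below the floor; the caller would then answer HARD_SPLIT.
     */
    boolean tryReserve(long bytes) {
        while (true) {
            long current = usableBytes.get();
            long remaining = current - bytes;
            if (remaining < reservedBytes) {
                return false;
            }
            // Retry if another flusher changed the counter in the meantime.
            if (usableBytes.compareAndSet(current, remaining)) {
                return true;
            }
        }
    }

    /** Returns space to the pool, for example after a file is deleted. */
    void release(long bytes) {
        usableBytes.addAndGet(bytes);
    }

    long usableBytes() {
        return usableBytes.get();
    }
}
```

Because the check and the decrement happen in one compare-and-set, concurrent writers cannot both pass the check against the same stale value, which is the failure mode described above for the heartbeat-driven update.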
Additionally, when sorting partition files, currently the extra disk space used isn't accounted for at all. This PR also changes the logic to account for disk space used / reclaimed during the file sorting process.
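The sort-time accounting can be pictured roughly as follows. This is a sketch under assumptions, not the PR's implementation: `SortSpaceAccounting` and its method are hypothetical, and it assumes the sorted copy is written while the original still exists and that the original is deleted once sorting finishes.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of accounting for space used and reclaimed by a sort. */
final class SortSpaceAccounting {
    // Bytes still believed to be free on the disk holding the partition file.
    private final AtomicLong usableBytes;

    SortSpaceAccounting(long initialUsableBytes) {
        this.usableBytes = new AtomicLong(initialUsableBytes);
    }

    /**
     * Claims space for the sorted output, then reclaims the original's space.
     * Returns false, leaving the counter untouched, if the output does not fit.
     */
    boolean sortWithAccounting(long originalFileBytes, long sortedFileBytes) {
        // Step 1: claim room for the sorted copy before writing it.
        while (true) {
            long current = usableBytes.get();
            if (current < sortedFileBytes) {
                return false;
            }
            if (usableBytes.compareAndSet(current, current - sortedFileBytes)) {
                break;
            }
        }
        // Step 2: the actual sort and write would happen here (omitted).
        // Step 3: the original file is assumed deleted, so its space comes back.
        usableBytes.addAndGet(originalFileBytes);
        return true;
    }

    long usableBytes() {
        return usableBytes.get();
    }
}
```

During the sort both files exist, so the counter is temporarily lower by the size of the sorted copy; only the net difference remains once the original is removed.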
Everything is behind a new config, celeborn.worker.disk.storage.strictReserve.enabled, which currently defaults to false.
Why are the changes needed?
Disk-full detection is not foolproof.
Does this PR resolve a correctness bug?
No
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests added.