
Fix/tweak pinned memory accounting #13221

Merged
comfyanonymous merged 2 commits into Comfy-Org:master from rattus128:prs/pin-balances
Mar 29, 2026

Conversation

@rattus128
Contributor

A bugfix and a tweak to pinned memory accounting, from a shared debug session.

I never reproduced this personally.

The pinned memory accounting was clearly under-reporting the shared-GPU-memory usage, possibly due to rounding fragmentation. Lowering the pin ceiling resolved the crash.
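The idea behind the fix can be sketched in a few lines: cap total pinned memory at a fraction of system RAM, and refuse requests past the cap. This is a hypothetical illustration, not ComfyUI's actual API; the names `pin_budget` and `can_pin` are invented, and only the multiplier values (0.45 to 0.40 on Windows, 0.95 to 0.90 elsewhere) come from the PR itself.

```python
# Hypothetical sketch of a pinned-memory ceiling check; the real logic
# lives in comfy/model_management.py and these names are illustrative.

def pin_budget(total_ram_bytes: int, on_windows: bool) -> int:
    # The PR lowers the Windows multiplier from 0.45 to 0.40 and the
    # non-Windows multiplier from 0.95 to 0.90, leaving more headroom
    # for shared-GPU-memory use that the accounting does not see.
    multiplier = 0.40 if on_windows else 0.90
    return int(total_ram_bytes * multiplier)

def can_pin(total_pinned: int, request: int, budget: int) -> bool:
    # Refuse to pin past the budget; the caller falls back to pageable
    # memory instead of risking an out-of-memory error during a copy.
    return total_pinned + request <= budget

budget = pin_budget(32 * 1024**3, on_windows=True)  # 32GB RAM, as in the report
print(can_pin(total_pinned=8 * 1024**3, request=2 * 1024**3, budget=budget))
```

With the lower 0.40 multiplier the budget on a 32GB machine is 12.8GiB, so a 2GiB pin on top of 8GiB already pinned is allowed, while the same request on top of 12GiB pinned would be refused.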

Example test conditions (from user report and live debug session):

Windows, RTX5080 laptop, 32GB RAM
LTX2.3 720Px12s (with latent upscaler)

Before:

...
  File "C:\Users\xxx\Documents\comfyui-portable\ComfyUI\comfy\lora.py", line 428, in calculate_weight
    output = v.calculate_weight(weight, key, strength, strength_model, offset, function, intermediate_dtype, original_weights)
  File "C:\Users\xxx\Documents\comfyui-portable\ComfyUI\comfy\weight_adapter\lora.py", line 236, in calculate_weight
    mat1 = comfy.model_management.cast_to_device(
        v[0], weight.device, intermediate_dtype
    )
  File "C:\Users\xxx\Documents\comfyui-portable\ComfyUI\comfy\model_management.py", line 1319, in cast_to_device
    return cast_to(tensor, dtype=dtype, device=device, non_blocking=non_blocking, copy=copy)
  File "C:\Users\xxx\Documents\comfyui-portable\ComfyUI\comfy\model_management.py", line 1314, in cast_to
    r.copy_(weight, non_blocking=non_blocking)
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: out of memory
Search for `cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
...

After:

Workflow runs ✅

Regression perf test:

Windows 5060, 64GB, LTX2 FP16.

Before (30.9GB shared mem usage):

Prompt executed in 230.68 seconds

After (27.2GB shared mem usage):

Prompt executed in 231.95 seconds

Windows 5060, 64GB, wan 2.2 14Bx2 FP16:

Before (29.8GB shared mem usage):

Prompt executed in 77.16 seconds

After:

Prompt executed in 77.44 seconds

Some workflows have more extraneous use of shared GPU memory than is
accounted for in the 5% pin headroom. Lower this for safety.
TOTAL_PINNED_MEMORY is shared between the legacy and aimdo pinning
systems, however this catch-all assumes only the legacy system exists.
Remove the catch-all, as the PINNED_MEMORY buffer is already coherent.
@coderabbitai

coderabbitai bot commented Mar 29, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between a500f1e and c4e6511.

📒 Files selected for processing (1)
  • comfy/model_management.py

📝 Walkthrough

The change modifies pinned-memory budget allocation in the model management system. Specifically, it adjusts the maximum pinned-memory multipliers for NVIDIA/AMD GPUs on both platforms: the Windows multiplier decreased from 0.45 to 0.40, and the non-Windows multiplier decreased from 0.95 to 0.90. Additionally, the unpin_memory() function no longer explicitly sets TOTAL_PINNED_MEMORY to 0 after unpinning the final pinned tensor.

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: the title accurately describes the main changes, adjusting pinned memory budget multipliers and removing explicit zeroing logic in unpin_memory.
  • Description check ✅ Passed: the description is directly related to the changeset, explaining the bugfix context, providing reproduction details, and including performance regression test results.




@comfyanonymous comfyanonymous merged commit 8d723d2 into Comfy-Org:master Mar 29, 2026
14 checks passed
comfyanonymous pushed a commit that referenced this pull request Apr 3, 2026
* mm: Lower windows pin threshold

Some workflows have more extraneous use of shared GPU memory than is
accounted for in the 5% pin headroom. Lower this for safety.

* mm: Remove pin count clearing threshold.

TOTAL_PINNED_MEMORY is shared between the legacy and aimdo pinning
systems, however this catch-all assumes only the legacy system exists.
Remove the catch-all, as the PINNED_MEMORY buffer is already coherent.
