
Fix is_managed reporting for pool-allocated managed memory#1924

Merged
cpcloud merged 4 commits into NVIDIA:main from cpcloud:worktree-linked-splashing-bonbon
Apr 16, 2026

Conversation

@cpcloud
Contributor

@cpcloud cpcloud commented Apr 16, 2026

Summary

  • Buffer.is_managed now returns True when either the driver pointer attribute says so or the owning memory resource advertises managed allocations. The driver signal takes precedence; the resource signal is only a fallback.
  • Expose is_managed on the MemoryResource base (default False); ManagedMemoryResource overrides it to True. Other subclasses inherit False.
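The two bullets above can be sketched in plain Python. This is a minimal mock of the behavior described in the PR, not the actual cuda.core implementation: the driver pointer-attribute query is replaced by a boolean flag, and only the precedence logic is shown.

```python
class MemoryResource:
    """Mock of the base class: resources are not managed unless they opt in."""

    @property
    def is_managed(self) -> bool:
        return False


class ManagedMemoryResource(MemoryResource):
    """Mock of the managed-pool resource, which opts in."""

    @property
    def is_managed(self) -> bool:
        return True


class Buffer:
    """Mock buffer; `driver_reports_managed` stands in for the
    CU_POINTER_ATTRIBUTE_IS_MANAGED result from cuPointerGetAttributes."""

    def __init__(self, driver_reports_managed: bool, memory_resource=None):
        self._driver_reports_managed = driver_reports_managed
        self._memory_resource = memory_resource

    @property
    def is_managed(self) -> bool:
        # Driver attribute takes precedence; fall back to the owning
        # resource only when the driver does not report IS_MANAGED
        # (the pool-allocated managed memory path).
        if self._driver_reports_managed:
            return True
        if self._memory_resource is not None:
            return self._memory_resource.is_managed
        return False
```

With this precedence, a cuMemAllocManaged-backed buffer (driver says managed) and a ManagedMemoryResource pool allocation (driver says no, resource says yes) both report `is_managed=True`, while non-managed resources stay `False`.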

Why

ManagedMemoryResource allocates via cuMemAllocFromPoolAsync from a pool created with CU_MEM_ALLOCATION_TYPE_MANAGED. On some CUDA driver / hardware combinations, cuPointerGetAttributes on those allocations returns IS_MANAGED=0 and MEMORY_TYPE=CU_MEMORYTYPE_HOST. _query_memory_attrs therefore set is_device_accessible=True, is_host_accessible=True, and is_managed=False, and classify_dl_device returned kDLCUDAHost (3).
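The misclassification path can be illustrated with a simplified reconstruction of the device classification. `classify_dl_device` here is a hypothetical stand-in for the real cuda.core helper; the DLDeviceType values are the ones defined by the DLPack specification.

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    # Values per the DLPack specification.
    kDLCUDA = 2
    kDLCUDAHost = 3
    kDLCUDAManaged = 13


def classify_dl_device(is_device_accessible: bool,
                       is_host_accessible: bool,
                       is_managed: bool) -> DLDeviceType:
    """Simplified sketch of the classification described above."""
    if is_managed:
        return DLDeviceType.kDLCUDAManaged
    if is_device_accessible and is_host_accessible:
        # Both-accessible but not managed looks like pinned host memory.
        return DLDeviceType.kDLCUDAHost
    if is_device_accessible:
        return DLDeviceType.kDLCUDA
    raise ValueError("pointer is not CUDA-accessible")
```

With the misreported driver attributes (is_managed=False, both accessibility flags True), the pool-allocated managed buffer falls into the kDLCUDAHost branch instead of kDLCUDAManaged.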

CCCL's make_tma_descriptor (libcudacxx/include/cuda/__tma/make_tma_descriptor.h) accepts only kDLCUDA or kDLCUDAManaged, so StridedMemoryView.as_tensor_map() failed on a ManagedMemoryResource buffer with:

ValueError: Failed to build TMA descriptor via CCCL: Device type must be kDLCUDA or kDLCUDAManaged

Surfaced in TestTensorMapMultiDeviceValidation::test_from_tiled_accepts_managed_buffer_on_nonzero_device on NVIDIA B300 SXM6 AC.

Caveat on the driver behavior

Reproducing the exact pre-fix cuPointerGetAttributes values on RTX 5070 Ti / driver 13.2.0 shows IS_MANAGED=1 and MEMORY_TYPE=DEVICE for both cuMemAllocManaged and cuMemAllocFromPoolAsync from a managed pool — i.e. this configuration does not hit the bug. The fix is still sound: it is a no-op when the driver attributes are reported correctly, and it closes the gap when they aren't, without relying on driver-side quirks. The precise driver / CTK / hw combination that triggers the kDLCUDAHost classification on B300 is not reproduced in this PR; the failing test in the description comes from the reporter's B300 environment.

🤖 Generated with Claude Code

Pool-allocated managed memory via cuMemAllocFromPoolAsync (from a pool
created with CU_MEM_ALLOCATION_TYPE_MANAGED) does not set
CU_POINTER_ATTRIBUTE_IS_MANAGED=1. _query_memory_attrs therefore
classified the allocation as pinned host memory, causing
classify_dl_device to return kDLCUDAHost instead of kDLCUDAManaged.
CCCL's make_tma_descriptor only accepts kDLCUDA or kDLCUDAManaged, so
as_tensor_map() failed with "Device type must be kDLCUDA or
kDLCUDAManaged" on managed buffers.

Buffer.is_device_accessible / is_host_accessible already delegate to
the memory resource when one is attached. Apply the same pattern to
is_managed, and expose is_managed on the MemoryResource base
(defaulting to False) with ManagedMemoryResource overriding it to
True.

Also ignore .claude/settings.local.json in .gitignore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud cpcloud added bug Something isn't working cuda.core Everything related to the cuda.core module labels Apr 16, 2026
@cpcloud cpcloud self-assigned this Apr 16, 2026
@cpcloud cpcloud added this to the cuda.core v1.0.0 milestone Apr 16, 2026

cpcloud and others added 3 commits April 16, 2026 06:08
The existing test_managed_buffer_dlpack_roundtrip_device_type uses a
DummyUnifiedMemoryResource backed by cuMemAllocManaged, which sets
CU_POINTER_ATTRIBUTE_IS_MANAGED and so never exercised the pool-allocated
path that surfaced the bug.

Add two targeted tests:

- test_managed_memory_resource_buffer_dlpack_device_type: allocates from
  ManagedMemoryResource (cuMemAllocFromPoolAsync on a managed pool) and
  asserts is_managed and kDLCUDAManaged through Buffer and view.
- test_non_managed_resources_report_not_managed: parametrized smoke test
  ensuring DeviceMemoryResource and PinnedMemoryResource still report
  is_managed=False so the new MemoryResource.is_managed default does not
  silently misclassify non-managed resources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous fix unconditionally delegated Buffer.is_managed to
_memory_resource.is_managed, which returns False for any
MemoryResource subclass that does not opt in.  That broke
DummyUnifiedMemoryResource (and any user-defined MR wrapping
cuMemAllocManaged) where the driver pointer attribute correctly
reports IS_MANAGED=1 but the resource does not override is_managed.

Query the driver first; only fall back to the memory resource when
the driver does not report IS_MANAGED (the pool-allocated managed
memory path).  This keeps both old-style cuMemAllocManaged buffers
and ManagedMemoryResource pool allocations correctly classified.

Also rework the regression test parametrization to skip the pinned
case when PinnedMemoryResource is unavailable (CUDA < 13.0), and pick
up the ruff-format reflow of the helper call site.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pick up cuda-nvrtc 13.2.78, libcufile 1.17.1.22, and other transitive
package updates from conda-forge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the Needs-Restricted-Paths-Review PR touches cuda_bindings or cuda_python; only NVIDIA employees may modify these paths; see LICENSEs label Apr 16, 2026
@cpcloud cpcloud enabled auto-merge (squash) April 16, 2026 12:32
@cpcloud cpcloud requested a review from rparolin April 16, 2026 16:48
@cpcloud cpcloud merged commit cb3c132 into NVIDIA:main Apr 16, 2026
173 of 177 checks passed
@cpcloud cpcloud deleted the worktree-linked-splashing-bonbon branch April 16, 2026 17:25
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

@rwgk rwgk removed the Needs-Restricted-Paths-Review PR touches cuda_bindings or cuda_python; only NVIDIA employees may modify these paths; see LICENSEs label Apr 16, 2026
