Replace CUDA Runtime calls with Driver calls in libcu++ by davebayer · Pull Request #6073 · NVIDIA/cccl

davebayer · 2025-09-30T10:11:04Z

We are moving away from CUDA Runtime APIs toward Driver APIs. This PR replaces CUDA Runtime calls with Driver calls in libcu++.

I also added a cuInit(0) call to make sure the Driver is initialized.
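A minimal sketch of guarding a one-time driver initialization, assuming the goal is that every entry point can request initialization cheaply while the underlying call runs once. The real code calls cuInit(0) from <cuda.h>; here `fake_cuInit` and `ensure_driver_initialized` are hypothetical stand-ins so the example runs without a GPU, and std::call_once is this sketch's choice, not necessarily what libcu++ does.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the CUDA Driver's cuInit(0). The real function
// lives in <cuda.h> and returns a CUresult; this one only counts invocations.
static int g_init_calls = 0;
inline int fake_cuInit(unsigned int /*flags*/)
{
  ++g_init_calls;
  return 0; // 0 plays the role of CUDA_SUCCESS
}

// Initializes the driver exactly once, no matter how many call sites ask.
// If initialization throws, std::call_once leaves the flag unset, so a later
// call retries.
inline void ensure_driver_initialized()
{
  static std::once_flag flag;
  std::call_once(flag, [] {
    if (fake_cuInit(0) != 0)
    {
      throw std::runtime_error("driver initialization failed");
    }
  });
}
```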

davebayer

Notes for reviewers

davebayer · 2025-09-30T20:18:11Z


 //! @brief RAII helper which saves the current device and switches to the specified device on construction and switches
 //! to the saved device on destruction.
-using SwitchDevice = ::cuda::__ensure_current_device;


cuda::__ensure_current_device is no longer used in libcu++, so I moved the implementation to CUB. It now also uses CubDebug(...) to check the CUDA Runtime calls.

Let's just make sure the APIs stay the same!
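For readers unfamiliar with the pattern, here is a minimal sketch of an RAII device switcher like the one described in the doc comment above. It is not the CUB implementation: `get_device`, `set_device`, and `switch_device_guard` are hypothetical stand-ins (the real helper would query and set the device through CUDA and check results with CubDebug), chosen so the example runs without a GPU.

```cpp
#include <cassert>

// Hypothetical stand-ins for cudaGetDevice/cudaSetDevice.
static int g_current_device = 0;
inline int get_device() { return g_current_device; }
inline void set_device(int id) { g_current_device = id; }

// Saves the current device, switches to `target` on construction, and
// restores the saved device on destruction.
class switch_device_guard
{
public:
  explicit switch_device_guard(int target)
      : saved_(get_device())
  {
    if (target != saved_)
    {
      set_device(target); // only switch when needed
    }
  }

  ~switch_device_guard()
  {
    if (get_device() != saved_)
    {
      set_device(saved_); // restore the device that was current before
    }
  }

  // A guard owns a scope; copying it would restore the device twice.
  switch_device_guard(const switch_device_guard&)            = delete;
  switch_device_guard& operator=(const switch_device_guard&) = delete;

private:
  int saved_;
};
```

Keeping the public name (SwitchDevice) as an alias for such a guard is what keeps the API unchanged for existing callers.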

davebayer · 2025-09-30T20:23:15Z

-#  else // ^^^ _CCCL_CUDA_COMPILATION() ^^^ / vvv !_CCCL_CUDA_COMPILATION() vvv
-  throw ::cuda::cuda_error(__status, __msg, __api, __loc);
-#  endif // !_CCCL_CUDA_COMPILATION()
+  NV_IF_TARGET(NV_IS_HOST, (throw ::cuda::cuda_error(__status, __msg, __api, __loc);), (::cuda::std::terminate();))


I've changed __throw_cuda_error to no longer clear the CUDA Runtime error state. The clearing was moved to _CCCL_TRY_CUDA_API.

Not sure I like that too much but fine with me

That's because CUDA Driver calls don't set the CUDA Runtime error state. If throwing cuda_error cleared it, we would reset Runtime error state that was not set by our calls, and we would also initialize the CUDA Runtime for no reason.
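A sketch of the split described in this thread, under stated assumptions: only the Runtime-call checker clears the Runtime's last-error slot, while the shared throw helper and the Driver-call checker leave it alone. All names here (`g_runtime_last_error`, `try_runtime_call`, `try_driver_call`, `throw_cuda_error`) are hypothetical models, not the libcu++ macros; the slot models cudaGetLastError(), which returns the last Runtime error and resets it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical model of the CUDA Runtime's per-thread "last error" slot.
static int g_runtime_last_error = 0;
inline int runtime_get_last_error()
{
  int err              = g_runtime_last_error;
  g_runtime_last_error = 0; // reading the last error also resets it
  return err;
}

// Shared throw helper: it only throws and never touches Runtime state.
[[noreturn]] inline void throw_cuda_error(int status, const char* api)
{
  throw std::runtime_error(std::string(api) + " failed with status " + std::to_string(status));
}

// Runtime-call checker: a failed Runtime call recorded its error in the
// Runtime state, so clear that error here before throwing.
inline void try_runtime_call(int status, const char* api)
{
  if (status != 0)
  {
    (void) runtime_get_last_error();
    throw_cuda_error(status, api);
  }
}

// Driver-call checker: Driver calls never set the Runtime error state, so
// there is nothing to clear and no reason to touch the Runtime at all.
inline void try_driver_call(int status, const char* api)
{
  if (status != 0)
  {
    throw_cuda_error(status, api);
  }
}
```

A failed Driver call therefore leaves an unrelated pending Runtime error intact for whoever set it.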

pciolkosz · 2025-09-30T21:15:43Z

-    return device_ref{__id};
+    ::CUdevice __device{};
+#  if _CCCL_CTK_AT_LEAST(12, 8)
+    __device = ::cuda::__driver::__streamGetDevice(__stream);


We need a driver version check here

Ah, but that means I would have to write the fallback implementation here. I'll do what you did in the past and require CUDA 13.0; we can enable it for CUDA 12 once we have implemented checks for whether an API is available with the current CUDA driver.
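The version check requested here is a run-time question (which driver is installed), not only a compile-time one (which toolkit is used, as in the _CCCL_CTK_AT_LEAST(12, 8) guard above). A minimal sketch of such a dispatch, assuming the integer encoding returned by cuDriverGetVersion (1000 * major + 10 * minor, e.g. 12080 for 12.8); `select_stream_device_path` and the fallback path are hypothetical and stand in for the checks the PR defers.

```cpp
#include <cassert>

// Same encoding cuDriverGetVersion uses: 1000 * major + 10 * minor.
constexpr int encode_version(int major, int minor) { return 1000 * major + 10 * minor; }

enum class stream_device_path
{
  driver_api, // query the device straight from the stream (12.8+ path above)
  fallback    // hypothetical fallback for older drivers
};

// Hypothetical dispatch helper: choose the code path from the driver version
// found at run time, since an application built with a new toolkit may still
// run on an older driver.
constexpr stream_device_path select_stream_device_path(int driver_version)
{
  return driver_version >= encode_version(12, 8) ? stream_device_path::driver_api
                                                 : stream_device_path::fallback;
}
```

Requiring CUDA 13.0 instead, as decided here, sidesteps the fallback entirely at the cost of not using the Driver path on CUDA 12.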

miscco · 2025-10-01T07:49:57Z

 }

-_CCCL_HOST_API inline void __eventElapsedTime(::CUevent __start, ::CUevent __end, float* __ms)
+[[nodiscard]] _CCCL_HOST_API inline float __eventElapsedTime(::CUevent __start, ::CUevent __end)


Question: should this be a chrono::duration?

These wrappers should be minimal and fast to include, so I would definitely go with float here.
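A sketch of the trade-off discussed here, assuming the wrapper keeps the units the driver reports (cuEventElapsedTime writes milliseconds as a float). `elapsed_time_ms` is a hypothetical host-only stand-in for the new return-by-value signature, and `as_duration` shows how a caller that wants a typed duration could convert at the call site, so only code that needs <chrono> pays for including it.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the wrapper: returns elapsed milliseconds as a
// plain float instead of writing through an out-parameter, and is marked
// [[nodiscard]] so an ignored result triggers a warning.
[[nodiscard]] inline float elapsed_time_ms(float start_ms, float end_ms)
{
  return end_ms - start_ms;
}

// Conversion done by the caller, outside the minimal wrapper.
inline std::chrono::duration<float, std::milli> as_duration(float ms)
{
  return std::chrono::duration<float, std::milli>{ms};
}
```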

miscco · 2025-10-01T07:50:51Z

-#  else // ^^^ _CCCL_CUDA_COMPILATION() ^^^ / vvv !_CCCL_CUDA_COMPILATION() vvv
-  throw ::cuda::cuda_error(__status, __msg, __api, __loc);
-#  endif // !_CCCL_CUDA_COMPILATION()
+  NV_IF_TARGET(NV_IS_HOST, (throw ::cuda::cuda_error(__status, __msg, __api, __loc);), (::cuda::std::terminate();))


Not sure I like that too much but fine with me

github-actions · 2025-10-01T13:05:42Z

🥳 CI Workflow Results

🟩 Finished in 4h 24m: Pass: 100%/242 | Total: 8d 10h | Max: 4h 23m | Hits: 63%/347484


davebayer requested a review from a team as a code owner September 30, 2025 10:11

davebayer requested a review from pciolkosz September 30, 2025 10:11

github-project-automation Bot added this to CCCL Sep 30, 2025

github-project-automation Bot moved this to Todo in CCCL Sep 30, 2025

davebayer commented Sep 30, 2025


Comment thread libcudacxx/include/cuda/__stream/stream_ref.h Outdated

davebayer requested a review from a team as a code owner September 30, 2025 13:17


davebayer force-pushed the use_cuda_driver_apis_in_libcudacxx branch from 66f6c88 to 6a7f214 September 30, 2025 20:16

davebayer requested a review from a team as a code owner September 30, 2025 20:16

davebayer requested a review from alliepiper September 30, 2025 20:16

davebayer commented Sep 30, 2025


pciolkosz approved these changes Sep 30, 2025


miscco reviewed Oct 1, 2025


Replace CUDA Runtime calls with Driver calls in libcu++

a5e051d

davebayer force-pushed the use_cuda_driver_apis_in_libcudacxx branch from ee6100f to a5e051d October 1, 2025 08:38

davebayer requested a review from bernhardmgruber October 1, 2025 11:33

davebayer requested a review from miscco October 1, 2025 13:48

bernhardmgruber approved these changes Oct 1, 2025


github-project-automation Bot moved this from Todo to In Review in CCCL Oct 1, 2025

davebayer merged commit 06b5c8e into NVIDIA:main Oct 1, 2025
252 of 253 checks passed

github-project-automation Bot moved this from In Review to Done in CCCL Oct 1, 2025

davebayer self-assigned this Oct 1, 2025

leofang mentioned this pull request Oct 3, 2025

IPC Mempool Serialization and multiprocessing Module Support NVIDIA/cuda-python#1020

Merged

davebayer added the backport branch/3.1.x label Oct 13, 2025
