Replace CUDA Runtime calls with Driver calls in libcu++ #6073
davebayer merged 1 commit into NVIDIA:main
Force-pushed: 66f6c88 to 6a7f214
davebayer left a comment
Notes for reviewers
//! @brief RAII helper which saves the current device and switches to the specified device on construction and switches
//! to the saved device on destruction.
using SwitchDevice = ::cuda::__ensure_current_device;
cuda::__ensure_current_device is no longer used in libcu++, so I moved the implementation to cub. It now also uses CubDebug(...) to check the CUDA Runtime calls.
Let's just make sure the APIs stay the same!
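A minimal sketch of the save/switch/restore pattern that the doc comment above describes. It is not the cub or libcu++ implementation: `current_device`, `set_device`, and `switch_device` are hypothetical names, and a plain integer stands in for the real `cudaGetDevice`/`cudaSetDevice` calls so the example runs without a GPU.

```cpp
namespace sketch {

// Hypothetical stand-in for the "current device" that real code would
// read with cudaGetDevice and write with cudaSetDevice.
inline int& current_device()
{
  static int device = 0;
  return device;
}

inline void set_device(int id)
{
  current_device() = id;
}

// RAII guard: remembers the current device, switches to `target`,
// and switches back when the guard goes out of scope.
class switch_device
{
public:
  explicit switch_device(int target)
      : saved_(current_device())
      , needs_reset_(saved_ != target)
  {
    if (needs_reset_)
    {
      set_device(target);
    }
  }

  ~switch_device()
  {
    if (needs_reset_)
    {
      set_device(saved_);
    }
  }

  // Non-copyable: two guards restoring the same device would be confusing.
  switch_device(const switch_device&)            = delete;
  switch_device& operator=(const switch_device&) = delete;

private:
  int saved_;
  bool needs_reset_;
};

} // namespace sketch
```

Keeping the constructor signature and the restore-on-destruction behavior unchanged is what would let existing `SwitchDevice` users keep compiling after the move.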
# else // ^^^ _CCCL_CUDA_COMPILATION() ^^^ / vvv !_CCCL_CUDA_COMPILATION() vvv
throw ::cuda::cuda_error(__status, __msg, __api, __loc);
# endif // !_CCCL_CUDA_COMPILATION()
NV_IF_TARGET(NV_IS_HOST, (throw ::cuda::cuda_error(__status, __msg, __api, __loc);), (::cuda::std::terminate();))
I've changed __throw_cuda_error to no longer clear the CUDA Runtime error state. The clearing was moved to _CCCL_TRY_CUDA_API.
Not sure I like that too much, but fine with me.
That's because CUDA Driver calls don't set the CUDA Runtime error state. Throwing cuda_error would therefore clear Runtime error state that was not set by our calls, and we would also initialize the CUDA Runtime for no reason.
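The split described in this thread can be sketched as follows. This is an illustration under stated assumptions, not the libcu++ code: `runtime_last_error`, `check_runtime_call`, and `check_driver_call` are hypothetical names, and an integer slot stands in for the Runtime error state that the real `cudaGetLastError` reads and resets.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical stand-in for the Runtime's "last error" slot.
inline int& runtime_last_error()
{
  static int error = 0;
  return error;
}

struct api_error : std::runtime_error
{
  int status;
  api_error(int s, const std::string& api)
      : std::runtime_error(api + " failed")
      , status(s)
  {}
};

// Throwing never touches the Runtime error slot.
[[noreturn]] inline void throw_error(int status, const char* api)
{
  throw api_error(status, api);
}

// Runtime-call path: the failing call recorded its error in the slot,
// so the slot is cleared here, before throwing.
inline void check_runtime_call(int status, const char* api)
{
  if (status != 0)
  {
    runtime_last_error() = 0;
    throw_error(status, api);
  }
}

// Driver-call path: the slot was not set by this call, so it is left alone.
inline void check_driver_call(int status, const char* api)
{
  if (status != 0)
  {
    throw_error(status, api);
  }
}

} // namespace sketch
```

With this shape, an error that user code left in the Runtime slot survives a failing Driver call, which is the behavior the comment above argues for.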
return device_ref{__id};
::CUdevice __device{};
# if _CCCL_CTK_AT_LEAST(12, 8)
__device = ::cuda::__driver::__streamGetDevice(__stream);
We need a driver version check here
Aaaah, but that means I would have to implement the fallback here. I will do what you did in the past and require CUDA 13.0. We can enable it for CUDA 12 once we have implemented checks for whether an API is available with the current CUDA driver.
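For illustration only, here is what such a check could look like in principle. The PR does not implement this. `make_version`, `path`, and `choose_stream_device_path` are hypothetical names, and the version numbers are passed in as plain integers where real code would use `CUDA_VERSION` and the value reported by `cuDriverGetVersion`.

```cpp
namespace sketch {

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 12.8 -> 12080.
constexpr int make_version(int major, int minor)
{
  return 1000 * major + 10 * minor;
}

enum class path
{
  direct_query, // call the newer driver entry point
  fallback      // use an older mechanism instead
};

// The newer entry point is usable only if the toolkit headers declare it
// (compile-time condition) and the installed driver provides it
// (run-time condition).
constexpr path choose_stream_device_path(int toolkit_version, int driver_version)
{
  return (toolkit_version >= make_version(12, 8) && driver_version >= make_version(12, 8))
         ? path::direct_query
         : path::fallback;
}

} // namespace sketch
```

Requiring CUDA 13.0 avoids writing the `fallback` branch for now, which is the trade-off chosen in the reply above.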
}

_CCCL_HOST_API inline void __eventElapsedTime(::CUevent __start, ::CUevent __end, float* __ms)
[[nodiscard]] _CCCL_HOST_API inline float __eventElapsedTime(::CUevent __start, ::CUevent __end)
Question: should this be a chrono::duration?
These wrappers should be really minimal and fast to include. I would definitely go with float here.
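A caller that wants a `chrono::duration` can still build one from the returned float without the wrapper header pulling in `<chrono>`. A minimal sketch, where `float_ms` and `to_duration` are hypothetical names that are not part of libcu++:

```cpp
#include <chrono>

namespace sketch {

// Milliseconds with a float representation, matching the unit of the
// value returned by the elapsed-time wrapper.
using float_ms = std::chrono::duration<float, std::milli>;

// Wrap a raw millisecond count in a typed duration at the call site.
inline float_ms to_duration(float milliseconds)
{
  return float_ms{milliseconds};
}

} // namespace sketch
```

The result converts to other units with `std::chrono::duration_cast`, so the typed interface can live in a higher layer while the low-level wrapper stays a plain float.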
Force-pushed: ee6100f to a5e051d
🥳 CI Workflow Results: 🟩 Finished in 4h 24m: Pass: 100%/242 | Total: 8d 10h | Max: 4h 23m | Hits: 63%/347484
We are trying to move away from CUDA Runtime APIs to Driver APIs. This PR replaces CUDA Runtime calls with Driver calls in libcu++.
Also, I added a cuInit(0) call to make sure the Driver is initialized.
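One common way to guarantee a one-time initialization is a function-local static, sketched below. Where and how the PR actually places the `cuInit(0)` call is not shown in this conversation, so this is an assumption. `driver_init` is a hypothetical stand-in that counts its calls so the example runs without a driver.

```cpp
namespace sketch {

// Counts how often the stand-in initializer ran.
inline int& init_calls()
{
  static int calls = 0;
  return calls;
}

// Hypothetical stand-in for cuInit(0); returns 0 for success.
inline int driver_init(unsigned flags)
{
  (void) flags;
  ++init_calls();
  return 0;
}

// The static initializer runs exactly once, on first use, and C++11
// guarantees that this initialization is thread-safe. Later calls only
// return the cached status.
inline int ensure_driver_initialized()
{
  static const int status = driver_init(0);
  return status;
}

} // namespace sketch
```

Every wrapper that needs the Driver would call `ensure_driver_initialized()` first. Repeated calls cost only a guard check.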