Today, `LaunchConfig` supports only the `cuLaunchKernel` driver API, which launches kernels on a single GPU. Broader use cases that need inter-SM synchronization or multi-GPU synchronization must use `cuLaunchCooperativeKernel` so that kernels launch safely and without deadlock. To support this, `LaunchConfig(..., launch_attr=None)` could be extended with an optional `launch_attr` argument that carries the cuda-python equivalent of `CUlaunchAttribute`.
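As a rough illustration of the proposal, here is a minimal pure-Python sketch. It does not touch the GPU or import cuda-python: the `LaunchConfig` below is a simplified, hypothetical mirror of the real class, `LaunchAttr` is a made-up stand-in for whatever cuda-python type ends up representing `CUlaunchAttribute`, and `select_launch_api` is a hypothetical helper that only reports which driver API a launcher would pick.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class LaunchAttr(Enum):
    """Hypothetical stand-in for a cuda-python CUlaunchAttribute value."""
    COOPERATIVE = auto()


@dataclass
class LaunchConfig:
    """Simplified, hypothetical mirror of cuda.core's LaunchConfig."""
    grid: Tuple[int, int, int]
    block: Tuple[int, int, int]
    shmem_size: int = 0
    # Proposed optional field; None keeps today's behaviour.
    launch_attr: Optional[LaunchAttr] = None


def select_launch_api(config: LaunchConfig) -> str:
    """Return the name of the driver API a launcher would call (assumed logic)."""
    if config.launch_attr is LaunchAttr.COOPERATIVE:
        return "cuLaunchCooperativeKernel"
    return "cuLaunchKernel"


default_cfg = LaunchConfig(grid=(4, 1, 1), block=(128, 1, 1))
coop_cfg = LaunchConfig(
    grid=(4, 1, 1), block=(128, 1, 1), launch_attr=LaunchAttr.COOPERATIVE
)

print(select_launch_api(default_cfg))
print(select_launch_api(coop_cfg))
```

Defaulting `launch_attr` to `None` means existing callers would keep going through `cuLaunchKernel` unchanged, and only callers that opt in would take the cooperative path.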
Background:
This issue came out of the discussion in NVIDIA/numba-cuda#128 (comment). The existing CUDA driver bindings in `numba-cuda` use either `cuLaunchCooperativeKernel` or `cuLaunchKernel`, depending on whether the kernel contains `grid.sync()`. Migrating that code to `cuda.core` requires the ability to select the kernel launch API variant at runtime, based on the `LaunchConfig`.