RFC: Add support for `launch_attr` in `LaunchConfig` ctor #496

Today, `LaunchConfig` only supports launching kernels on a single GPU through the `cuLaunchKernel` driver API. Broader use cases that need inter-SM or multi-GPU synchronization require `cuLaunchCooperativeKernel`, so that the kernel launches safely and cannot deadlock. To support this, `LaunchConfig(..., launch_attr=None)` could accept an optional `launch_attr` argument that takes the cuda-python equivalent of [`CUlaunchAttribute`](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html#group__CUDA__TYPES_1g6f6565b334be6bb3134868e10bbdd331).
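To make the proposed constructor shape concrete, here is a minimal pure-Python sketch. Everything in it is hypothetical and for illustration only: `LaunchConfigSketch`, `LaunchAttr`, `LaunchAttrKind`, and the `attr` helper are stand-ins and not part of `cuda.core` or cuda-python, and the field names are assumptions rather than the real `LaunchConfig` signature.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Optional, Sequence, Tuple


class LaunchAttrKind(Enum):
    """Hypothetical stand-in for the driver's launch attribute IDs."""
    COOPERATIVE = auto()


@dataclass(frozen=True)
class LaunchAttr:
    """Hypothetical (kind, value) pair, loosely modelled on CUlaunchAttribute."""
    kind: LaunchAttrKind
    value: Any


@dataclass(frozen=True)
class LaunchConfigSketch:
    """Illustrative only; this is not the real cuda.core LaunchConfig."""
    grid: Tuple[int, int, int]
    block: Tuple[int, int, int]
    shmem_size: int = 0
    # The proposed optional argument; None keeps today's behaviour.
    launch_attr: Optional[Sequence[LaunchAttr]] = None

    def attr(self, kind: LaunchAttrKind, default: Any = None) -> Any:
        """Return the value of the first attribute of `kind`, else `default`."""
        for a in self.launch_attr or ():
            if a.kind is kind:
                return a.value
        return default


# Existing call sites stay unchanged because launch_attr defaults to None.
default_cfg = LaunchConfigSketch(grid=(4, 1, 1), block=(128, 1, 1))

# A caller that needs a cooperative launch opts in explicitly.
coop_cfg = LaunchConfigSketch(
    grid=(4, 1, 1),
    block=(128, 1, 1),
    launch_attr=[LaunchAttr(LaunchAttrKind.COOPERATIVE, True)],
)
```

Defaulting `launch_attr` to `None` is what keeps the change backward compatible in this sketch: only callers that pass attributes see different behaviour.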


**Background:**
This issue came out of the discussion at https://github.com/NVIDIA/numba-cuda/issues/128#issuecomment-2702689412. The existing CUDA driver bindings in `numba-cuda` call either `cuLaunchCooperativeKernel` or `cuLaunchKernel`, depending on whether the kernel contains `grid.sync()`. Migrating that logic to `cuda.core` requires a way to select the launch API variant at runtime based on the `LaunchConfig`.
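The runtime selection described above could look roughly like the following sketch. It only returns the name of the driver entry point instead of calling it, so it runs without a GPU. The `select_launch_api` function and the `cooperative` flag are hypothetical names chosen for illustration; in `numba-cuda` the flag would come from detecting `grid.sync()` in the kernel, while under this proposal it would come from the `LaunchConfig`.

```python
def select_launch_api(cooperative: bool) -> str:
    """Pick the driver launch entry point for a kernel (names only).

    `cooperative` is an assumed boolean: True when the kernel needs
    grid-wide synchronization, for example because it uses grid.sync().
    """
    if cooperative:
        return "cuLaunchCooperativeKernel"
    return "cuLaunchKernel"


# A kernel without grid.sync() keeps using the regular launch path.
regular_api = select_launch_api(cooperative=False)

# A kernel that synchronizes across the grid needs the cooperative path.
coop_api = select_launch_api(cooperative=True)
```

Keeping the decision in one small function means the caller states the requirement once, and the launcher stays free to change how it maps that requirement onto the driver API.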
