```
bazel run //:test
Starting local Bazel server (8.2.1) and connecting to it...
INFO: Analyzed target //:test (138 packages loaded, 9986 targets configured).
INFO: Found 1 target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 34.576s, Critical Path: 23.99s
INFO: 22 processes: 63 action cache hit, 22 internal.
INFO: Build completed successfully, 22 total actions
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh ./test
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:test
-----------------------------------------------------------------------------
ERROR:2025-09-01 19:24:47,302:jax._src.xla_bridge:487: Jax plugin configuration error: Exception when calling jax_plugins.xla_cuda12.initialize()
Traceback (most recent call last):
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax_plugins/xla_cuda12/__init__.py", line 201, in _version_check
    version = get_version()
              ^^^^^^^^^^^^^
RuntimeError: jaxlib/cuda/versions_helpers.cc:81: operation cusparseGetProperty(MAJOR_VERSION, &major) failed: The cuSPARSE library was not found.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 485, in discover_pjrt_plugins
    plugin_module.initialize()
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax_plugins/xla_cuda12/__init__.py", line 328, in initialize
    _check_cuda_versions(raise_on_first_error=True)
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax_plugins/xla_cuda12/__init__.py", line 266, in _check_cuda_versions
    _version_check("cuSPARSE", cuda_versions.cusparse_get_version,
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax_plugins/xla_cuda12/__init__.py", line 205, in _version_check
    raise RuntimeError(err_msg) from e
RuntimeError: Unable to load cuSPARSE. Is it installed?
WARNING:2025-09-01 19:24:47,311:jax._src.xla_bridge:864: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
Hello, nvidia=<module 'nvidia' (namespace) from ['/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/nvidia']>!
Hello, jax=<module 'jax' from '/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax/__init__.py'>!
mujoco=<module 'mujoco' from '/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/mujoco/__init__.py'>!
Traceback (most recent call last):
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test_stage2_bootstrap.py", line 474, in <module>
    main()
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test_stage2_bootstrap.py", line 468, in main
    _run_py_path(main_filename, args=sys.argv[1:])
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test_stage2_bootstrap.py", line 284, in _run_py_path
    runpy.run_path(main_filename, run_name="__main__")
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/test.py", line 20, in <module>
    main()
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/test.py", line 16, in main
    print(f"{jax.devices('gpu')=}")
             ^^^^^^^^^^^^^^^^^^
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 1010, in devices
    return get_backend(backend).devices()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 944, in get_backend
    return _get_backend_uncached(platform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 925, in _get_backend_uncached
    platform = canonicalize_platform(platform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.cache/bazel/_bazel_user/78efd52eda40769e82a38d778950bc83/execroot/_main/bazel-out/k8-fastbuild/bin/test.runfiles/_main/_test.venv/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 728, in canonicalize_platform
    raise RuntimeError(f"Unknown backend: '{platform}' requested, but no "
RuntimeError: Unknown backend: 'gpu' requested, but no platforms that are instances of gpu are present. Platforms present are: cpu
```
# 🐞 bug report

## Affected Rule

n/a

## Is this a regression?

No. Indeed, there are several similar issues already open; see e.g. #2156, pytorch/pytorch#117350, and pytorch/pytorch#101314. I also know that the venv/site-packages layout is still experimental, so this is not really a bug. I still wanted to share the information in case someone finds it useful (and also to provide a log for myself to come back to next time I have time to look into it).

## Description

I tried my luck at installing the `nvidia` CUDA libraries (`nvidia-cuda-runtime-cu12` and friends) to see whether I could get them to work out of the box with the new venv/site-packages layout (#2156). The TL;DR is: no, they did not work out of the box. I've added some data from my initial investigation below.

## 🔬 Minimal Reproduction
e2d73e2
## 🔥 Exception or Error

Details

## 🌍 Your Environment

**Operating System:**

**Output of `bazel version`:** 8.2.1

**Rules_python version:** 4a422b0
## Additional Information

I first diffed a `uv venv`-generated site-packages against a `py_binary` target built with `venvs_site_packages=yes`. The differences are minor: the Bazel environment is just missing `RECORD` files and has different `INSTALLER` files. The `INSTALLER` file diffs are expected; the missing `RECORD` files I'm not sure about. They are missing from all of the following packages:

Details

Here's an example diff for one of them, `nvidia_cublas_cu12-12.9.1.4.dist-info.diff`:

Details
The `nvidia` package, which holds the actual implementation files, is identical between the `uv venv` and the `rules_python` venv, modulo the files being symlinked in the latter.

Despite the near-identical site-packages structure, the `py_binary` target is unable to load the shared `nvidia` libraries out of the box. After poking around a bit with `strace`, I see that `libcusparse.so.12` exists, but loading it fails because its transitive dependency `libnvJitLink.so.12` cannot be found.[^1] My guess is that this failure is caused by the dynamic linker resolving paths from the real file locations, which differ from the symlinked directory structure in the Bazel-generated site-packages, breaking the run-path lookups inside the C libraries.

As pointed out by @aignas (https://bazelbuild.slack.com/archives/CA306CEV6/p1756127831137709?thread_ts=1756088512.734139&cid=CA306CEV6), it would be interesting to debug this by copying the files instead of symlinking them, to verify whether it's really the symlinking that breaks things.
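For illustration, here is a rough sketch of that diagnosis plus the copy-instead-of-symlink experiment. The library paths (`nvidia/cusparse/lib/...`) are assumptions based on how the `nvidia-*-cu12` wheels typically lay out their files, not something taken from this repro.

```shell
# Locate cuSPARSE inside the active site-packages (wheel layout assumed).
SITE="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
LIB="$SITE/nvidia/cusparse/lib/libcusparse.so.12"

# Inspect the DT_RPATH/DT_RUNPATH entries. These are typically
# $ORIGIN-relative (e.g. $ORIGIN/../../nvjitlink/lib); glibc expands
# $ORIGIN from the *real* location of the .so, which is why such lookups
# can miss when site-packages is a symlink farm.
readelf -d "$LIB" 2>/dev/null | grep -E 'R(UN)?PATH' || true

# See whether the loader can resolve libnvJitLink for this library.
ldd "$LIB" 2>/dev/null | grep -i nvjitlink || true

# Copy-instead-of-symlink experiment: -L dereferences symlinks, so the
# copied tree has real files in the expected relative layout.
if [ -d "$SITE/nvidia" ]; then
  cp -rL "$SITE/nvidia" /tmp/nvidia-copy
fi
```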
## Questions and Things to Do

- Should this be fixed on the `rules_python` side? Or should the `nvidia` packages do something different to make the linker lookups work with symlinks?

## Workaround for Jax
For what it's worth, I'm using Jax, not PyTorch as most others seem to. I couldn't find any existing examples of patching Jax to preload the packages. Here's how I got around this issue:

Details
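Since the actual patch is collapsed above, here is a minimal sketch of the preloading idea (the same trick as the gist and PyTorch PR linked in the footnote): `dlopen` every NVIDIA shared library with `RTLD_GLOBAL` before importing `jax`. The package layout, function name, and load order are assumptions, not the author's actual workaround.

```python
"""Hypothetical preloading shim (not the author's actual patch).

Loading the NVIDIA shared libraries with RTLD_GLOBAL *before* importing
jax lets the dynamic linker resolve transitive dependencies such as
libcusparse.so.12 -> libnvJitLink.so.12 from already-loaded objects,
sidestepping the $ORIGIN-relative run-path lookups that appear to break
under a symlinked site-packages tree.
"""
import ctypes
import glob
import os


def preload_nvidia_libs() -> list[str]:
    loaded: list[str] = []
    try:
        import nvidia  # namespace package shipped by the nvidia-*-cu12 wheels
    except ImportError:
        return loaded  # CUDA wheels not installed; nothing to do
    for pkg_dir in nvidia.__path__:
        # Load nvjitlink first, since libcusparse depends on it.
        for pattern in ("nvjitlink/lib/*.so*", "*/lib/*.so*"):
            for path in sorted(glob.glob(os.path.join(pkg_dir, pattern))):
                if path in loaded:
                    continue
                try:
                    ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
                    loaded.append(path)
                except OSError:
                    pass  # skip libraries whose own deps are still missing
    return loaded


if __name__ == "__main__":
    libs = preload_nvidia_libs()
    print(f"preloaded {len(libs)} NVIDIA libraries")
    # import jax  # only import jax after the libraries are resolvable
```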
[^1]: This seems like a common issue and can be worked around by preloading the packages. See e.g. https://gist.github.com/qxcv/183c2d6cd81f7028b802b232d6a9dd62 and https://github.com/pytorch/pytorch/pull/137059. What I'm trying to understand here is how to get these to work out of the box without having to patch the packages.