-
Notifications
You must be signed in to change notification settings - Fork 275
nvbug6084457: Fix device architecture handling and NVLink link count query #1937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,4 +26,4 @@ def test_nvlink_get_link_count(all_devices): | |
| # The feature_nvlink_supported detection is not robust, so we | ||
| # can't be more specific about how many links we should find. | ||
| if value.nvml_return == nvml.Return.SUCCESS: | ||
| assert value.value.ui_val <= nvml.NVLINK_MAX_LINKS, f"Unexpected link count {value.value.ui_val}" | ||
| assert value.value.ui_val[0] <= nvml.NVLINK_MAX_LINKS, f"Unexpected link count {value.value.ui_val[0]}" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. LGTM, based on my understanding that this fixes a subtle latent bug in the test: before, |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -165,7 +165,11 @@ cdef class Device: | |
| "VOLTA"``, and RTX A6000 will report ``DeviceArchitecture.name == | ||
| "AMPERE"``. | ||
| """ | ||
| return DeviceArch(nvml.device_get_architecture(self._handle)) | ||
| arch = nvml.device_get_architecture(self._handle) | ||
| try: | ||
| return DeviceArch(arch) | ||
| except ValueError: | ||
| return nvml.DeviceArch.UNKNOWN | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. LGTM, based on my understanding that How easy or difficult would it be to add a test that covers the |
||
|
|
||
| @property | ||
| def name(self) -> str: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Readability nit, to make the intent more obvious:
Nit 2:
UNKNOWN_ARCH_ID, so the resulting warning explains what the magic integer is (it can be guessed, this is just a little more helpful).