Draft
Conversation
Contributor
|
@nksauter , just making sure I see where we are: in the latest commit, the test works, but there is duplicated code which we want to avoid ? |
Contributor
Author
|
@dermen stand by please with regard to the working tests. I'm going to
apply the test to additional cases today to double check. The duplicated
code is a serious problem, as my original intent was to implement
polymorphism with the minimal lines of code, instead I had to duplicate an
entire kernel. Further, I was unable to shrink the size of the
m_accumulate_floatimage array as was the original intent.
…On Thu, May 9, 2024 at 8:53 AM Derek Mendez ***@***.***> wrote:
@nksauter <https://github.com/nksauter> , just making sure I see where we
are: in the latest commit, the test works, but there is duplicated code
which we want to avoid ?
—
Reply to this email directly, view it on GitHub
<#983 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADQ24VTVVOZLU4RA3OPDZETZBOLXNAVCNFSM6AAAAABF7NPLTGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBSHE2DAMJXGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Baharis
approved these changes
May 31, 2024
Contributor
Baharis
left a comment
There was a problem hiding this comment.
After 3 weeks of testing via psii_spread's annulus worker, this PR was indeed found to significantly lower the memory footprint when simulating images on GPU. I don't have particular issues with the functionality of this code.
edb7ebb to
7e5ae72
Compare
In the exascale_api, allow pixel values to be calculation either on large array (all pixels), or with low-memory on just the whitelist consisting of shoebox pixels. This commit only gives the polymorphism framework; both implementations are currently identical giving the large-array behavior.
The script tests/tst_memory_policy.py fails with a cuda illegal access. The intention is to get help from NESAP to get a functional test.
Memory savings achieved through code specialization, for the case where pixel values are simulated on a small whitelist. Specializations are not yet optimal, as there is still a lot of code duplication. Changes give ~4.5x reduction in memory footprint, but no success yet in resizing the array m_accumulate_floatimage. Attempts so far lead to cuda memory allocation error.
7e5ae72 to
fa0295b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I've created a failing unit test for new kokkos code in the cuda-context.
If the test "libtbx.python cctbx_project/simtbx/tests/tst_memory_policy.py" can be made to work, the bug has been fixed.
Background information: I'm trying to extend the kokkos exascale_api with new behavior. In the old way, it would allocate large arrays on GPU corresponding to the detector size even if only a few pixels are calculated due to the whitelist (a list of pixels of interest). In the new behavior only enough memory is allocated for the whitelist pixels. I am using C++ templates with template specialization dependent on these two memory-allocation cases. I need this to work so Daniel can move ahead with the SPREAD project, and it is just this last detail that I cannot seem to fix.