[PyTorch] Support delay_wgrad_compute cudagraph by buptzyb · Pull Request #1948 · NVIDIA/TransformerEngine

buptzyb · 2025-07-14T08:30:49Z

Description

Some TE modules support delayed weight-gradient (wgrad) computation. When it is enabled, they skip wgrad in the normal forward-backward pass; instead, wgrad is computed when the user calls the module's backward_dw() method. This PR supports this pattern in the make_graphed_callables() API: in addition to the forward and backward graphs, it captures a new backward_dw graph. The new graph is set as an attribute on the returned graphed callable, so the user can retrieve and execute it when needed.
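
To make the delayed-wgrad pattern concrete, here is a minimal pure-Python sketch of a linear layer y = W x. The class DelayedWgradLinear and its attributes are hypothetical illustrations, not TE's implementation; only the method name backward_dw() mirrors the description. backward() always returns the input gradient (dgrad) immediately, but when delay_wgrad_compute is set it stashes (x, dy) and only forms the weight gradient dW = dy x^T once backward_dw() is called.

```python
"""Minimal sketch of delayed weight-gradient (wgrad) computation.

Hypothetical pure-Python stand-in for a linear layer y = W @ x; not TE's code.
"""


class DelayedWgradLinear:
    def __init__(self, weight, delay_wgrad_compute=True):
        self.weight = [list(row) for row in weight]  # shape (out_features, in_features)
        self.delay_wgrad_compute = delay_wgrad_compute
        self.weight_grad = None
        self._saved = []  # (x, dy) pairs waiting for backward_dw()

    def forward(self, x):
        # Assumes forward/backward calls alternate, so backward() sees this x.
        self._x = list(x)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]

    def backward(self, dy):
        # dgrad is always computed right away: dx = W^T @ dy.
        n_in = len(self.weight[0])
        dx = [sum(self.weight[o][i] * dy[o] for o in range(len(dy))) for i in range(n_in)]
        if self.delay_wgrad_compute:
            self._saved.append((self._x, list(dy)))  # defer wgrad
        else:
            self._accumulate_wgrad(self._x, dy)  # eager wgrad
        return dx

    def backward_dw(self):
        # Deferred wgrad: accumulate dW += dy @ x^T for every stashed pair.
        for x, dy in self._saved:
            self._accumulate_wgrad(x, dy)
        self._saved.clear()

    def _accumulate_wgrad(self, x, dy):
        outer = [[d * xi for xi in x] for d in dy]
        if self.weight_grad is None:
            self.weight_grad = outer
        else:
            self.weight_grad = [
                [a + b for a, b in zip(g_row, o_row)]
                for g_row, o_row in zip(self.weight_grad, outer)
            ]
```

With delay_wgrad_compute=False, the same class accumulates the weight gradient inside backward(), which corresponds to the normal, non-delayed path.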

The backward_dw graph is not captured if no TE module's need_backward_dw() returns True.
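
The capture condition above amounts to an any() over the modules in a callable. The sketch below assumes, purely for illustration, that a module needs a backward_dw pass exactly when it defers wgrad; BaseModuleSketch and should_capture_backward_dw are hypothetical names, and TE's actual need_backward_dw() may check other module state.

```python
"""Sketch of the rule that decides whether a backward_dw graph is captured.

BaseModuleSketch is a hypothetical stand-in for TransformerEngineBaseModule.
"""


class BaseModuleSketch:
    def __init__(self, delay_wgrad_compute=False):
        self.delay_wgrad_compute = delay_wgrad_compute

    def need_backward_dw(self):
        # Assumption: a module needs backward_dw exactly when it defers wgrad.
        return self.delay_wgrad_compute


def should_capture_backward_dw(modules):
    """Return True if at least one module's need_backward_dw() returns True."""
    return any(m.need_backward_dw() for m in modules)
```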

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Added a TransformerEngineBaseModule.need_backward_dw() method.
Added bwd_dw_graphs for the delayed wgrad computation cudagraph.
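
One way to picture bwd_dw_graphs is as a per-callable list in which entry i holds the captured backward_dw graph for callable i, or None when none of its modules needs one. In this hypothetical pure-Python sketch, RecordedGraph stands in for a CUDA graph (it records ops at capture time without running them and reruns them on replay), CountingModule is a toy module, and the attribute name backward_dw_graph used by attach_backward_dw is an assumption, not TE's actual attribute.

```python
"""Sketch of storing per-callable backward_dw graphs (hypothetical, not TE's code)."""


class RecordedGraph:
    """Stand-in for a captured CUDA graph: records ops, reruns them on replay()."""

    def __init__(self):
        self._ops = []

    def capture(self, op):
        self._ops.append(op)  # recorded only; nothing executes at capture time

    def replay(self):
        for op in self._ops:
            op()


class CountingModule:
    """Toy module that counts how many times backward_dw() runs."""

    def __init__(self, delay):
        self.delay = delay
        self.dw_calls = 0

    def need_backward_dw(self):
        return self.delay

    def backward_dw(self):
        self.dw_calls += 1


def build_bwd_dw_graphs(callables_modules):
    """For each callable's module list, capture a backward_dw graph or store None."""
    bwd_dw_graphs = []
    for modules in callables_modules:
        dw_modules = [m for m in modules if m.need_backward_dw()]
        if not dw_modules:
            bwd_dw_graphs.append(None)  # nothing to capture for this callable
            continue
        graph = RecordedGraph()
        for m in dw_modules:
            graph.capture(m.backward_dw)
        bwd_dw_graphs.append(graph)
    return bwd_dw_graphs


def attach_backward_dw(graphed_callable, graph):
    """Expose the backward_dw graph as an attribute of the graphed callable."""
    graphed_callable.backward_dw_graph = graph
    return graphed_callable
```

Keeping a None placeholder keeps list positions aligned with callable indices, so a caller can replay bwd_dw_graphs[i] for the i-th callable without a separate lookup table.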

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Robin Zhang <robinz@nvidia.com>

buptzyb · 2025-10-15T06:33:26Z

Hi @timmoon10, could you help review this? Thanks! Cc @Wohox @lhb8125

lhb8125 · 2025-10-21T07:30:27Z

LGTM, thanks!

lhb8125 · 2025-10-21T07:31:00Z

/te-ci pytorch

buptzyb · 2025-10-21T12:45:12Z

Hi @timmoon10, could you take a look or assign it to someone? Thanks!

ksivaman · 2025-10-22T12:58:22Z

/te-ci pytorch

ksivaman · 2025-10-23T14:32:49Z

/te-ci pytorch

lhb8125 · 2025-10-24T02:22:25Z

/te-ci pytorch

lhb8125 · 2025-10-24T08:11:05Z

@ksivaman The CI looks good now, could you review it?

* support cudagraph dw
  Signed-off-by: Robin Zhang <robinz@nvidia.com>
* fix lint
  Signed-off-by: Robin Zhang <robinz@nvidia.com>
* fix ci
  Signed-off-by: Robin Zhang <robinz@nvidia.com>

---------

Signed-off-by: Robin Zhang <robinz@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

buptzyb force-pushed the robinz/cudagraph_dw branch from 0d8ddcf to 91c3c49 July 14, 2025 08:35

buptzyb force-pushed the robinz/cudagraph_dw branch from 75b6a88 to a2a3c76 July 28, 2025 08:55

buptzyb force-pushed the robinz/cudagraph_dw branch from a2a3c76 to cc0cad0 August 5, 2025 01:05

buptzyb force-pushed the robinz/cudagraph_dw branch 2 times, most recently from 0be616d to 601026a September 2, 2025 02:04

buptzyb force-pushed the robinz/cudagraph_dw branch from 601026a to 72d7389 September 8, 2025 10:42

buptzyb force-pushed the robinz/cudagraph_dw branch from 72d7389 to 045870e September 28, 2025 11:07

nvMelissa added the community-contribution label Oct 9, 2025

support cudagraph dw (3bde7ef)

Signed-off-by: Robin Zhang <robinz@nvidia.com>

buptzyb force-pushed the robinz/cudagraph_dw branch from 045870e to 3bde7ef October 13, 2025 06:39

ksivaman self-requested a review October 21, 2025 13:08

ksivaman added 2 commits October 21, 2025 13:53

Merge branch 'main' into robinz/cudagraph_dw (d44e997)

Merge branch 'main' into robinz/cudagraph_dw (8421bde)

buptzyb and others added 2 commits October 22, 2025 06:06

fix lint (40fe1a0)

Signed-off-by: Robin Zhang <robinz@nvidia.com>

Merge branch 'main' into robinz/cudagraph_dw (dc6dc16)

fix ci (3e2ea81)

Signed-off-by: Robin Zhang <robinz@nvidia.com>

ksivaman approved these changes Oct 24, 2025

ksivaman merged commit 6273ced into NVIDIA:main Oct 24, 2025
11 of 13 checks passed

This was referenced Nov 7, 2025:

[Dev] Partial CUDA Graph support for EP Overlap NVIDIA/Megatron-LM#2167 (Closed)
[Dev] Partial CUDA Graph support for EP Overlap NVIDIA/Megatron-LM#2168 (Merged)
[Main] Partial CUDA Graph support for EP Overlap NVIDIA/Megatron-LM#2184 (Merged)

Wohox mentioned this pull request Nov 13, 2025:

[Pytorch] Fix backward_dw cuda graph order #2376 (Merged)

Wohox mentioned this pull request Jan 5, 2026:

[Dev](Reapply) Partial CUDA Graph support for EP Overlap NVIDIA/Megatron-LM#2810 (Merged)

ksivaman merged 6 commits into NVIDIA:main from buptzyb:robinz/cudagraph_dw


4 participants
