Support qwen3-vl for THD format and CP by wplf · Pull Request #1943 · NVIDIA-NeMo/Megatron-Bridge

wplf · 2026-01-14T12:03:56Z

What does this PR do?

This PR adopts @ISEEKYAN's qwen3-vl work from mbridge into megatron-bridge, including his additions such as THD format and CP support.

For now, training in both THD format and BSHD format is ready.
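As a rough illustration of how the two layouts differ (a sketch only, not code from this PR): BSHD keeps a padded batch with one row per sequence, while THD drops the padding, concatenates all real tokens along a single token axis, and records the sequence boundaries as cumulative lengths. The helper name `pack_bshd_to_thd` and the toy data are hypothetical, and the head and hidden dimensions are omitted for brevity.

```python
from itertools import accumulate


def pack_bshd_to_thd(padded_batch, seq_lens):
    """Drop padding and concatenate sequences along one token axis.

    padded_batch: list of B sequences, each padded to the same length S.
    seq_lens: true (unpadded) length of each sequence.
    Returns (tokens, cu_seqlens): packed tokens and cumulative boundaries.
    """
    tokens = []
    for seq, n in zip(padded_batch, seq_lens):
        tokens.extend(seq[:n])  # keep only the real tokens of this sequence
    # Sequence i occupies tokens[cu_seqlens[i]:cu_seqlens[i + 1]]
    cu_seqlens = [0] + list(accumulate(seq_lens))
    return tokens, cu_seqlens


PAD = 0  # hypothetical padding id
padded = [[11, 12, 13, PAD], [21, 22, PAD, PAD], [31, 32, 33, 34]]
lens = [3, 2, 4]

tokens, cu_seqlens = pack_bshd_to_thd(padded, lens)
print(tokens)
print(cu_seqlens)
```

With the toy batch above, 12 padded slots shrink to 9 packed tokens, and the boundaries let an attention kernel keep the three sequences separate.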

bshd example script

python -m torch.distributed.run --nproc_per_node=8 \
    finetune_qwen_vl.py \
    --dataset-type hf \
    --data-path llava_video_178k \
    --recipe qwen3_vl_30b_a3b_finetune_config \
    --config-file conf/qwen3_vl_30b_a3b_pretrain_mfsdp_override_example.yaml dataset.pack_sequences_in_batch=false

thd example script

cd $HOME2/repos/Megatron-Bridge/examples/recipes/qwen_vl
python -m torch.distributed.run --nproc_per_node=8 \
    finetune_qwen_vl.py \
    --dataset-type hf \
    --data-path llava_video_178k \
    --recipe qwen3_vl_8b_finetune_config \
    --config-file conf/qwen3_vl_pretrain_override_example.yaml dataset.pack_sequences_in_batch=false

Model forward Validation

The output from Megatron-Bridge's Qwen3VL is now bitwise identical to that of M-Bridge's Qwen3VL.
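To make "bitwise identical" concrete, here is a minimal sketch (not the validation script used for this PR) that contrasts an exact byte-level comparison with a tolerance-based one. The helper names and sample values are hypothetical; the values are packed as float32 with the standard-library `struct` module.

```python
import struct


def float32_bits(values):
    """Serialize values as little-endian float32 so raw bytes can be compared."""
    return struct.pack(f"<{len(values)}f", *values)


def bitwise_equal(a, b):
    """True only if both outputs have the exact same float32 bit patterns."""
    return len(a) == len(b) and float32_bits(a) == float32_bits(b)


def close_enough(a, b, tol=1e-6):
    """Looser check: element-wise absolute difference within a tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


reference = [0.1, 0.2, 0.3]          # stand-in for one implementation's output
same = list(reference)               # stand-in for a bitwise-identical output
perturbed = [0.1, 0.2, 0.3 + 1e-7]   # differs by a few float32 ulps

print(bitwise_equal(reference, same))
print(bitwise_equal(reference, perturbed))
print(close_enough(reference, perturbed))
```

The perturbed output passes the tolerance check but fails the bitwise one, which is why a bitwise match is the stronger claim. On PyTorch tensors the analogous exact check is `torch.equal(a, b)`.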

MOE model

Dense model

Remaining to do

Summary by CodeRabbit

Release Notes

New Features
- Enhanced vision-language model support with improved architecture for multimodal processing
- Added profiling capabilities including memory snapshots and performance tracking
- Extended logging and monitoring options for training visibility

Configuration Updates
- Updated example configurations with expanded training parameters, optimization settings, and vision model options

Signed-off-by: jinliangl <jinliangl@nvidia.com>

copy-pr-bot · 2026-01-14T12:03:59Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cuichenx

Sorry, accidentally clicked approve. Please check my comments above

yaoyu-33 · 2026-02-09T17:07:50Z

/ok to test 1cfa2dc

wplf · 2026-02-10T02:10:50Z

/ok to test 1cfa2dc

wplf · 2026-02-10T02:14:14Z

/ok to test e18e64d

shifangx · 2026-02-10T03:32:11Z

/ok to test e18e64d

shifangx · 2026-02-10T11:39:10Z

/ok to test 3eb6ebc

wplf · 2026-02-10T14:44:00Z

/ok to test 3eb6ebc

ko3n1g

Tests will be submitted soon in a followup

qwen3-vl migration [wip]

cda72a3

Signed-off-by: jinliangl <jinliangl@nvidia.com>

wplf marked this pull request as draft January 14, 2026 12:04

wplf changed the title ~~qwen3-vl migration [wip]~~ Support qwen3-vl for THD format and CP [wip] Jan 14, 2026

yaoyu-33 reviewed Jan 14, 2026

src/megatron/bridge/models/qwen_vl/qwen3_vl_provider.py (outdated, resolved)

yaoyu-33 reviewed Jan 14, 2026

src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py (resolved)

yaoyu-33 reviewed Jan 14, 2026

src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py (resolved)

yaoyu-33 reviewed Jan 14, 2026

src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py (outdated, resolved)

support qwen3vl bshd training

63db4de

Signed-off-by: jinliangl <jinliangl@nvidia.com>

wplf changed the title ~~Support qwen3-vl for THD format and CP [wip]~~ [Draft] support qwen3-vl for THD format and CP Jan 15, 2026

wplf added 4 commits January 19, 2026 14:45

support bshd training, thd training, thd training with cp, loss convergency has not been test yet.

4d61e8c

adjust how to get packed seq params

5a9ecf4

supoort qwen3vl dense

7c12b6b

support convert hf ckpt for qwenvl vision module, can convert deepstack's param, need to check it further.

3cc62c3

HollowMan6 mentioned this pull request Jan 20, 2026

Qwen 3 VL 30b moe training fails when save checkpoint on megatron verl-project/verl#4990

Closed

wplf added 3 commits January 21, 2026 18:49

fix qwen3vl shard_state_dict

4c32826

align megagtron-bridge and mbridge fwd bitwisely

14caee7

align bshd and thd training, cp training remains to do

35e218a

wplf marked this pull request as ready for review January 22, 2026 10:27

wplf changed the title ~~[Draft] support qwen3-vl for THD format and CP~~ Support qwen3-vl for THD format and CP Jan 22, 2026

shifangx reviewed Jan 22, 2026

src/megatron/bridge/training/vlm_step.py (outdated, resolved)

shifangx reviewed Jan 22, 2026

src/megatron/bridge/training/vlm_step.py (outdated, resolved)

cuichenx approved these changes Jan 22, 2026

src/megatron/bridge/training/vlm_step.py (outdated, resolved)

src/megatron/bridge/training/vlm_step.py (outdated, resolved)

cuichenx self-requested a review January 22, 2026 23:04

cuichenx requested changes Jan 22, 2026

shifangx reviewed Jan 23, 2026

src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_config.py (resolved)

src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_config.py (outdated, resolved)

wplf added 4 commits January 23, 2026 14:43

fix full recompute of qwen3vl bug with credit to xuwen

c187879

Signed-off-by: jinliangl <jinliangl@nvidia.com>

Merge branch 'main' into jinliang/qwen3-vl

aea4235

support hf vision model and megatron vision model; use model.use_hf_vision_model=true to enable it

5c4dcad

Signed-off-by: jinliangl <jinliangl@nvidia.com>

align with pr 1997, thd and bshd loss curve is verified

da810ab

Signed-off-by: jinliangl <jinliangl@nvidia.com>

copy-pr-bot bot temporarily deployed to nemo-ci February 9, 2026 10:22 Inactive

fix ci break on qwen3vl conversion

8ae2c86

wplf force-pushed the jinliang/qwen3-vl branch from a3b8dda to 8ae2c86 Compare February 9, 2026 10:45

copy-pr-bot bot temporarily deployed to nemo-ci February 9, 2026 17:10 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 10, 2026 03:32 Inactive

shifangx previously approved these changes Feb 10, 2026

copy-pr-bot bot temporarily deployed to test February 10, 2026 03:32 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 10, 2026 04:26 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 10, 2026 04:33 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 10, 2026 04:43 Inactive

wplf and others added 2 commits February 10, 2026 15:03

remove useless and dead code

652bc8d

Merge branch 'main' into jinliang/qwen3-vl

3eb6ebc

ko3n1g approved these changes Feb 10, 2026

This was referenced Feb 11, 2026

add qwen3vl ut and delete useless comment #2325

Merged

Adding CUDA Graph Support for Vision Encoder #2334

Open

support qwen3-omni-moe #2342

Open

support for training qwen3 vl with dist train #2367

Open

shifangx mentioned this pull request Feb 13, 2026

M4 leftover for QWen3-VL with MCore vision encoder #2370

Merged

5 tasks

shifangx mentioned this pull request Feb 26, 2026

[draft]Add Qwen3-VL support with Megatron-FSDP #1801

Closed

5 tasks

coderabbitai bot mentioned this pull request Mar 4, 2026

add qwen2_5_omni #2634

Merged

5 tasks

sbhavani mentioned this pull request Mar 23, 2026

[ROADMAP] Megatron Core Roadmap NVIDIA/Megatron-LM#4003

Open

6 participants
