[DRAFT][TTS] Magpietts Simple API and loading audiocodec from Huggingface #15172

Merged
subhankar-ghosh merged 63 commits into main from magpietts_opensource_longform on Dec 17, 2025

subhankar-ghosh commented Dec 10, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Adds a simple MagpieTTS inference API (do_tts) and support for loading the audio codec from Hugging Face.

Collection: TTS

Changelog

  • Refactored the prepare_context_tensors method.
  • Added a simple MagpieTTS inference API: do_tts(transcript, language, apply_TN).
  • Added support for loading the audio codec from Hugging Face.
  • Enabled text_tokenizer files to be registered as artifacts, so that they are saved in the .nemo checkpoint file instead of being loaded from external files.
  • Moved eval_config.json from nemo/collections/tts/modules/magpietts_inference to examples/tts.
  • eval_config.json is now only an example; it contains the dataset used for CI/CD.
  • Changed the inference input: instead of dataset names, inference now takes a JSON file containing the metadata of the datasets to run on, and it runs inference (and evaluation) on every dataset in that file.
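The JSON-driven inference input described in the last changelog item can be pictured with the minimal sketch below. The schema is an assumption for illustration only: a top-level object mapping each dataset name to its metadata, with hypothetical required keys manifest_path and audio_dir. The functions parse_eval_config and run_all are hypothetical and are not the MagpieTTS inference API; run_one stands in for whatever per-dataset inference and evaluation the real script performs.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical required keys for each dataset entry; the actual schema used by
# the MagpieTTS inference JSON may differ.
REQUIRED_KEYS = ("manifest_path", "audio_dir")


def parse_eval_config(config: Any) -> List[Tuple[str, Dict[str, Any]]]:
    """Validate a {dataset_name: metadata} mapping and return (name, metadata) pairs."""
    if not isinstance(config, dict) or not config:
        raise ValueError("config must be a non-empty JSON object of datasets")
    datasets = []
    for name, meta in config.items():
        if not isinstance(meta, dict):
            raise ValueError(f"dataset {name!r} must map to a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in meta]
        if missing:
            raise ValueError(f"dataset {name!r} is missing keys: {missing}")
        # Extra keys are passed through untouched for the per-dataset routine.
        datasets.append((name, meta))
    return datasets


def run_all(path: str, run_one: Callable[[str, Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Load the JSON file at `path` and call `run_one` on every dataset it lists."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    return {name: run_one(name, meta) for name, meta in parse_eval_config(config)}
```

For example, run_all("eval_config.json", run_one=my_inference_and_eval) would return one result per dataset key, where my_inference_and_eval is a placeholder for the real per-dataset routine. Validating every entry before running anything means a typo in the last dataset fails fast instead of after hours of inference.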

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
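The do_tts(transcript, language, apply_TN) signature in the changelog, together with the commit "text_normalization cache and check", suggests that text normalization is optional and that normalizers are reused across calls. The sketch below illustrates that idea in plain Python and is not the NeMo implementation: get_normalizer, prepare_transcript, SUPPORTED_LANGUAGES, and the toy digit-expanding normalizer are hypothetical stand-ins for a real text normalizer, and functools.lru_cache provides the per-language cache.

```python
from functools import lru_cache
from typing import Callable

# Toy stand-in for a real text normalizer: it only expands standalone digits.
_DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
SUPPORTED_LANGUAGES = {"en"}  # hypothetical; a real model supports its own set


@lru_cache(maxsize=None)
def get_normalizer(language: str) -> Callable[[str], str]:
    """Build the normalizer for `language` once and return the cached one afterwards."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no text normalizer available for language {language!r}")

    def normalize(text: str) -> str:
        return " ".join(_DIGITS.get(token, token) for token in text.split())

    return normalize


def prepare_transcript(transcript: str, language: str = "en", apply_TN: bool = False) -> str:
    """Hypothetical pre-processing step of a do_tts-style call: normalize only if asked."""
    if not apply_TN:
        return transcript
    return get_normalizer(language)(transcript)
```

Because lru_cache does not cache exceptions, an unsupported language raises on every call rather than silently returning a stale result.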

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?
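For the import-guard check above, a common Python pattern is to attempt the optional import once, record whether it succeeded, and fail with a clear message only when the dependent feature is actually used. The helpers try_import and require below are hypothetical and are not NeMo utilities; the commit "optional utmos import" in this PR addresses the same concern for UTMOS, though its exact code may differ.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[Optional[ModuleType], bool]:
    """Import an optional dependency, returning (module, available) instead of raising."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None, False


def require(available: bool, module_name: str, feature: str) -> None:
    """Raise a clear error only when a feature that needs the optional package is used."""
    if not available:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; please install it."
        )
```

Typical use is a module-level call such as utmos, HAVE_UTMOS = try_import("utmos") (module name assumed), followed by require(HAVE_UTMOS, "utmos", "UTMOS scoring") inside the function that needs it, so importing the surrounding module never fails.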

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

subhankar-ghosh and others added 23 commits December 7, 2025 01:19
@github-actions github-actions bot added the TTS label Dec 10, 2025
subhankar-ghosh and others added 3 commits December 10, 2025 11:19
Base automatically changed from magpietts_opensource to main December 13, 2025 17:39
subhankar-ghosh (author) commented:

Fixed the UTMOS requirements and the JSON in magpietts_inference. Also changed magpietts.py to use has_text_context in a better way.

rfejgin commented Dec 16, 2025

This may be a good time to remove feature_dir. I believe we don't need it anymore (@paarthneekhara, could you confirm?), and removing it would simplify both the JSON file and the code. We would, however, have to remove it in a few places in the code.

rfejgin left a comment

No major comments on my end, but see my inline comment about the format of evalset_config.json.

blisc previously approved these changes Dec 16, 2025
@github-actions github-actions bot removed the Run CICD label Dec 17, 2025
github-actions commented:

[🤖]: Hi @subhankar-ghosh 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@subhankar-ghosh subhankar-ghosh merged commit 676a368 into main Dec 17, 2025
259 checks passed
@subhankar-ghosh subhankar-ghosh deleted the magpietts_opensource_longform branch December 17, 2025 15:26
AkCodes23 pushed a commit to AkCodes23/NeMo that referenced this pull request Jan 28, 2026
…face (NVIDIA-NeMo#15172)

* Modularize magpie inference code, move inference code from scripts to example

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Modify magpie CI with inference changes

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Renaming magpietts inference scripts from magpie to magpietts

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* infer_batch returns dataclass object

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Fixed context embedding without context encoder

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Remove unnecessary configurations

Removed multiple long manifest configurations from evalset_config.py.

Signed-off-by: Subhankar Ghosh <subhankarg@nvidia.com>

* Removing unused imports

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Copilot suggested changes

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Copilot suggested changes

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* do_tts method, load audiocodec from huggingface

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Move inference helper modules from examples to tts collection

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Review changes

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Changes suggested in compute_mean_with_confidence_interval

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Linting issue

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* do_tts method, load audiocodec from huggingface

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* register_tokenizer_artifacts to store tokenizer files in .nemo file

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Modularize magpie inference code, move inference code from scripts to example

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Renaming magpietts inference scripts from magpie to magpietts

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Removing unused imports

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Remove unnecessary configurations

Removed multiple long manifest configurations from evalset_config.py.

Signed-off-by: Subhankar Ghosh <subhankarg@nvidia.com>

* Copilot suggested changes

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Move inference helper modules from examples to tts collection

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Changes suggested in compute_mean_with_confidence_interval

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* do_tts method, load audiocodec from huggingface

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* register_tokenizer_artifacts to store tokenizer files in .nemo file

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* rebase with main issues

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* changed datasets to json input, moved json file to examples/tts

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Remove unwanted dataconfig.

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* optional utmos import, text_normalization cache and check, test updated

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Update nemo/collections/tts/models/magpietts.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Subhankar Ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* Update nemo/collections/tts/models/magpietts.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Subhankar Ghosh <subhankar2321@gmail.com>

* Linting errors

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Refactored prepare_context_tensors, removed dummy context audio/text from do_tts

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>

* remove utmos, make dataset path required

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* remove unused imports

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Enable loading MagpieTTS from HF

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

* Support speaker index in do_tts api

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>

---------

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>
Signed-off-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>
Signed-off-by: Subhankar Ghosh <subhankarg@nvidia.com>
Signed-off-by: Subhankar Ghosh <subhankar2321@gmail.com>
Co-authored-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Akhil Varanasi <akhilvaranasi23@gmail.com>
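One commit above revises compute_mean_with_confidence_interval, whose implementation is not shown on this page. As an assumption, the sketch below computes the sample mean and a normal-approximation 95% confidence interval, mean ± z·s/√n with z = 1.96 and s the sample standard deviation, using only the standard library. The real helper may use a t-distribution or a different confidence level, and the name mean_with_ci is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean, half_width) so that mean +/- half_width is an approximate CI.

    Normal approximation: half_width = z * s / sqrt(n), where s is the sample
    standard deviation (n - 1 denominator). z = 1.96 gives roughly 95% coverage.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    mean = statistics.fmean(values)
    if n == 1:
        # The spread cannot be estimated from a single value, so the width is undefined.
        return mean, float("nan")
    std = statistics.stdev(values)
    return mean, z * std / math.sqrt(n)
```

For small n (for example, a handful of evaluation runs), a t-distribution critical value would give a wider and more honest interval than z = 1.96.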
nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request Mar 13, 2026

7 participants