Add KokoroTTS support for voice agent framework #14910
Signed-off-by: taejinp <tango4j@gmail.com>
Signed-off-by: tango4j <tango4j@users.noreply.github.com>
…NeMo/NeMo into tango4j/add_kokoro_to_va
Args:
    lang_code: Language code for the model (default: 'a' for American English)
    voice: Voice to use (default: 'af_heart')
Can we list more voice options in the comments?
af_heart, af_bella, am_fenrir, and am_michael are recommended. For the full list of voices, check out https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
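Kokoro voice IDs appear to follow a naming convention where the first letter is the language code and the second letter is the voice gender (for example, af_heart is an American English female voice and am_michael an American English male voice). A minimal sketch, assuming that convention holds for every voice in VOICES.md, of a check that a configured voice matches the configured lang_code. `parse_kokoro_voice` and `check_voice_matches_lang` are hypothetical helpers, not part of NeMo or the kokoro package.

```python
# Hypothetical helpers, not part of NeMo or the kokoro package.
# Assumption: Kokoro voice IDs look like "<lang letter><gender letter>_<name>",
# e.g. "af_heart" = American English ('a'), female ('f'), name "heart".


def parse_kokoro_voice(voice: str) -> tuple[str, str, str]:
    """Split a Kokoro voice ID into (lang_code, gender, name)."""
    prefix, sep, name = voice.partition("_")
    if not sep or len(prefix) != 2 or not name or prefix[1] not in ("f", "m"):
        raise ValueError(f"Unexpected Kokoro voice ID: {voice!r}")
    return prefix[0], prefix[1], name


def check_voice_matches_lang(voice: str, lang_code: str) -> None:
    """Raise ValueError if the voice's language letter differs from lang_code."""
    voice_lang, _, _ = parse_kokoro_voice(voice)
    if voice_lang != lang_code:
        raise ValueError(
            f"Voice {voice!r} is for lang_code {voice_lang!r}, "
            f"but the TTS was configured with lang_code {lang_code!r}"
        )


# The voices recommended in this thread are all American English ('a').
for v in ("af_heart", "af_bella", "am_fenrir", "am_michael"):
    check_voice_matches_lang(v, "a")
```

Failing early on a mismatched pair (say, a British English voice with lang_code 'a') surfaces a config mistake at construction time rather than as oddly pronounced audio.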
def __init__(
    self,
    lang_code: str = "a",
It seems that the language codes used by Kokoro are very different from the ones used in the other components (ASR/LLM/TTS), where "en-us" would be used for American English. Maybe it's better to create a mapping from the ISO standard codes to the ones used by Kokoro?
For future reference, here is the mapping table:
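| Flag | Kokoro lang_code | Language | IETF code |
|---|---|---|---|
| 🇺🇸 | a | American English | en-US |
| 🇬🇧 | b | British English | en-GB |
| 🇪🇸 | e | Spanish | es-ES |
| 🇫🇷 | f | French (France) | fr-FR |
| 🇮🇳 | h | Hindi | hi-IN |
| 🇮🇹 | i | Italian | it-IT |
| 🇯🇵 | j | Japanese | ja-JP |
| 🇧🇷 | p | Brazilian Portuguese | pt-BR |
| 🇨🇳 | z | Mandarin Chinese | zh-CN |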
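A minimal sketch of the mapping suggested above, assuming the TTS constructor would accept either an IETF tag (such as "en-US" or "en-us") or an existing single-letter Kokoro code. `IETF_TO_KOKORO` and `to_kokoro_lang_code` are hypothetical names, not part of NeMo or the kokoro package; the entries are copied from the mapping table above.

```python
# Hypothetical mapping, not part of NeMo or kokoro:
# IETF language tags (lowercased) -> Kokoro lang_code letters.
IETF_TO_KOKORO = {
    "en-us": "a",  # American English
    "en-gb": "b",  # British English
    "es-es": "e",  # Spanish
    "fr-fr": "f",  # French (France)
    "hi-in": "h",  # Hindi
    "it-it": "i",  # Italian
    "ja-jp": "j",  # Japanese
    "pt-br": "p",  # Brazilian Portuguese
    "zh-cn": "z",  # Mandarin Chinese
}


def to_kokoro_lang_code(lang: str) -> str:
    """Return the Kokoro lang_code for an IETF tag (case-insensitive).

    Single-letter Kokoro codes are passed through unchanged, so existing
    configs that already use "a", "b", ... keep working.
    """
    # Normalize "en_US" / "EN-us" / " en-US " to "en-us".
    key = lang.strip().replace("_", "-").lower()
    if key in IETF_TO_KOKORO.values():
        return key
    if key in IETF_TO_KOKORO:
        return IETF_TO_KOKORO[key]
    raise ValueError(
        f"Unsupported language {lang!r}; expected one of {sorted(IETF_TO_KOKORO)}"
    )
```

Keys are stored lowercase and underscores are normalized to hyphens, so "en-US", "en-us", and "en_US" all resolve to "a", which matches how the other components spell the tag.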
Signed-off-by: stevehuang52 <heh@nvidia.com>
[🤖]: Hi @tango4j 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals.
What does this PR do?
This PR adds Kokoro TTS support and configurations to the NeMo Voice Agent framework.
Collection: voice agent
Changelog
See nemo/agents/voice_agent/pipecat/services/nemo/tts.py for the core changes.
Usage
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in NeMo ASR.
Additional Information