Mingyuanm/sdxl export #8926
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
* examples/multimodal/text_to_image/stable_diffusion/sd_xl_trt_inference.py: fixed
* examples/multimodal/text_to_image/stable_diffusion/sd_xl_export.py: fixed, resolved
* examples/multimodal/text_to_image/stable_diffusion/sd_xl_trt_inference.py: fixed
* examples/multimodal/text_to_image/stable_diffusion/sd_xl_trt_inference.py: fixed
* examples/multimodal/text_to_image/stable_diffusion/sd_xl_trt_inference.py: fixed
* examples/multimodal/text_to_image/stable_diffusion/sd_xl_trt_inference.py: fixed
Branch updated from 52b3cce to a87f550.
# limitations under the License.

import math
import time

Code scanning / CodeQL check notice: Unused import.
One more alert here.
Approved. 🚀 Left minor comments. Just FYI, we are renaming ammo to modelopt. We can do a PR later to incorporate the new name here.
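The reviewer's note about renaming ammo to modelopt implies a transition period in which either package name may be installed. A minimal sketch of a fallback import is shown below. The helper name `import_first_available` is hypothetical, and the module paths in the usage comment (`modelopt.torch.quantization`, `ammo.torch.quantization`) are assumptions, not something this PR page confirms.

```python
import importlib
from types import ModuleType
from typing import List, Sequence


def import_first_available(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first module in `candidates` that is importable.

    Hypothetical helper for package renames: list the new name first and the
    old name second, so code keeps working while both names are in circulation.
    """
    errors: List[str] = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    # Nothing was importable: report every attempt so the failure is easy to diagnose.
    raise ImportError("none of the candidate modules could be imported:\n" + "\n".join(errors))


# Assumed usage for the rename (module paths are assumptions, not taken from the PR):
# quant = import_first_available(["modelopt.torch.quantization", "ammo.torch.quantization"])
```

Listing the new name first means the old package is only used when the renamed one is missing, and the combined error message shows every name that was tried.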
* nemo/collections/multimodal/modules/stable_diffusion/quantization_utils/utils.py: resolved
* Move cached embedding devices and dtype for onnx export consistency
* Add old trt export/inference script, currently not working in latest container.
* Add NeMo TRT inference pipeline and quantization workflow
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Add guards to avoid undefined variables
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* minor fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Add conversion script from hf sdxl to nemo sdxl
* Update quantize pipeline to adapt to variable image dimension
* update sdxl pipeline to be aware of additional emb channels
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add guards for potential local var
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* copyright header
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Update calib prompt file path
* Update file paths
* minor update
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Update default quantization config
* remove unused imports/vars
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* remove unused imports

---------

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Adding MegatronParallel
* Minor quantization pipeline updates (#8924)
  * Detect 'arcname' prefix in utils when handling .nemo tarball
  * Address megatron_amp_O2 = True case in quantization
  * Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile
* Fix converter (#8960)
* Fix memory leak at loss func (#8868)
  * PR #8803: Update embedding init prototype to match mc
  * PR #8810: Fix import of get_gpt_layer_ammo_spec
  * PR #8853: Fix memory leak at loss func
* PP support in LoRA merge script (#8934)
  * initial commit
  * enable pp support for merge script and fix output precision
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * remove incomplete script for next release
* Mingyuanm/sdxl export (#8926)
  * Move cached embedding devices and dtype for onnx export consistency
  * Add old trt export/inference script, currently not working in latest container.
  * Add NeMo TRT inference pipeline and quantization workflow
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * Add guards to avoid undefined variables
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * minor fix
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * Add conversion script from hf sdxl to nemo sdxl
  * Update quantize pipeline to adapt to variable image dimension
  * update sdxl pipeline to be aware of additional emb channels
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * add guards for potential local var
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * copyright header
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * Update calib prompt file path
  * Update file paths
  * minor update
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * Update default quantization config
  * remove unused imports/vars
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * remove unused imports
* Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (#8866)
  * Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object.
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * Fixed the signature of ZarrPathStore.__delitem__
* update (#8978)
* change the condition for get qkv tensor from linear_qkv output (#8965)
* Update Latest News (#8837)
  * Update Latest News. Adds links to articles on:
    * NeMo framework on GKE
    * Responsible Gen AI using NeMo and Picasso
    * NeMo powering Amazon Titan foundation models
  * Minor updates to latest news in README
    * Remove bullets
    * Editing text for clarity
  * Format latest news as a dropdown list
    * Uses embedded html to format news to dropdown, hiding lengthy details
    * Fixes formatting of the title
  * Add break to improve readability of latest news image
  * Add LLM and MM section in latest news
  * Add margin in latest news expandable lists
  * Remove styling of expandable list
    * GitHub appears to not render styled elements when embedded as raw html in rst
  * Fold the first news item by default
* Fix incorrect link to latest news in README (#8985)
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* make unit tests work
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add pytest-mock to unit test reqs
* Enable using hybrid asr models in CTC Segmentation tool (#8828)
  * enable using hybrid asr models in ctc segmentation tool
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Add safety checks for 'data' key in MegatronGPTModel cfg (#8991)
* address some comments
* TDT confidence fix (#8982)
  * tdt confidence fix
* Address PR comments

---------

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Shashank Verma <shashank3959@gmail.com>
Co-authored-by: Shashank Verma <shashankv@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
* Move cached embedding devices and dtype for onnx export consistency Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Add old trt export/inference script, currently not working in latest container. Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Add NeMo TRT inference pipeline and quatization workflow Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add guards to avoid undefined variables Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add conversion script from hf sdxl to nemo sdxl Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Update quantize pipeline to adapt to variable image dimension Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * update sdxl pipeline to be aware of additional emb channels Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add guards for potential local var Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copyright header Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update calib prompt file path Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Update file paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * minor update Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update default quantization config Signed-off-by: Mingyuan Ma 
<mingyuanm@nvidia.com> * remove unused imports/vars Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused imports Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Adding MegatronParallel Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Minor quantization pipeline updates (NVIDIA#8924) * Detect 'arcname' prefix in utils when handling .nemo tarball Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Address megatron_amp_O2 = True case in quantization Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Fix converter (NVIDIA#8960) Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Fix memory leak at loss func (NVIDIA#8868) * PR NVIDIA#8803: Update embedding init prototype to match mc Signed-off-by: Jaemin Choi <jaeminc@nvidia.com> * PR NVIDIA#8810: Fix import of get_gpt_layer_ammo_spec Signed-off-by: Jaemin Choi <jaeminc@nvidia.com> * PR NVIDIA#8853: Fix memory leak at loss func Signed-off-by: Jaemin Choi <jaeminc@nvidia.com> --------- Signed-off-by: Jaemin Choi <jaeminc@nvidia.com> Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: Jaemin Choi <jaeminc@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * PP support in LoRA merge script (NVIDIA#8934) * initial commit Signed-off-by: Chen Cui <chcui@nvidia.com> * enable pp support for merge script and fix output precision Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove incomplete script for next release Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <adithya.r@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Mingyuanm/sdxl export (NVIDIA#8926) * Move cached embedding devices and dtype for onnx export consistency Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Add old trt export/inference script, currently not working in latest container. Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Add NeMo TRT inference pipeline and quatization workflow Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add guards to avoid undefined variables Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add conversion script from hf sdxl to nemo sdxl Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Update quantize pipeline to adapt to variable image dimension Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * update sdxl pipeline to be aware of additional emb channels Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add guards for potential local var Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copyright header Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update calib prompt file path Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Update file paths Signed-off-by: Mingyuan Ma 
<mingyuanm@nvidia.com> * minor update Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update default quantization config Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * remove unused imports/vars Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused imports Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (NVIDIA#8866) * Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed the signature of ZarrPathStore.__delitem__ Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> --------- Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * update (NVIDIA#8978) Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * change the condition for get qkv tensor from linear_qkv output (NVIDIA#8965) Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Co-authored-by: Adi Renduchintala <adithya.r@gmail.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Update Latest News (NVIDIA#8837) * Update Latest News Adds links to articles on * NeMo framework on GKE * Responsible Gen AI using NeMo and Picasso * NeMo powering Amazon Titan foundation models 
Signed-off-by: Shashank Verma <shashankv@nvidia.com> * Minor updates to latest news in README * Remove bullets * Editing text for clarity Signed-off-by: Shashank Verma <shashankv@nvidia.com> * Format latest news as a dropdown list * Uses embedded html to format news to dropdown, hiding lengthy details * Fixes formatting of the title Signed-off-by: Shashank Verma <shashankv@nvidia.com> * Add break to improve readability of latest news image Signed-off-by: Shashank Verma <shashankv@nvidia.com> * Add LLM and MM section in latest news Signed-off-by: Shashank Verma <shashankv@nvidia.com> * Add margin in latest news expandable lists Signed-off-by: Shashank Verma <shashankv@nvidia.com> * Remove styling of expandable list * Github appears to not render styled elements when embedded as raw html in rst Signed-off-by: Shashank Verma <shashankv@nvidia.com> * Fold the first news item by default Signed-off-by: Shashank Verma <shashankv@nvidia.com> --------- Signed-off-by: Shashank Verma <shashankv@nvidia.com> Signed-off-by: Shashank Verma <shashank3959@gmail.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Fix incorrect link to latest news in README (NVIDIA#8985) Signed-off-by: Shashank Verma <shashankv@nvidia.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * make unit tests works Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * add pytest-mock to unit test reqs Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Enable using hybrid asr models in CTC Segmentation tool (NVIDIA#8828) * enable using hybrid asr models in ctc segmentation tool Signed-off-by: Elena Rastorgueva 
<erastorgueva@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Add safety checks for 'data' key in MegatronGPTModel cfg (NVIDIA#8991) Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * address some comments Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * TDT confidence fix (NVIDIA#8982) * tdt confidence fix --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Address PR comments Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> --------- Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Jaemin Choi <jaeminc@nvidia.com> Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Signed-off-by: Shashank Verma <shashankv@nvidia.com> Signed-off-by: Shashank Verma <shashank3959@gmail.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: Marc Romeyn <marcromeyn@gmail.com> Co-authored-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Jaemin Choi <minitu77@gmail.com> Co-authored-by: Jaemin Choi <jaeminc@nvidia.com> Co-authored-by: 
Eric Harper <complex451@gmail.com> Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <adithya.r@gmail.com> Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com> Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com> Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Shashank Verma <shashank3959@gmail.com> Co-authored-by: Shashank Verma <shashankv@nvidia.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
* Move cached embedding devices and dtype for onnx export consistency Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Add old trt export/inference script, currently not working in latest container. Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Add NeMo TRT inference pipeline and quatization workflow Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add guards to avoid undefined variables Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add conversion script from hf sdxl to nemo sdxl Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Update quantize pipeline to adapt to variable image dimension Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * update sdxl pipeline to be aware of additional emb channels Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add guards for potential local var Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copyright header Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update calib prompt file path Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * Update file paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * minor update Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update default quantization config Signed-off-by: Mingyuan Ma 
<mingyuanm@nvidia.com> * remove unused imports/vars Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused imports Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Adding MegatronParallel
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Minor quantization pipeline updates (NVIDIA#8924)
* Detect 'arcname' prefix in utils when handling .nemo tarball
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Address megatron_amp_O2 = True case in quantization
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Fix converter (NVIDIA#8960)
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Fix memory leak at loss func (NVIDIA#8868)
* PR NVIDIA#8803: Update embedding init prototype to match mc
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
* PR NVIDIA#8810: Fix import of get_gpt_layer_ammo_spec
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
* PR NVIDIA#8853: Fix memory leak at loss func
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
---------
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* PP support in LoRA merge script (NVIDIA#8934)
* initial commit
Signed-off-by: Chen Cui <chcui@nvidia.com>
* enable pp support for merge script and fix output precision
Signed-off-by: Chen Cui <chcui@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove incomplete script for next release
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Mingyuanm/sdxl export (NVIDIA#8926)
* Move cached embedding devices and dtype for onnx export consistency
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add old trt export/inference script, currently not working in latest container.
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add NeMo TRT inference pipeline and quatization workflow
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add guards to avoid undefined variables
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor fix
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add conversion script from hf sdxl to nemo sdxl
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update quantize pipeline to adapt to variable image dimension
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* update sdxl pipeline to be aware of additional emb channels
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add guards for potential local var
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* copyright header
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update calib prompt file path
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update file paths
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* minor update
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update default quantization config
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* remove unused imports/vars
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove unused imports
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
---------
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (NVIDIA#8866)
* Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object.
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fixed the signature of ZarrPathStore.__delitem__
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
---------
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* update (NVIDIA#8978)
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* change the condition for get qkv tensor from linear_qkv output (NVIDIA#8965)
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Update Latest News (NVIDIA#8837)
* Update Latest News
Adds links to articles on
* NeMo framework on GKE
* Responsible Gen AI using NeMo and Picasso
* NeMo powering Amazon Titan foundation models
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Minor updates to latest news in README
* Remove bullets
* Editing text for clarity
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Format latest news as a dropdown list
* Uses embedded html to format news to dropdown, hiding lengthy details
* Fixes formatting of the title
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Add break to improve readability of latest news image
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Add LLM and MM section in latest news
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Add margin in latest news expandable lists
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Remove styling of expandable list
* Github appears to not render styled elements when embedded as raw html in rst
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Fold the first news item by default
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
---------
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Fix incorrect link to latest news in README (NVIDIA#8985)
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* make unit tests works
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* add pytest-mock to unit test reqs
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Enable using hybrid asr models in CTC Segmentation tool (NVIDIA#8828)
* enable using hybrid asr models in ctc segmentation tool
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Add safety checks for 'data' key in MegatronGPTModel cfg (NVIDIA#8991)
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* address some comments
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* TDT confidence fix (NVIDIA#8982)
* tdt confidence fix
---------
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Address PR comments
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
---------
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Shashank Verma <shashank3959@gmail.com>
Co-authored-by: Shashank Verma <shashankv@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
* Move cached embedding devices and dtype for onnx export consistency
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add old trt export/inference script, currently not working in latest container.
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add NeMo TRT inference pipeline and quatization workflow
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add guards to avoid undefined variables
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor fix
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add conversion script from hf sdxl to nemo sdxl
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update quantize pipeline to adapt to variable image dimension
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* update sdxl pipeline to be aware of additional emb channels
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add guards for potential local var
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* copyright header
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update calib prompt file path
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update file paths
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* minor update
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update default quantization config
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* remove unused imports/vars
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove unused imports
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
---------
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
* Adding MegatronParallel
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Minor quantization pipeline updates (#8924)
* Detect 'arcname' prefix in utils when handling .nemo tarball
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Address megatron_amp_O2 = True case in quantization
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Fix converter (#8960)
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Fix memory leak at loss func (#8868)
* PR #8803: Update embedding init prototype to match mc
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
* PR #8810: Fix import of get_gpt_layer_ammo_spec
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
* PR #8853: Fix memory leak at loss func
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
---------
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* PP support in LoRA merge script (#8934)
* initial commit
Signed-off-by: Chen Cui <chcui@nvidia.com>
* enable pp support for merge script and fix output precision
Signed-off-by: Chen Cui <chcui@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove incomplete script for next release
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Mingyuanm/sdxl export (#8926)
* Move cached embedding devices and dtype for onnx export consistency
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add old trt export/inference script, currently not working in latest container.
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add NeMo TRT inference pipeline and quatization workflow
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add guards to avoid undefined variables
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor fix
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add conversion script from hf sdxl to nemo sdxl
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update quantize pipeline to adapt to variable image dimension
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* update sdxl pipeline to be aware of additional emb channels
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add guards for potential local var
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* copyright header
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update calib prompt file path
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update file paths
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* minor update
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update default quantization config
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* remove unused imports/vars
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove unused imports
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
---------
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (#8866)
* Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object.
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fixed the signature of ZarrPathStore.__delitem__
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
---------
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* update (#8978)
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* change the condition for get qkv tensor from linear_qkv output (#8965)
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Update Latest News (#8837)
* Update Latest News
Adds links to articles on
* NeMo framework on GKE
* Responsible Gen AI using NeMo and Picasso
* NeMo powering Amazon Titan foundation models
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Minor updates to latest news in README
* Remove bullets
* Editing text for clarity
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Format latest news as a dropdown list
* Uses embedded html to format news to dropdown, hiding lengthy details
* Fixes formatting of the title
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Add break to improve readability of latest news image
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Add LLM and MM section in latest news
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Add margin in latest news expandable lists
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Remove styling of expandable list
* Github appears to not render styled elements when embedded as raw html in rst
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
* Fold the first news item by default
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
---------
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Fix incorrect link to latest news in README (#8985)
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* make unit tests works
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* add pytest-mock to unit test reqs
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Enable using hybrid asr models in CTC Segmentation tool (#8828)
* enable using hybrid asr models in ctc segmentation tool
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Add safety checks for 'data' key in MegatronGPTModel cfg (#8991)
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* address some comments
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* TDT confidence fix (#8982)
* tdt confidence fix
---------
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
* Address PR comments
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
---------
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram 
<69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <adithya.r@gmail.com> Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com> Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com> Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Shashank Verma <shashank3959@gmail.com> Co-authored-by: Shashank Verma <shashankv@nvidia.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: Ao Tang <aot@nvidia.com>
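One of the commits folded into the squash above replaces unpacking of .nemo checkpoints on export with a VFS-like TarPath object. The TarPath class itself is not shown here; the following is only a hedged, stdlib-only sketch of the underlying idea (read a member straight out of the tar archive instead of extracting the whole tarball to disk). The helper name `read_tar_member` and the member name used in the example are hypothetical.

```python
import io
import tarfile
from typing import BinaryIO, Union


def read_tar_member(archive: Union[str, BinaryIO], name: str) -> bytes:
    """Return the bytes of `name` from a tar archive without extracting it to disk.

    Accepts a filesystem path or an open binary file object. Also tries the
    "./"-prefixed form, since archives created with `tar -C dir .` store
    member names that way.
    """
    if isinstance(archive, str):
        tar = tarfile.open(archive, mode="r:*")  # "r:*" auto-detects compression
    else:
        tar = tarfile.open(fileobj=archive, mode="r:*")
    with tar:
        for candidate in (name, "./" + name):
            try:
                member = tar.getmember(candidate)
            except KeyError:
                continue  # not stored under this spelling; try the next one
            f = tar.extractfile(member)
            if f is None:  # directories and special files have no data stream
                raise ValueError(f"{candidate!r} is not a regular file")
            return f.read()
    raise KeyError(f"{name!r} not found in archive")


# Build a small in-memory archive whose member uses the "./" prefix.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tw:
    data = b"model: sdxl\n"
    info = tarfile.TarInfo("./model_config.yaml")
    info.size = len(data)
    tw.addfile(info, io.BytesIO(data))
buf.seek(0)
print(read_tar_member(buf, "model_config.yaml"))
```

Only the requested member is decompressed and read, which is the main saving over unpacking every weight file before export.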
* Move cached embedding devices and dtype for onnx export consistency
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add old trt export/inference script, currently not working in latest container.
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Add NeMo TRT inference pipeline and quantization workflow
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add guards to avoid undefined variables
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* minor fix
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add conversion script from hf sdxl to nemo sdxl
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update quantize pipeline to adapt to variable image dimension
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* update sdxl pipeline to be aware of additional emb channels
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* add guards for potential local var
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* copyright header
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update calib prompt file path
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* Update file paths
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* minor update
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update default quantization config
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* remove unused imports/vars
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* remove unused imports
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
---------
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
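The commit "update sdxl pipeline to be aware of additional emb channels" concerns the extra vector conditioning that the SDXL UNet receives alongside the timestep. As a rough sketch (not the NeMo implementation, whose exact ordering and helpers are not shown in this PR): in the SDXL base model, six micro-conditioning values (original height/width, crop top/left, target height/width) are each mapped to a 256-dim sinusoidal embedding and concatenated with the 1280-dim pooled text embedding, giving 1280 + 6 × 256 = 2816 input channels (`adm_in_channels` in the SDXL base config). The function names below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sinusoidal_embedding(value: float, dim: int = 256, max_period: float = 10000.0) -> List[float]:
    """Map one scalar to a `dim`-dimensional sinusoidal embedding (cosines first, then sines)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [value * f for f in freqs]
    return [math.cos(a) for a in args] + [math.sin(a) for a in args]


def sdxl_vector_conditioning(
    pooled_text_emb: Sequence[float],
    original_size: Tuple[int, int],
    crop_top_left: Tuple[int, int],
    target_size: Tuple[int, int],
    dim_per_value: int = 256,
) -> List[float]:
    """Concatenate the pooled text embedding with embeddings of the six micro-conditioning values."""
    vec = list(pooled_text_emb)
    for v in (*original_size, *crop_top_left, *target_size):  # 6 scalars in total
        vec.extend(sinusoidal_embedding(float(v), dim_per_value))
    return vec


pooled = [0.0] * 1280  # placeholder for the 1280-dim pooled OpenCLIP text embedding
cond = sdxl_vector_conditioning(pooled, (1024, 1024), (0, 0), (1024, 1024))
print(len(cond))  # 2816
```

Because an exported UNet engine has a fixed input width for this vector, any pipeline feeding it has to produce exactly this many channels.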
What does this PR do?
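Adds an SDXL quantization workflow and a NeMo TensorRT (TRT) inference pipeline, along with a script that converts Hugging Face SDXL checkpoints to NeMo SDXL.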
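The PR's own quantization workflow and its default quantization config are not reproduced here. As a generic, stdlib-only illustration of the calibration idea behind post-training quantization, the sketch below shows symmetric per-tensor INT8 "max" calibration: record the largest absolute activation seen over calibration batches, derive a scale, then quantize and dequantize. All names and values are hypothetical and unrelated to the PR's calibration prompts.

```python
from typing import Iterable, List, Sequence


def calibrate_amax(batches: Iterable[Sequence[float]]) -> float:
    """Max calibration: track the largest absolute value over all calibration batches."""
    amax = 0.0
    for batch in batches:
        for x in batch:
            amax = max(amax, abs(x))
    return amax


def int8_scale(amax: float) -> float:
    """Symmetric signed INT8 maps [-amax, amax] onto [-127, 127]."""
    return amax / 127.0 if amax > 0 else 1.0  # avoid a zero scale for all-zero inputs


def fake_quantize(values: Sequence[float], scale: float) -> List[float]:
    """Quantize to INT8 (round, then clamp to [-127, 127]) and dequantize back to float."""
    out = []
    for x in values:
        q = max(-127, min(127, round(x / scale)))
        out.append(q * scale)
    return out


calib_batches = [[0.1, -0.5, 0.25], [1.27, -0.8]]
amax = calibrate_amax(calib_batches)
scale = int8_scale(amax)  # ~0.01
print(fake_quantize([0.5, 2.0, -0.013], scale))
```

Values beyond the calibrated range (2.0 here) are clipped to about amax, which is why the choice of calibration data matters.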
Collection: Multimodal
Changelog
Usage
# Add a code snippet demonstrating how to use this
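The commit "Update quantize pipeline to adapt to variable image dimension" implies the pipeline no longer assumes a fixed resolution. A minimal sketch, assuming the standard SDXL VAE (4 latent channels, 8x spatial downsampling), of how a pipeline can validate a requested resolution and derive the latent tensor shape. The helper `latent_shape` is hypothetical and does not describe the options of `sd_xl_export.py` or `sd_xl_trt_inference.py`.

```python
from typing import Tuple

VAE_SCALE_FACTOR = 8  # SDXL VAE downsamples height and width by 8
LATENT_CHANNELS = 4   # SDXL latents have 4 channels


def latent_shape(batch_size: int, height: int, width: int) -> Tuple[int, int, int, int]:
    """Return the NCHW latent shape for an image of `height` x `width` pixels."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError(
            f"height and width must be multiples of {VAE_SCALE_FACTOR}, got {height}x{width}"
        )
    return (batch_size, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


print(latent_shape(1, 1024, 1024))  # (1, 4, 128, 128)
print(latent_shape(2, 768, 1344))   # (2, 4, 96, 168)
```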
Jenkins CI
To run Jenkins, a NeMo user with write access must comment
jenkins
on the PR.
Before your PR is "Ready for review"
Pre-checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information