Mingyuanm/sdxl export #8926

Merged
merged 32 commits into from
Apr 18, 2024

Victor49152
Collaborator

What does this PR do?

Add SDXL quantization and a TensorRT (TRT) inference pipeline.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?
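
For reference, an import guard for an optional library usually follows the pattern sketched below. This is an illustration only, not code from this PR: `HAVE_NUMBA` and `jit_or_identity` are hypothetical names, and Numba is used simply because the checklist mentions it.

```python
# Import guard: the module must still import cleanly when the optional
# dependency is missing, so the failure is caught and recorded in a flag.
try:
    import numba  # optional dependency; may be absent

    HAVE_NUMBA = True
except ImportError:
    numba = None
    HAVE_NUMBA = False


def jit_or_identity(func):
    """Compile with Numba when it is installed, otherwise return func unchanged.

    Hypothetical helper, shown only to illustrate how the flag is consumed.
    """
    if HAVE_NUMBA:
        return numba.njit(func)
    return func


@jit_or_identity
def add(a, b):
    return a + b


print(add(2, 3))
```

Code paths that need the optional library check the flag first, so users without it installed can still import the rest of the package.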

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@github-advanced-security github-advanced-security bot left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@Victor49152 Victor49152 requested a review from Edwardf0t1 April 15, 2024 19:58
Victor49152 and others added 2 commits April 15, 2024 13:05
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
@Victor49152
Collaborator Author

jenkins

@Victor49152
Collaborator Author

jenkins

@Victor49152
Collaborator Author

jenkins

@Victor49152 Victor49152 requested a review from jingyu-ml April 16, 2024 23:11
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
@Victor49152 Victor49152 force-pushed the mingyuanm/sdxl_export branch from 52b3cce to a87f550 Compare April 17, 2024 03:12
Victor49152 and others added 5 commits April 17, 2024 14:09
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
@Victor49152
Collaborator Author

jenkins

# limitations under the License.

import math
import time

Check notice: Code scanning / CodeQL

Unused import (Note): Import of 'time' is not used.
Collaborator

One more alert here.
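
A notice like the one above can be reproduced locally before pushing. The sketch below uses only Python's standard `ast` module; `find_unused_imports` is a hypothetical helper, not part of NeMo or CodeQL, and it is far simpler than a real linter (for example, it ignores `__all__` and names referenced only inside strings).

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return names bound by import statements that are never referenced."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds only the top-level name `a`
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Covers plain uses and the base of attribute access (math.sqrt)
            used.add(node.id)
    return sorted(imported - used)


# Same two imports as in the flagged file; only `math` is referenced.
snippet = "import math\nimport time\n\nprint(math.sqrt(4.0))\n"
unused = find_unused_imports(snippet)
print(unused)
```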

@Victor49152
Collaborator Author

jenkins

@Victor49152
Collaborator Author

jenkins

Collaborator

@Edwardf0t1 Edwardf0t1 left a comment

Approved. 🚀
Left minor comments. Just FYI, we are renaming ammo to modelopt. We can do a PR later to incorporate the new name here.

# limitations under the License.

import math
import time

One more alert here.

@Victor49152 Victor49152 merged commit c687a69 into main Apr 18, 2024
128 of 129 checks passed
@Victor49152 Victor49152 deleted the mingyuanm/sdxl_export branch April 18, 2024 22:06
marcromeyn pushed a commit that referenced this pull request Apr 22, 2024
* Move cached embedding devices and dtype for onnx export consistency

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add old trt export/inference script, currently not working in latest container.

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add NeMo TRT inference pipeline and quantization workflow

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add guards to avoid undefined variables

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add conversion script from hf sdxl to nemo sdxl

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update quantize pipeline to adapt to variable image dimension

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* update sdxl pipeline to be aware of additional emb channels

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add guards for potential local var

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copyright header

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update calib prompt file path

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update file paths

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* minor update

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update default quantization config

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* remove unused imports/vars

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

---------

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
marcromeyn added a commit that referenced this pull request Apr 22, 2024
* Adding MegatronParallel

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Minor quantization pipeline updates (#8924)

* Detect 'arcname' prefix in utils when handling .nemo tarball

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Address megatron_amp_O2 = True case in quantization

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix converter (#8960)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix memory leak at loss func (#8868)

* PR #8803: Update embedding init prototype to match mc

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR #8810: Fix import of get_gpt_layer_ammo_spec

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR #8853: Fix memory leak at loss func

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

---------

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* PP support in LoRA merge script (#8934)

* initial commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* enable pp support for merge script and fix output precision

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove incomplete script for next release

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Mingyuanm/sdxl export (#8926)

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (#8866)

* Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed the signature of ZarrPathStore.__delitem__

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

---------

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* update (#8978)

Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* change the condition for get qkv tensor from linear_qkv output (#8965)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Update Latest News (#8837)

* Update Latest News

Adds links to articles on
* NeMo framework on GKE
* Responsible Gen AI using NeMo and Picasso
* NeMo powering Amazon Titan foundation models

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Minor updates to latest news in README

* Remove bullets
* Editing text for clarity

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Format latest news as a dropdown list

* Uses embedded html to format news to dropdown, hiding lengthy details
* Fixes formatting of the title

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add break to improve readability of latest news image

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add LLM and MM section in latest news

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add margin in latest news expandable lists

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Remove styling of expandable list

* Github appears to not render styled elements when
embedded as raw html in rst

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Fold the first news item by default

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

---------

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix incorrect link to latest news in README (#8985)

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* make unit tests work

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* add pytest-mock to unit test reqs

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Enable using hybrid asr models in CTC Segmentation tool (#8828)

* enable using hybrid asr models in ctc segmentation tool

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Add safety checks for 'data' key in MegatronGPTModel cfg (#8991)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* address some comments

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* TDT confidence fix (#8982)

* tdt confidence fix

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Address PR comments

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

---------

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Shashank Verma <shashank3959@gmail.com>
Co-authored-by: Shashank Verma <shashankv@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
xingyaoww pushed a commit to xingyaoww/NeMo that referenced this pull request Apr 23, 2024
xingyaoww pushed a commit to xingyaoww/NeMo that referenced this pull request Apr 23, 2024
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request Apr 26, 2024
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request Apr 26, 2024
* Adding MegatronParallel

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Minor quantization pipeline updates (NVIDIA#8924)

* Detect 'arcname' prefix in utils when handling .nemo tarball

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Address megatron_amp_O2 = True case in quantization

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix converter (NVIDIA#8960)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix memory leak at loss func (NVIDIA#8868)

* PR NVIDIA#8803: Update embedding init prototype to match mc

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR NVIDIA#8810: Fix import of get_gpt_layer_ammo_spec

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR NVIDIA#8853: Fix memory leak at loss func

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

---------

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* PP support in LoRA merge script (NVIDIA#8934)

* initial commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* enable pp support for merge script and fix output precision

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove incomplete script for next release

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Mingyuanm/sdxl export (NVIDIA#8926)

* Move cached embedding devices and dtype for onnx export consistency

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add old trt export/inference script, currently not working in latest container.

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add NeMo TRT inference pipeline and quantization workflow

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add guards to avoid undefined variables

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add conversion script from hf sdxl to nemo sdxl

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update quantize pipeline to adapt to variable image dimension

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* update sdxl pipeline to be aware of additional emb channels

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add guards for potential local var

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copyright header

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update calib prompt file path

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update file paths

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* minor update

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update default quantization config

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* remove unused imports/vars

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

---------

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (NVIDIA#8866)

* Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed the signature of ZarrPathStore.__delitem__

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

---------

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* update (NVIDIA#8978)

Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* change the condition for get qkv tensor from linear_qkv output (NVIDIA#8965)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Update Latest News (NVIDIA#8837)

* Update Latest News

Adds links to articles on
* NeMo framework on GKE
* Responsible Gen AI using NeMo and Picasso
* NeMo powering Amazon Titan foundation models

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Minor updates to latest news in README

* Remove bullets
* Editing text for clarity

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Format latest news as a dropdown list

* Uses embedded html to format news to dropdown, hiding lengthy details
* Fixes formatting of the title

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add break to improve readability of latest news image

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add LLM and MM section in latest news

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add margin in latest news expandable lists

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Remove styling of expandable list

* GitHub does not appear to render styled elements when
embedded as raw HTML in RST

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Fold the first news item by default

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

---------

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix incorrect link to latest news in README (NVIDIA#8985)

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* make unit tests work

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* add pytest-mock to unit test reqs

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Enable using hybrid asr models in CTC Segmentation tool (NVIDIA#8828)

* enable using hybrid asr models in ctc segmentation tool

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Add safety checks for 'data' key in MegatronGPTModel cfg (NVIDIA#8991)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* address some comments

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* TDT confidence fix (NVIDIA#8982)

* tdt confidence fix

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Address PR comments

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

---------

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Shashank Verma <shashank3959@gmail.com>
Co-authored-by: Shashank Verma <shashankv@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
galv pushed a commit to galv/NeMo that referenced this pull request Apr 29, 2024
galv pushed a commit to galv/NeMo that referenced this pull request Apr 29, 2024
suiyoubi pushed a commit that referenced this pull request May 2, 2024
* Move cached embedding devices and dtype for onnx export consistency

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add old trt export/inference script, currently not working in latest container.

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add NeMo TRT inference pipeline and quantization workflow

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add guards to avoid undefined variables

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add conversion script from hf sdxl to nemo sdxl

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update quantize pipeline to adapt to variable image dimension

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* update sdxl pipeline to be aware of additional emb channels

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add guards for potential local var

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copyright header

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update calib prompt file path

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update file paths

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* minor update

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update default quantization config

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* remove unused imports/vars

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

---------

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
suiyoubi pushed a commit that referenced this pull request May 2, 2024
* Adding MegatronParallel

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Minor quantization pipeline updates (#8924)

* Detect 'arcname' prefix in utils when handling .nemo tarball

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Address megatron_amp_O2 = True case in quantization

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix converter (#8960)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix memory leak at loss func (#8868)

* PR #8803: Update embedding init prototype to match mc

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR #8810: Fix import of get_gpt_layer_ammo_spec

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR #8853: Fix memory leak at loss func

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

---------

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* PP support in LoRA merge script (#8934)

* initial commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* enable pp support for merge script and fix output precision

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove incomplete script for next release

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Mingyuanm/sdxl export (#8926)

* Move cached embedding devices and dtype for onnx export consistency

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add old trt export/inference script, currently not working in latest container.

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add NeMo TRT inference pipeline and quantization workflow

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add guards to avoid undefined variables

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add conversion script from hf sdxl to nemo sdxl

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update quantize pipeline to adapt to variable image dimension

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* update sdxl pipeline to be aware of additional emb channels

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add guards for potential local var

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copyright header

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update calib prompt file path

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update file paths

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* minor update

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update default quantization config

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* remove unused imports/vars

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

---------

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (#8866)

* Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed the signature of ZarrPathStore.__delitem__

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

---------

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* update (#8978)

Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* change the condition for get qkv tensor from linear_qkv output (#8965)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Update Latest News (#8837)

* Update Latest News

Adds links to articles on
* NeMo framework on GKE
* Responsible Gen AI using NeMo and Picasso
* NeMo powering Amazon Titan foundation models

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Minor updates to latest news in README

* Remove bullets
* Editing text for clarity

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Format latest news as a dropdown list

* Uses embedded html to format news to dropdown, hiding lengthy details
* Fixes formatting of the title

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add break to improve readability of latest news image

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add LLM and MM section in latest news

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add margin in latest news expandable lists

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Remove styling of expandable list

* Github appears to not render styled elements when
embedded as raw html in rst

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Fold the first news item by default

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

---------

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix incorrect link to latest news in README (#8985)

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* make unit tests work

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* add pytest-mock to unit test reqs

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Enable using hybrid asr models in CTC Segmentation tool (#8828)

* enable using hybrid asr models in ctc segmentation tool

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Add safety checks for 'data' key in MegatronGPTModel cfg (#8991)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* address some comments

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* TDT confidence fix (#8982)

* tdt confidence fix

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Address PR comments

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

---------

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Shashank Verma <shashank3959@gmail.com>
Co-authored-by: Shashank Verma <shashankv@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Move cached embedding devices and dtype for onnx export consistency

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add old trt export/inference script, currently not working in latest container.

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add NeMo TRT inference pipeline and quantization workflow

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add guards to avoid undefined variables

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add conversion script from hf sdxl to nemo sdxl

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update quantize pipeline to adapt to variable image dimension

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* update sdxl pipeline to be aware of additional emb channels

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add guards for potential local var

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copyright header

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update calib prompt file path

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update file paths

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* minor update

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update default quantization config

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* remove unused imports/vars

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

---------

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Adding MegatronParallel

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Minor quantization pipeline updates (NVIDIA#8924)

* Detect 'arcname' prefix in utils when handling .nemo tarball

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Address megatron_amp_O2 = True case in quantization

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Add Megatron-LM to PYTHONPATH correctly in Jenkinsfile

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix converter (NVIDIA#8960)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix memory leak at loss func (NVIDIA#8868)

* PR NVIDIA#8803: Update embedding init prototype to match mc

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR NVIDIA#8810: Fix import of get_gpt_layer_ammo_spec

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

* PR NVIDIA#8853: Fix memory leak at loss func

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>

---------

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* PP support in LoRA merge script (NVIDIA#8934)

* initial commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* enable pp support for merge script and fix output precision

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove incomplete script for next release

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Mingyuanm/sdxl export (NVIDIA#8926)

* Move cached embedding devices and dtype for onnx export consistency

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add old trt export/inference script, currently not working in latest container.

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Add NeMo TRT inference pipeline and quantization workflow

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add guards to avoid undefined variables

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add conversion script from hf sdxl to nemo sdxl

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update quantize pipeline to adapt to variable image dimension

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* update sdxl pipeline to be aware of additional emb channels

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add guards for potential local var

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copyright header

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update calib prompt file path

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* Update file paths

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* minor update

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update default quantization config

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* remove unused imports/vars

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused imports

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>

---------

Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Avoid unpacking NeMo checkpoints before exporting to TRT-LLM (NVIDIA#8866)

* Replaced unpacking of nemo checkpoints on export with a VFS-like TarPath object.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed the signature of ZarrPathStore.__delitem__

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

---------

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* update (NVIDIA#8978)

Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* change the condition for get qkv tensor from linear_qkv output (NVIDIA#8965)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Update Latest News (NVIDIA#8837)

* Update Latest News

Adds links to articles on
* NeMo framework on GKE
* Responsible Gen AI using NeMo and Picasso
* NeMo powering Amazon Titan foundation models

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Minor updates to latest news in README

* Remove bullets
* Editing text for clarity

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Format latest news as a dropdown list

* Uses embedded html to format news to dropdown, hiding lengthy details
* Fixes formatting of the title

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add break to improve readability of latest news image

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add LLM and MM section in latest news

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Add margin in latest news expandable lists

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Remove styling of expandable list

* Github appears to not render styled elements when
embedded as raw html in rst

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

* Fold the first news item by default

Signed-off-by: Shashank Verma <shashankv@nvidia.com>

---------

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Fix incorrect link to latest news in README (NVIDIA#8985)

Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* make unit tests work

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* add pytest-mock to unit test reqs

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Enable using hybrid asr models in CTC Segmentation tool (NVIDIA#8828)

* enable using hybrid asr models in ctc segmentation tool

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Add safety checks for 'data' key in MegatronGPTModel cfg (NVIDIA#8991)

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* address some comments

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* TDT confidence fix (NVIDIA#8982)

* tdt confidence fix

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

* Address PR comments

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>

---------

Signed-off-by: Marc Romeyn <marcromeyn@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Shashank Verma <shashankv@nvidia.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Shriya Palsamudram <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Alexey Panteleev <apanteleev87@gmail.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Shashank Verma <shashank3959@gmail.com>
Co-authored-by: Shashank Verma <shashankv@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
2 participants