In-framework deployment #9438
Merged: oyilmaz-nvidia merged 47 commits into NVIDIA:main from oyilmaz-nvidia:onur/inframework_deployment on Jun 13, 2024
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…elds from the relevant internal classes instead of hard-coding whenever possible
for more information, see https://pre-commit.ci
… gpus to be controlled via argument
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…7B and Nemotron3-22B
for more information, see https://pre-commit.ci
…ytorch lightning DDP behavior
for more information, see https://pre-commit.ci
…i-prompt logprob calculation
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
meatybobby approved these changes on Jun 12, 2024:
LGTM
galv pushed a commit to galv/NeMo that referenced this pull request on Jun 13, 2024:
* initial MegatronGPTDeployable class
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* delete old comment
* first draft of MegatronGPTDeployable test script
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* small cleanup of test_triton_deployable.py
* move MegatronGPTDeployable into nlp folder since it is language specific
* update test_triton_deployable for new MegatronGPTDeployable location
* renaming NemoQueryLLM classes
* MegatronGPTDeployable should programatically generate input/output fields from the relevant internal classes instead of hard-coding whenever possible
* add NemoTritonQueryLLMPyTorch class and example
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* MegatronGPTModel should always load on creation, also allow number of gpus to be controlled via argument
* got logprobs working, but can only process one prompt at a time
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add nemo deployable to deploy_triton.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* multigpu working, with manual torch.distributed calls
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* rename MegatronGPTDeployable to MegatronLLMDeployable
* MegatronGPTDeployable->MegatronLLMDeployable rename for filenames
* move torch.distributed calls inside MegatronLLMDeployable
* add constructor for existing model class, tested working with Mistral7B and Nemotron3-22B
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* rename test_triton_deployable.py to tests_pytriton_deploy.py
* cleanup, comments, and style guide fixes
* add warning for multigpu cases where users will need to be aware of pytorch lightning DDP behavior
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fixing formatting of logprob outputs
* fix single gpu behavior, and add padding to outputs to allow for multi-prompt logprob calculation
* Apply isort and black reformatting (Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>)
* fixing codeQL issues
* Apply isort and black reformatting (Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>)
* Apply isort and black reformatting (Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>)
* removed min_length definition in previous commit but forgot to remove its use
* update comments and arguments in deploy/nlp/query_llm.py
* Apply isort and black reformatting (Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>)
* delete unused arguments from test_pytriton_deploy.py
* remove some debug prints from megatronllm_deployable
* rename test file due to pytest issue (Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>)

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Co-authored-by: Justin Kim <jukim@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: jukim-nv <jukim-nv@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
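The commit list above names the main server-side pieces: a `MegatronLLMDeployable` that loads a `.nemo` checkpoint in-framework (no TensorRT-LLM export) and wiring in `deploy_triton.py` to host it on Triton via PyTriton. The sketch below is a minimal illustration of how those pieces might fit together; the class names come from the commit list, while the import paths, constructor arguments, and checkpoint path are assumptions rather than the exact merged API.

```python
# Hypothetical sketch of in-framework (PyTorch-level) deployment to Triton.
# MegatronLLMDeployable and DeployPyTriton are named in this PR's commit list;
# the import paths and argument names below are assumptions, not the exact merged API.
from nemo.deploy import DeployPyTriton
from nemo.deploy.nlp import MegatronLLMDeployable

# Load a .nemo checkpoint directly into the framework.
# "num_devices" is an assumed name for the "number of gpus controlled via argument"
# item in the commit list; the checkpoint path is purely illustrative.
model = MegatronLLMDeployable(
    nemo_checkpoint_filepath="/models/nemotron3-22b.nemo",
    num_devices=2,
)

# Wrap the deployable in a PyTriton server and serve it until interrupted.
server = DeployPyTriton(model=model, triton_model_name="megatron_llm", port=8000)
server.deploy()
server.serve()
```

Per the multi-GPU commits, torch.distributed setup is handled inside MegatronLLMDeployable, and users running on multiple GPUs need to account for PyTorch Lightning DDP behavior (each rank runs the script).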
JesusPaz pushed a commit to JesusPaz/NeMo that referenced this pull request on Jun 18, 2024.
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024.
XuesongYang pushed a commit to paarthneekhara/NeMo that referenced this pull request on Jan 18, 2025.
What does this PR do?
PR for in-framework deployment. Took the commits from PR #8958 and made a few changes. A client-side sketch for querying the deployed model is shown below.
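On the client side, the commit list adds a `NemoTritonQueryLLMPyTorch` class and updates `deploy/nlp/query_llm.py`. The sketch below shows how such a client might be used against the server from the earlier example; the import path and query arguments (e.g. `compute_logprob`) are assumptions, not confirmed by this PR, so consult the merged `query_llm.py` for the actual signature.

```python
# Hypothetical client-side sketch using the NemoTritonQueryLLMPyTorch class named
# in the commit list. Import path and argument names are assumptions.
from nemo.deploy.nlp import NemoTritonQueryLLMPyTorch

client = NemoTritonQueryLLMPyTorch(url="localhost:8000", model_name="megatron_llm")

# Multi-prompt request; per the commit list, outputs are padded so that log-prob
# calculation works across prompts of different lengths.
result = client.query_llm(
    prompts=["Deep learning is", "NVIDIA NeMo is"],
    max_length=64,
    compute_logprob=True,  # assumed flag name for requesting log-probabilities
)
print(result)
```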