In-framework deployment #9438
Merged: oyilmaz-nvidia merged 47 commits into NVIDIA:main from oyilmaz-nvidia:onur/inframework_deployment on Jun 13, 2024
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…elds from the relevant internal classes instead of hard-coding whenever possible
for more information, see https://pre-commit.ci
… gpus to be controlled via argument
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…7B and Nemotron3-22B
for more information, see https://pre-commit.ci
…ytorch lightning DDP behavior
for more information, see https://pre-commit.ci
…i-prompt logprob calculation
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
meatybobby approved these changes on Jun 12, 2024:
LGTM
galv pushed a commit to galv/NeMo that referenced this pull request on Jun 13, 2024:
* initial MegatronGPTDeployable class
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* delete old comment
* first draft of MegatronGPTDeployable test script
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* small cleanup of test_triton_deployable.py
* move MegatronGPTDeployable into nlp folder since it is language specific
* update test_triton_deployable for new MegatronGPTDeployable location
* renaming NemoQueryLLM classes
* MegatronGPTDeployable should programatically generate input/output fields from the relevant internal classes instead of hard-coding whenever possible
* add NemoTritonQueryLLMPyTorch class and example
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* MegatronGPTModel should always load on creation, also allow number of gpus to be controlled via argument
* got logprobs working, but can only process one prompt at a time
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add nemo deployable to deploy_triton.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* multigpu working, with manual torch.distributed calls
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* rename MegatronGPTDeployable to MegatronLLMDeployable
* MegatronGPTDeployable->MegatronLLMDeployable rename for filenames
* move torch.distributed calls inside MegatronLLMDeployable
* add constructor for existing model class, tested working with Mistral7B and Nemotron3-22B
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* rename test_triton_deployable.py to tests_pytriton_deploy.py
* cleanup, comments, and style guide fixes
* add warning for multigpu cases where users will need to be aware of pytorch lightning DDP behavior
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fixing formatting of logprob outputs
* fix single gpu behavior, and add padding to outputs to allow for multi-prompt logprob calculation
* Apply isort and black reformatting (Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>)
* fixing codeQL issues
* Apply isort and black reformatting (Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>)
* Apply isort and black reformatting (Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>)
* removed min_length definition in previous commit but forgot to remove its use
* update comments and arguments in deploy/nlp/query_llm.py
* Apply isort and black reformatting (Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>)
* delete unused arguments from test_pytriton_deploy.py
* remove some debug prints from megatronllm_deployable
* rename test file due to pytest issue (Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>)

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Co-authored-by: Justin Kim <jukim@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: jukim-nv <jukim-nv@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
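The commit list above names the main server-side pieces: a `MegatronLLMDeployable` that loads a `.nemo` checkpoint in-framework (no TensorRT-LLM export) and wiring in `deploy_triton.py` to host it on Triton via PyTriton. The sketch below is a minimal illustration of how those pieces might fit together; the class names come from the commit list, while the import paths, constructor arguments, and checkpoint path are assumptions rather than the exact merged API.

```python
# Hypothetical sketch of in-framework (PyTorch-level) deployment to Triton.
# MegatronLLMDeployable and DeployPyTriton are named in this PR's commit list;
# the import paths and argument names below are assumptions, not the exact merged API.
from nemo.deploy import DeployPyTriton
from nemo.deploy.nlp import MegatronLLMDeployable

# Load a .nemo checkpoint directly into the framework.
# "num_devices" is an assumed name for the "number of gpus controlled via argument"
# item in the commit list; the checkpoint path is purely illustrative.
model = MegatronLLMDeployable(
    nemo_checkpoint_filepath="/models/nemotron3-22b.nemo",
    num_devices=2,
)

# Wrap the deployable in a PyTriton server and serve it until interrupted.
server = DeployPyTriton(model=model, triton_model_name="megatron_llm", port=8000)
server.deploy()
server.serve()
```

Per the multi-GPU commits, torch.distributed setup is handled inside MegatronLLMDeployable, and users running on multiple GPUs need to account for PyTorch Lightning DDP behavior (each rank runs the script).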
JesusPaz pushed a commit to JesusPaz/NeMo that referenced this pull request on Jun 18, 2024.
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024.
XuesongYang pushed a commit to paarthneekhara/NeMo that referenced this pull request on Jan 18, 2025.
What does this PR do?
PR for in-framework deployment. Took the commits from PR #8958 and made a few changes. A client-side sketch for querying the deployed model is shown below.
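On the client side, the commit list adds a `NemoTritonQueryLLMPyTorch` class and updates `deploy/nlp/query_llm.py`. The sketch below shows how such a client might be used against the server from the earlier example; the import path and query arguments (e.g. `compute_logprob`) are assumptions, not confirmed by this PR, so consult the merged `query_llm.py` for the actual signature.

```python
# Hypothetical client-side sketch using the NemoTritonQueryLLMPyTorch class named
# in the commit list. Import path and argument names are assumptions.
from nemo.deploy.nlp import NemoTritonQueryLLMPyTorch

client = NemoTritonQueryLLMPyTorch(url="localhost:8000", model_name="megatron_llm")

# Multi-prompt request; per the commit list, outputs are padded so that log-prob
# calculation works across prompts of different lengths.
result = client.query_llm(
    prompts=["Deep learning is", "NVIDIA NeMo is"],
    max_length=64,
    compute_logprob=True,  # assumed flag name for requesting log-probabilities
)
print(result)
```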