scripts to convert HF lora to nemo #9102

Merged on May 8, 2024 (9 commits)

@arendu arendu commented May 3, 2024

What does this PR do?

Adds scripts to convert a Hugging Face LoRA model to NeMo format, and a NeMo LoRA model to Hugging Face format.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
#  convert NeMo to HF:
# the script expects an adapter_config.json file, which is standard in HF:
python scripts/checkpoint_converters/lora_converters/convert_nemo_to_canonical.py \
    --nemo_lora_path nemo_style_lora_model.nemo \
    --output_path ./canonical_style_lora_model.nemo \
    --hf_format --hf_config checkpoints/bin/adapter_config.json


#  convert HF to NeMo:
# /checkpoints/bin/ is a folder containing the HF lora checkpoint (usually named adapter_model.bin)
# and a HF lora config file (usually named adapter_config.json)
python scripts/checkpoint_converters/lora_converters/convert_hf_to_canonical.py \
    --hf_lora_path /checkpoints/bin/ \
    --output_path output_dir/converted_lora.nemo \
    --nemo_config model_config.yaml
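
Before running the HF-to-NeMo conversion, it can help to confirm that the input folder has the layout described in the comments above. The following is a minimal sketch, assuming the default file names adapter_model.bin and adapter_config.json (the comments only say "usually named"). The helper name check_hf_lora_dir is hypothetical and is not part of this PR.

```python
from pathlib import Path

# Assumed default file names, taken from the usage comments;
# a real checkpoint folder may use different names.
EXPECTED_FILES = ("adapter_model.bin", "adapter_config.json")


def check_hf_lora_dir(hf_lora_path):
    """Return the expected file names that are missing from hf_lora_path."""
    folder = Path(hf_lora_path)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

An empty list means both files were found, so the folder could be passed as --hf_lora_path.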

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

arendu and others added 4 commits May 3, 2024 07:25
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: arendu <adithya.r@gmail.com>
def convert_lora(lora_hf_path, save_path, lora_yaml):
    config_file = f"{lora_hf_path}/adapter_config.json"
    model_file = f"{lora_hf_path}/adapter_model.bin"
    hf_lora_config = json.loads(open(config_file).read())

Check warning

Code scanning / CodeQL

File is not always closed Warning

File is opened but is not closed.
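
A common way to address this kind of warning is to open the file in a `with` block, so the handle is closed even if JSON parsing raises. This is a minimal sketch and not necessarily the change that was merged; load_adapter_config is a hypothetical helper name.

```python
import json


def load_adapter_config(config_file):
    """Read a JSON config file and close the handle when the block exits."""
    with open(config_file, encoding="utf-8") as f:
        return json.load(f)
```

Inside convert_lora this would replace the `json.loads(open(config_file).read())` call with `load_adapter_config(config_file)`.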
@arendu arendu marked this pull request as ready for review May 3, 2024 08:08
@arendu arendu added the Run CICD label May 3, 2024
@arendu arendu requested review from ertkonuk and cuichenx May 3, 2024 16:29
Signed-off-by: arendu <adithya.r@gmail.com>
@github-actions github-actions bot added the NLP label May 6, 2024
@arendu arendu added Run CICD and removed Run CICD labels May 6, 2024
@arendu arendu requested a review from aklife97 May 6, 2024 23:49
torch.save(lora_state_dict, f"{save_path}/model_weights_hf_formatted.pt")
Path(save_path).mkdir(parents=True, exist_ok=True)
torch.save(lora_state_dict, f"{save_path}/adapter_model.bin")
adapter_config = json.load(open(args.hf_config))

Check warning

Code scanning / CodeQL

File is not always closed Warning

File is opened but is not closed.
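
The output side can follow the same pattern: create the target directory first, then read and write files inside `with` blocks. The sketch below assumes the loaded config is written next to adapter_model.bin as adapter_config.json; the snippet above only shows the load, so that destination and the helper name write_adapter_config are assumptions.

```python
import json
from pathlib import Path


def write_adapter_config(hf_config_path, save_path):
    """Copy an adapter config into save_path, creating the directory if needed."""
    out_dir = Path(save_path)
    out_dir.mkdir(parents=True, exist_ok=True)  # must exist before any file is saved

    with open(hf_config_path, encoding="utf-8") as f:
        adapter_config = json.load(f)

    out_file = out_dir / "adapter_config.json"  # assumed destination name
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(adapter_config, f, indent=2)
    return out_file
```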
@arendu arendu added Run CICD and removed Run CICD labels May 7, 2024
@pablo-garay

Tests passed: https://github.com/NVIDIA/NeMo/actions/runs/8977317099
Let me know if you want to merge

@cuichenx cuichenx left a comment

discussed offline, LGTM, thanks

@@ -145,7 +145,7 @@ inference:
   top_p: 0.9 # If set to float < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
   temperature: 1.0 # sampling temperature
   all_probs: False # whether return the log prob for all the tokens in vocab
-  repetition_penalty: 1.2 # The parameter for repetition penalty. 1.0 means no penalty.
+  repetition_penalty: 1.0 # The parameter for repetition penalty. 1.0 means no penalty.
this will change existing inference results, are we sure we should do this?
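
To illustrate why the default matters, here is a minimal sketch of one common repetition-penalty rule (the formulation used by the `RepetitionPenaltyLogitsProcessor` in Hugging Face transformers). Whether NeMo's sampler applies exactly this rule is an assumption; the function name and the toy logits are illustrative only.

```python
def apply_repetition_penalty(logits, generated_token_ids, penalty):
    """Lower the score of tokens that were already generated.

    Positive logits are divided by `penalty` and non-positive logits are
    multiplied by it, so any penalty > 1.0 reduces the score of a repeated
    token and penalty == 1.0 leaves every logit unchanged.
    """
    adjusted = list(logits)
    for token_id in set(generated_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.4, -1.2, 0.6]   # toy scores for a 3-token vocabulary
already_generated = [0, 1]  # token ids produced so far

print(apply_repetition_penalty(logits, already_generated, 1.0))  # no penalty
print(apply_repetition_penalty(logits, already_generated, 1.2))  # previous default
```

Under this rule, moving the default from 1.2 to 1.0 stops down-weighting repeated tokens, so configs that relied on the old default can produce different generations.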

@aklife97 aklife97 left a comment

LGTM, thank you!
just a single comment above

@arendu arendu merged commit 6442bb6 into main May 8, 2024
133 checks passed
@arendu arendu deleted the adithyare/HF_nemo_compatible_lora branch May 8, 2024 16:56
BoxiangW pushed a commit to BoxiangW/NeMo that referenced this pull request Jun 5, 2024
* convert nemo to hf and hf to nemo

Signed-off-by: arendu <adithya.r@gmail.com>

* example usage

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: arendu <adithya.r@gmail.com>

* canonical lora in nemo updates

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
4 participants