
Add Doge model #35891

Open · wants to merge 19 commits into main
Conversation

LoserCheems

@LoserCheems LoserCheems commented Jan 25, 2025

What does this PR do?

Fixes #35889
Support the Doge-SLM family of small language models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

to: @ArthurZucker

@Rocketknight1
Member

Hi @LoserCheems, and thanks for the PR! The model looks cool and I like the paper too, but we're trying to add new models using the modular format going forward. You can see a guide here, and an example modular PR here.

If you write modular_doge.py with inheritance then the configuration_ and modeling_ files will be auto-generated. This makes the PR much shorter and easier to review.

@LoserCheems
Author

Thank you @Rocketknight1, I've written modular_doge.py, but I'm sorry, I don't quite understand modular yet and may have made some mistakes...

@Rocketknight1
Member

Hi @LoserCheems yes, don't worry, it's a new feature so everyone is a bit confused about it! 😅

Your modular_doge.py file looks good! The next step is to find code that's copied from other models in transformers, and replace that with inheritance. This will make modular_doge.py much smaller, but the full modeling_doge.py will still be generated without inheritance. You can see some examples in the Qwen2.5 PR:

[screenshot: modular inheritance examples from the Qwen2.5 PR]

Classes like DogeMLP and DogeForSequenceClassification look like they use code from other library classes like Llama, so you could just inherit those instead in the modular file. You can run make fix-copies to regenerate modeling_doge.py and confirm that it still works.
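To illustrate the pattern being suggested, here is a minimal sketch of what the inheritance in a modular file looks like. Note the `Llama*` base classes below are self-contained stand-in stubs, not the real `transformers.models.llama` imports, so the snippet runs on its own; in a real modular_doge.py you would import the actual Llama classes, and the converter expands the inherited bodies into a full, standalone modeling_doge.py with the Doge names substituted.

```python
# Sketch of the modular inheritance pattern (stand-in stubs, not real imports).
# A real modular_doge.py would do:
#   from transformers.models.llama.modeling_llama import LlamaRMSNorm, ...

class LlamaRMSNorm:
    """Stand-in for the real LlamaRMSNorm."""
    def __init__(self, hidden_size, eps=1e-6):
        self.hidden_size = hidden_size
        self.eps = eps

class LlamaForSequenceClassification:
    """Stand-in for the real LlamaForSequenceClassification."""
    def __init__(self, config):
        self.config = config

# In a modular file, a bare subclass is often enough: the code generator
# copies the inherited implementation into modeling_doge.py under the
# Doge name, so the generated file has no cross-model inheritance.
class DogeRMSNorm(LlamaRMSNorm):
    pass

class DogeForSequenceClassification(LlamaForSequenceClassification):
    pass
```

After editing the modular file, running make fix-copies (as mentioned above) regenerates the full modeling file and confirms everything still matches.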

@LoserCheems
Author

Thank you @Rocketknight1. In fact, because the weight names and config names differ, there aren't many classes that can be inherited directly; in total, RMSNorm, RotaryEmbedding, and DogeForSequenceClassification are inherited from Llama.

@Rocketknight1
Member

Hi @LoserCheems, the last code quality error is caused by an unprotected import torch. These need to be guarded by if is_torch_available because some people have JAX-only or TF-only systems, and unguarded imports can make it impossible for them to use the library!
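The guard pattern being described looks roughly like this. This is a sketch: transformers exposes is_torch_available in transformers.utils, but here a stand-in built on importlib.util.find_spec is used so the example is self-contained and runnable without transformers installed.

```python
import importlib.util

def is_torch_available() -> bool:
    """Stand-in for transformers.utils.is_torch_available():
    True only if the torch package can be found on this system."""
    return importlib.util.find_spec("torch") is not None

# Keep torch imports (and any torch-dependent definitions) behind the
# guard, so that JAX-only or TF-only installs can still import the module.
if is_torch_available():
    import torch  # noqa: F401
```

The key point is that the module-level `import torch` never executes on a system without torch, so merely importing the library cannot fail there.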

There are unrelated failing tests under tests_torch - you can ignore them for now, but once you can get code quality green then let me know and I'll review the PR.

@LoserCheems
Author

Sorry @Rocketknight1, I mistakenly imported PretrainedConfig from modeling_utils, which is now fixed.

@LoserCheems
Author

Hmm 🤓, there seems to be something wrong with RotaryEmbedding inherited from Llama.

@LoserCheems
Author

gentle ping @Rocketknight1

Successfully merging this pull request may close these issues.

Request to add Doge