
Add Doge model #35891

Open · wants to merge 19 commits into main
Conversation

LoserCheems

@LoserCheems LoserCheems commented Jan 25, 2025

What does this PR do?

Fixes #35889
Support the Doge-SLM family of small language models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

to: @ArthurZucker

@Rocketknight1
Member

Hi @LoserCheems, and thanks for the PR! The model looks cool and I like the paper too, but we're trying to add new models using the modular format going forward. You can see a guide here, and an example modular PR here.

If you write modular_doge.py with inheritance then the configuration_ and modeling_ files will be auto-generated. This makes the PR much shorter and easier to review.

@LoserCheems
Author

Thank you @Rocketknight1, I've written modular_doge.py, but I'm sorry, I don't quite understand modular yet and may have made some mistakes...

@Rocketknight1
Member

Hi @LoserCheems yes, don't worry, it's a new feature so everyone is a bit confused about it! 😅

Your modular_doge.py file looks good! The next step is to find code that's copied from other models in transformers, and replace that with inheritance. This will make modular_doge.py much smaller, but the full modeling_doge.py will still be generated without inheritance. You can see some examples in the Qwen2.5 PR:

[screenshot: modular inheritance examples from the Qwen2.5 PR]

Classes like DogeMLP and DogeForSequenceClassification look like they use code from other library classes like Llama, so you could just inherit those instead in the modular file. You can run make fix-copies to regenerate modeling_doge.py and confirm that it still works.
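To illustrate the pattern being suggested, here is a minimal sketch of what the inheritance in a modular file looks like. Note the `Llama*` base classes below are self-contained stand-in stubs, not the real `transformers.models.llama` imports, so the snippet runs on its own; in a real modular_doge.py you would import the actual Llama classes, and the converter expands the inherited bodies into a full, standalone modeling_doge.py with the Doge names substituted.

```python
# Sketch of the modular inheritance pattern (stand-in stubs, not real imports).
# A real modular_doge.py would do:
#   from transformers.models.llama.modeling_llama import LlamaRMSNorm, ...

class LlamaRMSNorm:
    """Stand-in for the real LlamaRMSNorm."""
    def __init__(self, hidden_size, eps=1e-6):
        self.hidden_size = hidden_size
        self.eps = eps

class LlamaForSequenceClassification:
    """Stand-in for the real LlamaForSequenceClassification."""
    def __init__(self, config):
        self.config = config

# In a modular file, a bare subclass is often enough: the code generator
# copies the inherited implementation into modeling_doge.py under the
# Doge name, so the generated file has no cross-model inheritance.
class DogeRMSNorm(LlamaRMSNorm):
    pass

class DogeForSequenceClassification(LlamaForSequenceClassification):
    pass
```

After editing the modular file, running make fix-copies (as mentioned above) regenerates the full modeling file and confirms everything still matches.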

@LoserCheems
Author

Thank you @Rocketknight1. In fact, because the weight names and config names differ, there aren't many classes that can be inherited directly; in total, RMSNorm, RotaryEmbedding, and DogeForSequenceClassification are inherited from Llama.

@Rocketknight1
Member

Hi @LoserCheems, the last code quality error is caused by an unprotected import torch. These need to be guarded by if is_torch_available because some people have JAX-only or TF-only systems, and unguarded imports can make it impossible for them to use the library!
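The guard pattern being described looks roughly like this. This is a sketch: transformers exposes is_torch_available in transformers.utils, but here a stand-in built on importlib.util.find_spec is used so the example is self-contained and runnable without transformers installed.

```python
import importlib.util

def is_torch_available() -> bool:
    """Stand-in for transformers.utils.is_torch_available():
    True only if the torch package can be found on this system."""
    return importlib.util.find_spec("torch") is not None

# Keep torch imports (and any torch-dependent definitions) behind the
# guard, so that JAX-only or TF-only installs can still import the module.
if is_torch_available():
    import torch  # noqa: F401
```

The key point is that the module-level `import torch` never executes on a system without torch, so merely importing the library cannot fail there.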

There are unrelated failing tests under tests_torch - you can ignore them for now, but once you can get code quality green then let me know and I'll review the PR.

@LoserCheems
Author

Sorry @Rocketknight1, I mistakenly imported PretrainedConfig from modeling_utils, which is now fixed.

@LoserCheems
Author

Hmm 🤓, there seems to be something wrong with RotaryEmbedding inherited from Llama.

@LoserCheems
Author

gentle ping @Rocketknight1

Successfully merging this pull request may close these issues.

Request to add Doge