Add DBRX Model #29921

abhi-mosaic · 2024-03-27T21:54:43Z

What does this PR do?

Add support for DbrxConfig, DbrxModel, and DbrxForCausalLM.

https://huggingface.co/databricks/dbrx-base
https://huggingface.co/databricks/dbrx-instruct

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

giyaseddin · 2024-03-27T23:31:03Z

Duplicate work here too #29910

ArthurZucker

Great work already 🔥
Would recommend you use transformers-cli add-new-model-like with the mixtral template, because that will fill in all the necessary configuration_auto entries etc. for you.
Also make sure to run make fixup for code formatting, and make fix-copies to make sure code marked "Copied from" is overwritten with its source.

src/transformers/models/dbrx/modeling_dbrx.py

tests/models/dbrx/test_modeling_dbrx.py

ArthurZucker · 2024-03-28T01:32:11Z

tests/models/dbrx/test_modeling_dbrx.py

+
+
+@require_torch
+class DbrxModelIntegrationTest(unittest.TestCase):


We need generation tests with a tiny model from both libs to compare the logits! Otherwise it's too big for our CI.

Out of curiosity, what is the biggest size model your CI can handle?

I was looking at the tests for falcon and it looks like they test on a 7B and 40B "tiny" model.

We can handle 7B in float16 so ~16GB. Tiny models are way smaller.

Hi @ArthurZucker, what's the TODO here? Do we still need to build a small checkpoint or is it ok to leave this test as is with the @slow decorator?
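The "7B in float16 so ~16GB" figure above can be checked with a little arithmetic. The sketch below counts weight memory only (2 bytes per float16 parameter). The function name is hypothetical, and the 132B figure is DBRX's published total parameter count, which is not stated in this thread. Weights alone for 7B come to about 14 GB; reading the remaining headroom in "~16GB" as runtime overhead is an assumption.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# float16 stores each parameter in 2 bytes.
ci_limit = weight_memory_gb(7e9)     # the 7B case mentioned in the thread
dbrx_full = weight_memory_gb(132e9)  # assumption: DBRX's published total size

print(f"7B model in float16:   {ci_limit:.0f} GB of weights")
print(f"132B model in float16: {dbrx_full:.0f} GB of weights")
```

Under these assumptions the full checkpoint is far beyond what the CI can load, which is why the integration test needs a tiny checkpoint.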

src/transformers/models/dbrx/configuration_dbrx.py

src/transformers/models/dbrx/modeling_dbrx.py

src/transformers/__init__.py

Rocketknight1 · 2024-03-28T15:33:04Z

Hey all! I'll be handling reviews for this model from here - it looks like you have enough to work with for now, but if you get stuck at any point, or you'd like help, please let me know! We're quite excited to get DBRX ported, so don't be shy about reaching out. You can ping me here, or I'm also Matt in the collaboration channel on Slack.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Rocketknight1

Hey - DbrxBlock shouldn't be imported, because we generally don't make those internal layers available in the top-level namespace! This is also why the doctest checker is complaining - those errors should clear up if we remove these imports.
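A minimal sketch of the rule described above: public model classes are exported, internal layers such as DbrxBlock are not. The export table and the helper are hypothetical illustrations and are not the library's actual import machinery.

```python
# Hypothetical export table; the module and class names mirror this PR,
# but the dict itself is only an illustration.
EXPORTS = {
    "configuration_dbrx": ["DbrxConfig"],
    "modeling_dbrx": ["DbrxForCausalLM", "DbrxModel"],
}

# Internal building blocks that should stay out of the top-level namespace.
INTERNAL_LAYERS = {"DbrxBlock"}


def leaked_internal_names(exports: dict, internal: set) -> list:
    """Return internal names that appear in the export table, sorted."""
    exported = {name for names in exports.values() for name in names}
    return sorted(exported & internal)


# The clean table exposes no internal layers.
print(leaked_internal_names(EXPORTS, INTERNAL_LAYERS))

# Adding DbrxBlock to the exports is flagged by the check.
bad = {**EXPORTS, "modeling_dbrx": EXPORTS["modeling_dbrx"] + ["DbrxBlock"]}
print(leaked_internal_names(bad, INTERNAL_LAYERS))
```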

src/transformers/__init__.py

src/transformers/models/dbrx/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

amyeroberts

Beautiful 🤩

amyeroberts · 2024-04-18T12:20:45Z

The failing tests are quite weird and possibly connection errors. I'm rerunning on CI which should hopefully resolve 🤞

eitanturok · 2024-04-18T12:24:18Z

Addressed your comments! Let's do the final push and get this in!

amyeroberts · 2024-04-18T12:31:43Z

README.md

@@ -341,6 +341,8 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
 1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
 1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
 1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec:  A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
+


quality checks are currently failing on the models_in_readme check. I suspect the extra newline might be to blame here

Suggested change

Yep, that was the error. Pushing now.
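The stray newline discussed above is easy to miss in a diff. The sketch below shows one way a blank line between two entries of the model list could be detected. It is not the repository's models_in_readme check; the function name and the sample entries are hypothetical.

```python
def stray_blank_lines(lines: list[str]) -> list[int]:
    """Return 1-based numbers of blank lines sitting between two "1. " list items."""
    hits = []
    for i in range(1, len(lines) - 1):
        is_blank = lines[i].strip() == ""
        prev_is_item = lines[i - 1].lstrip().startswith("1. ")
        next_is_item = lines[i + 1].lstrip().startswith("1. ")
        if is_blank and prev_is_item and next_is_item:
            hits.append(i + 1)
    return hits


# Shortened, hypothetical README entries with a blank line splitting the list.
readme = [
    "1. **[CvT]** (from Microsoft)",
    "1. **[Data2Vec]** (from Facebook)",
    "",
    "1. **[DBRX]** (from Databricks)",
]
print(stray_blank_lines(readme))
```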

amyeroberts

Only thing left is resolving the conflicts with main for docs/source/en/perf_infer_gpu_one.md and docs/source/en/tasks/language_modeling.md.

As the tests are looking good -- quality checks now pass -- I'd recommend resolving the conflicts before the CI finishes. The test suite is unfortunately pretty slow and we can be pretty confident the tests should all pass, as they were two commits ago and the changes have been minimal.

eitanturok · 2024-04-18T12:56:40Z

Only thing left is resolving the conflicts with main for docs/source/en/perf_infer_gpu_one.md and docs/source/en/tasks/language_modeling.md.

As the tests are looking good -- quality checks now pass -- I'd recommend resolving the conflicts before the CI finishes. The test suite is unfortunately pretty slow and we can be pretty confident the tests should all pass, as they were two commits ago and the changes have been minimal.

Yes, I was just hopping on the train. I'm taking care of this now.

eitanturok · 2024-04-18T12:59:42Z

Just merged the conflicts. Now we just let the tests run and hope for the best!

LysandreJik · 2024-04-18T13:18:50Z

Awesome! Thanks all, merging!

eitanturok · 2024-04-18T14:55:48Z

As a follow up, when can we expect to see DBRX updated in the transformers documentation?

Rocketknight1 · 2024-04-18T15:11:38Z

@eitanturok the documentation is automatically generated from the docstrings / .md files in this PR, so you should see it once the release goes out!

eitanturok · 2024-04-18T15:22:34Z

Understood. And the release goes out later today, correct?

Rocketknight1 · 2024-04-18T15:29:47Z

Yes! The release has just gone out here. The frontend docs will be updated as soon as the doc builder finishes, which will happen later today. In the meantime, you can preview the docs on the main doc branch here.

* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"suli"` the default `ffn_act_fn`
  Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrXAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add useage example
* put on one line
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more useage examples

---------

Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

abhi-databricks added 3 commits March 27, 2024 21:48

wip

7042915

fix __init__.py

c7dda8c

add docs

18495d0

ArthurZucker reviewed Mar 28, 2024

ArthurZucker added the New model label Mar 28, 2024

NielsRogge mentioned this pull request Mar 28, 2024

Support DBRX Model #29911

Open

eitanturok mentioned this pull request Mar 28, 2024

dbrx #29910

Closed

5 tasks

eitanturok reviewed Mar 28, 2024

src/transformers/__init__.py (outdated, resolved)

eitanturok reviewed Mar 28, 2024

src/transformers/__init__.py (outdated, resolved)

eitanturok reviewed Mar 28, 2024

src/transformers/__init__.py (outdated, resolved)

Blaizzy mentioned this pull request Mar 28, 2024

DBRX ml-explore/mlx-examples#628

Merged

abhi-mosaic and others added 5 commits March 28, 2024 11:17

Apply suggestions from code review

292836b

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

address comments 1

a27c69a

work on make fixup

5417623

pass configs down

46b45c1

add sdpa attention

76c2e9c

Rocketknight1 reviewed Mar 29, 2024

src/transformers/__init__.py (outdated, resolved)

src/transformers/__init__.py (outdated, resolved)

src/transformers/models/dbrx/__init__.py (outdated, resolved)

src/transformers/models/dbrx/__init__.py (outdated, resolved)

Eitan Turok added 11 commits March 29, 2024 14:02

remove DbrxBlock

4e74661

add to configuration_auto

120df40

docstring now passes formatting test

56d841e

fix style

450ae2d

update READMEs

cec7356

add dbrx to modeling_auto

b5d4a6e

make fix-copies generated this

3d9fd16

add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP

2bff6b9

config docstring passes formatting test

ea940a6

rename moe_loss_weight to router_aux_loss_coef

990f196

add to flash-attn documentation

4a6f47a

eitanturok and others added 9 commits April 18, 2024 07:26

fix ffn_act_fn

cad0b9d

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

change "dbrx" to "DBRX" for display purposes.

49bcacc

fix __init__.py?

9e26850

fix __init__.py

d714986

fix README

cac26a1

return the aux_loss

fe12d2a

remove extra spaces

58c8342

fix configuration_auto.py

d04c870

fix format in tokenization_auto

22804bf

amyeroberts approved these changes Apr 18, 2024

amyeroberts reviewed Apr 18, 2024

Eitan Turok added 2 commits April 18, 2024 12:36

remove new line

95b327f

add more useage examples

c6cbbda

amyeroberts approved these changes Apr 18, 2024

Merge branch 'main' into dbrx

8ee48c9

LysandreJik merged commit 005b957 into huggingface:main Apr 18, 2024
22 checks passed

Qubitium mentioned this pull request Apr 19, 2024

[WIP] [WORKING] dbrx (mod) support AutoGPTQ/AutoGPTQ#625

Closed

abhi-mosaic commented Mar 27, 2024

giyaseddin commented Mar 27, 2024

ArthurZucker left a comment

ArthurZucker Mar 28, 2024

eitanturok Mar 29, 2024

ArthurZucker Apr 3, 2024

abhi-mosaic Apr 10, 2024

Rocketknight1 commented Mar 28, 2024

Rocketknight1 left a comment

amyeroberts left a comment

amyeroberts commented Apr 18, 2024

eitanturok commented Apr 18, 2024

amyeroberts Apr 18, 2024

eitanturok Apr 18, 2024

amyeroberts left a comment

eitanturok commented Apr 18, 2024

eitanturok commented Apr 18, 2024

LysandreJik commented Apr 18, 2024

eitanturok commented Apr 18, 2024

Rocketknight1 commented Apr 18, 2024 (edited)

eitanturok commented Apr 18, 2024

Rocketknight1 commented Apr 18, 2024


