Difficulties to reproduce BART results on CNN/DM by fine-tuning bart-large #5654
Description
Help
I'm trying to fine-tune BART on CNN/DM myself (so, starting from the `facebook/bart-large` checkpoint). However, I can't reproduce the results so far... The BART authors report an R1 score of 44.16 in their paper, but my best checkpoint so far only reaches 42.53.
It's not an issue with the eval script, as I can reproduce the authors' results from the `facebook/bart-large-cnn` checkpoint: I get a score of 44.09 with it.
I tried several sets of hyper-parameters: the ones provided in the examples folder, and also the ones used in the fairseq repo. It doesn't change anything... (see the sketch below for the fairseq-style setup I tried).
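Concretely, the fairseq recipe maps to roughly this in `transformers` (a minimal sketch with data loading elided; the hyper-parameter values are the ones in fairseq's CNN/DM fine-tuning README, and the scheduler/loss below are my own mapping of its polynomial decay and label smoothing, not an official script):

```python
import torch
from torch.optim import AdamW
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    get_polynomial_decay_schedule_with_warmup,
)

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
model.train()

TOTAL_UPDATES = 20000   # fairseq: TOTAL_NUM_UPDATES
WARMUP_UPDATES = 500    # fairseq: WARMUP_UPDATES

optimizer = AdamW(model.parameters(), lr=3e-5, betas=(0.9, 0.999),
                  eps=1e-8, weight_decay=0.01)
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_UPDATES,
    num_training_steps=TOTAL_UPDATES)

# fairseq uses label smoothing 0.1; labels here are padded with pad_token_id.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1,
                                      ignore_index=tokenizer.pad_token_id)

# One illustrative update on a dummy batch (real runs use ~2048 tokens per
# GPU per step with gradient accumulation 4, per the fairseq README).
batch = tokenizer(["a dummy article"], text_target=["a dummy summary"],
                  return_tensors="pt", padding=True)
logits = model(input_ids=batch["input_ids"],
               attention_mask=batch["attention_mask"],
               labels=batch["labels"]).logits
loss = criterion(logits.view(-1, logits.size(-1)), batch["labels"].view(-1))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)  # fairseq clip-norm
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```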
I'm a bit at a loss on how to reproduce these fine-tuning scores...
Has anyone managed to fine-tune BART successfully using the `transformers` repo? If so, could you share your parameters?
Any help would be greatly appreciated!