Question about the reproduction of XSUM results #20
Comments
Hi, thanks for your question. Did you use Llama-2-7b? The model used in the paper is "huggyllama/llama-7b".
Hi, I used huggyllama/llama-7b, but I encountered the following error when I tried to run scripts/summarization/eval.sh:
When I load other models such as Llama-2-7b, there is no such error.
Hi, could you provide the detailed command and the transformers version you used? I could not reproduce the issue on my side when using huggyllama/llama-7b.
Thanks for your reply. The contents of scripts/summarization/eval.sh are:
As for the transformers version, I tried both 4.33.0 and 4.35.0, and I ran into the same problem.
By the way, the above error also occurs in the middle of evaluation when I use other models (such as Llama-2-7b).
While searching for solutions, I found this issue. Is it possible that this error is related to the beam sampling used in the generation process?
Thanks for your patience, but specifying tokenizer.pad_token_id = tokenizer.eos_token_id still does not solve the problem. Also, I noticed that you set temperature=0.3, top_p=1, do_sample=True in the model.generate() call in h2o_hf/run_summarization.py; is there any particular reason for these parameter settings? Just curious.
Hi, I followed the original HELM setup for these parameters. Generally, a larger temperature brings more diversity and makes the output less deterministic.
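For context, here is a minimal sketch of the sampling setup being discussed, assuming a standard Hugging Face transformers workflow; the exact code in h2o_hf/run_summarization.py may differ, and the prompt and variable names here are only illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).cuda()

# Workaround discussed above: LLaMA has no pad token, so reuse the EOS token.
tokenizer.pad_token_id = tokenizer.eos_token_id

inputs = tokenizer("Summarize the following article: ...", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,    # sampling-based decoding
    temperature=0.3,   # parameters the maintainer says follow the original HELM setup
    top_p=1,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```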
Sorry to bother you again.
Hi everyone, I have another question regarding reproducing the XSUM results. In h2o_hf/scripts/summarization/eval.sh, a fixed HH_SIZE and RECENT_SIZE are set, but the x-axis of Figure 4 represents KV Cache Budget (%), so what is the relationship between size and percentage? The total number of tokens varies with each sample, right?
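My own reading of this, not confirmed by the authors: since HH_SIZE and RECENT_SIZE are fixed token counts, the budget percentage would be those counts divided by each sample's prompt length, so the effective percentage varies per sample. A small sketch of that interpretation:

```python
def kv_cache_budget_pct(hh_size: int, recent_size: int, prompt_len: int) -> float:
    """Share of the KV cache kept for one sample, under the fixed-size reading above."""
    return 100.0 * (hh_size + recent_size) / prompt_len

# e.g. keeping 64 heavy-hitter + 64 recent tokens on a 6768-token prompt
print(kv_cache_budget_pct(64, 64, 6768))  # ~1.9% of the cache
```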
I use Llama-2-7b but I still get this error (with float16). I checked this piece of data: the prompt has 6768 tokens, so I guess the prompt length is too long and that is why the model collapses.
Hi, I have also hit the same bug: the generation process fails with RuntimeError: probability tensor contains either `inf`, `nan` or element < 0. So I tried to test sample 797 by modifying line #117 to requests = requests[795:]. As expected, the bug reproduces on that sample. Apparently the reason the model collapses is that the prompt is too long.
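A hedged sanity check for this explanation (my own sketch, not from the repo): the original LLaMA-7B was trained with a 2048-token context, so a 6768-token 5-shot prompt runs far past the trained positions and the sampling distribution can degenerate into inf/nan. Something like the following should confirm whether the failing sample exceeds the window:

```python
from transformers import AutoConfig, AutoTokenizer

model_name = "huggyllama/llama-7b"
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

max_len = config.max_position_embeddings  # 2048 for the original LLaMA-7B
prompt = "..."  # paste the 5-shot prompt of the failing sample (reported as 6768 tokens above)
n_tokens = len(tokenizer(prompt)["input_ids"])

if n_tokens > max_len:
    print(f"{n_tokens} tokens exceed the {max_len}-token context window; "
          "logits beyond the trained positions can produce inf/nan during sampling.")
```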
Hi, I also used huggyllama/llama-7b to run the XSUM task and reached the same conclusion as yours: rouge-1: 0.267594, rouge-2: 0.098886, rouge-l: 0.222643. Do you have any ideas about this?
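For anyone double-checking these numbers, here is a minimal sketch of how such scores could be computed with the rouge package, assuming predictions and references are plain strings (I don't know whether the repo scores them exactly this way):

```python
from rouge import Rouge

predictions = ["generated summary for sample 0", "..."]  # model outputs
references  = ["gold summary for sample 0", "..."]       # XSUM reference summaries

scores = Rouge().get_scores(predictions, references, avg=True)
print({name: round(vals["f"], 6) for name, vals in scores.items()})
# -> {'rouge-1': ..., 'rouge-2': ..., 'rouge-l': ...}
```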
Hi, I would also like to know the answer to this question. Do you have any ideas?
Hi, thanks for your great work!
I have some questions about reproducing the XSUM results. I tried to run this command in the h2o_hf directory:
I tested on all 1000 samples in xsum_5shot.jsonl using the LLaMA-7B model, but the ROUGE-2 result I got is only about 9%.
According to Figure 4 in the paper, the full baseline for XSUM with LLaMA-7B is 12%.
I can't figure out the reason for this. Could you please give me some advice?
Thanks a lot!