
Even in interactive mode, multiturn conversation is not possible. #67

Open
ehalit opened this issue Jul 17, 2023 · 3 comments
Labels: bug, finished


ehalit commented Jul 17, 2023

Thanks for the wonderful work!

I am running the falcon-7b-instruct model with falcon_main. I generated the model with the conversion script, and from the warning messages I can tell it is in the old format. Anyway, it runs perfectly fine for the given prompt, but I cannot continue the chat after the model generates its output, even in interactive mode. Since GPU offloading adds a significant time overhead every time falcon_main runs, I would like to have multiturn conversations in a single run. Is there a way to achieve that?
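For reference, my invocation looks roughly like this (the model filename and the number of offloaded layers are illustrative placeholders, not my exact values):

./falcon_main -m models/7B/ggml-model-f16.bin -ngl 40 --interactive-first -p "Hello, who are you?"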

cmp-nct (Owner) commented Jul 17, 2023

I'm sorry, there are indeed a couple of bugs in chat mode.
It works most reliably with an OpenAssistant model.

With some other finetunes I noticed a problem with stopwords: most fine-tunes rely on stopwords to stop them from "babbling", and those sometimes cause issues in chat mode.
You can override the stopwords with -S "----".
Maybe give that a try, and also try OpenAssistant; I've had long chats with it already.
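For example (model path and prompt are placeholders):

./falcon_main -m models/7B/ggml-model-f16.bin -p "Your prompt" -S "----"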

Which fine-tune did you use?

I'll try to fix that once and for all as soon as I have the new release ready, but that can take a few days as it's a big change I am sitting on.

If you work with larger prompts, try the prompt cache. It does not save you the loading time, but it lets you store an entire prompt in preprocessed form, which can save a lot of waiting time.
Update: don't use the prompt cache for now. It's broken with the new KV cache and will be fixed with the next PR. To use the cache in the meantime, define FALCON_NO_KV_UPGRADE.
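Roughly like this (cache filename is arbitrary; the flag name follows upstream llama.cpp's --prompt-cache, so check falcon_main --help for the exact spelling):

./falcon_main -m models/7B/ggml-model-f16.bin --prompt-cache prompt.cache -p "Your long prompt"

The first run stores the preprocessed prompt in prompt.cache; later runs with the same flag reuse it instead of re-evaluating the prompt.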

cmp-nct added the bug label on Jul 17, 2023
ehalit (Author) commented Jul 18, 2023

I downloaded the Falcon 7B instruction fine-tuned model from https://huggingface.co/tiiuae/falcon-7b-instruct and saved it under ggllm.cpp/models/falcon7b_instruct with

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)

model.save_pretrained("ggllm.cpp/models/falcon7b_instruct")

I manually copied tokenizer.json into the ggllm.cpp/models/falcon7b_instruct folder, and then converted the model with

python falcon_convert.py models/falcon7b_instruct models/7B

I can use the resulting .bin model with falcon_main as I explained above.
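As a side note, the manual tokenizer copy could probably be avoided by saving the tokenizer the same way (untested sketch):

from transformers import AutoTokenizer

# fetch the matching tokenizer and write tokenizer.json alongside the model weights
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")
tokenizer.save_pretrained("ggllm.cpp/models/falcon7b_instruct")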

ehalit (Author) commented Jul 18, 2023

If I override the stopwords with -S, the application quits after the model generates the stopwords, rather than returning control to the user.

Edit: I think I found the source of the problem. I had only provided the --interactive-first flag, which gives the first turn to me but does not allow a multiturn conversation. Adding -ins enables multiturn conversation. Feel free to close the issue.
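In other words, with a placeholder model path:

./falcon_main -m models/7B/ggml-model-f16.bin --interactive-first   # first turn is mine, but the program exits after one exchange
./falcon_main -m models/7B/ggml-model-f16.bin -ins                  # instruction mode, control returns to me after each reply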
