
Even in interactive mode, multiturn conversation is not possible. #67

Open
ehalit opened this issue Jul 17, 2023 · 3 comments
Labels: bug, finished


ehalit commented Jul 17, 2023

Thanks for the wonderful work!

I am running the falcon-7b-instruct model with falcon_main. I generated the model with the conversion script, and from the warning messages I can tell it is in the old format. Anyway, it runs perfectly fine for the given prompt, but I cannot continue the chat after the model generates its output, even in interactive mode. Since GPU offloading adds a significant time overhead every time falcon_main runs, I would like to have multiturn conversations in a single run. Is there a way to achieve that?
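For reference, my invocation looks roughly like this (the model filename and the number of offloaded layers are illustrative placeholders, not my exact values):

./falcon_main -m models/7B/ggml-model-f16.bin -ngl 40 --interactive-first -p "Hello, who are you?"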

cmp-nct (Owner) commented Jul 17, 2023

I'm sorry, there are indeed a couple of bugs in chat mode.
It works most reliably with an OpenAssistant model.

With some other finetunes I noticed a problem with stopwords: most fine-tunes rely on stopwords to stop them from "babbling", and those sometimes cause issues in chat mode.
You can override the stopwords with -S "----".
Maybe give that a try, and also try OpenAssistant; I've had long chats with it already.
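For example (model path and prompt are placeholders):

./falcon_main -m models/7B/ggml-model-f16.bin -p "Your prompt" -S "----"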

Which fine-tune did you use?

I'll try to fix that once and for all as soon as I have the new release ready, but that can take a few days as it's a big change I am sitting on.

If you work with larger prompts, try the prompt cache. It does not save you the loading time, but it lets you store an entire prompt in preprocessed form, which can save a lot of waiting time.
Update: don't use the prompt cache for now. It's broken with the new KV cache and will be fixed with the next PR. To use the cache in the meantime, define FALCON_NO_KV_UPGRADE.
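Roughly like this (cache filename is arbitrary; the flag name follows upstream llama.cpp's --prompt-cache, so check falcon_main --help for the exact spelling):

./falcon_main -m models/7B/ggml-model-f16.bin --prompt-cache prompt.cache -p "Your long prompt"

The first run stores the preprocessed prompt in prompt.cache; later runs with the same flag reuse it instead of re-evaluating the prompt.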

cmp-nct added the bug label on Jul 17, 2023
ehalit (Author) commented Jul 18, 2023

I downloaded the Falcon 7B instruction fine-tuned model from https://huggingface.co/tiiuae/falcon-7b-instruct and saved it under ggllm.cpp/models/falcon7b_instruct with

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)

model.save_pretrained("ggllm.cpp/models/falcon7b_instruct")

I manually copied tokenizer.json into the ggllm.cpp/models/falcon7b_instruct folder, and then converted the model with

python falcon_convert.py models/falcon7b_instruct models/7B

I can use the resulting .bin model with falcon_main as I explained above.
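As a side note, the manual tokenizer copy could probably be avoided by saving the tokenizer the same way (untested sketch):

from transformers import AutoTokenizer

# fetch the matching tokenizer and write tokenizer.json alongside the model weights
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")
tokenizer.save_pretrained("ggllm.cpp/models/falcon7b_instruct")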

ehalit (Author) commented Jul 18, 2023

If I override the stopwords with -S, the application quits after the model generates the stopwords, rather than returning control to the user.

Edit: I think I found the source of the problem. I had only provided the --interactive-first flag, which gives the first turn to me but does not allow a multiturn conversation. Adding -ins enables multiturn conversation. Feel free to close the issue.
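In other words, with a placeholder model path:

./falcon_main -m models/7B/ggml-model-f16.bin --interactive-first   # first turn is mine, but the program exits after one exchange
./falcon_main -m models/7B/ggml-model-f16.bin -ins                  # instruction mode, control returns to me after each reply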
