llama.cpp chat example implementation #15
Conversation
Awesome! I have a question about the implementation: does your version of main still maintain context between individual messages? I ask because I ran into issues while testing. One of my big concerns today was that I couldn't figure out why it wasn't maintaining context, and I definitely don't want to resend the full prompt and message history each time. Other than that, I need to check in with llama.cpp to see where they are on state saving. I know lots of people are interested. It would definitely be a cool ability!
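(For reference, one way to keep context without resending the whole history: track how many tokens the model has already evaluated and only feed it the new suffix each turn. The sketch below is a minimal illustration, not this PR's actual code; eval_tokens is a hypothetical stand-in for the real low-level eval call.)

from typing import List

def eval_tokens(tokens: List[int], n_past: int) -> None:
    # Placeholder: a real implementation would call into llama.cpp here,
    # passing n_past so the model reuses its existing KV state.
    pass

class ChatContext:
    def __init__(self) -> None:
        self.tokens: List[int] = []  # full conversation so far
        self.n_past = 0              # tokens the model has already seen

    def send(self, new_tokens: List[int]) -> None:
        self.tokens.extend(new_tokens)
        pending = self.tokens[self.n_past:]       # only the unseen suffix
        eval_tokens(pending, n_past=self.n_past)  # hypothetical eval call
        self.n_past = len(self.tokens)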
I do think it would be a good idea to implement instruction mode before people get the example. I think it's one of the more heavily used flags. Thanks :)
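(For context, instruction mode in llama.cpp's main wraps each user message in an Alpaca-style prefix and suffix before evaluating it. A minimal sketch of that wrapping; the prefix/suffix strings are assumed from upstream main at the time:)

# Sketch of what -ins/--instruct does around each user message.
INSTRUCT_PREFIX = "\n\n### Instruction:\n\n"  # assumed from upstream main
INSTRUCT_SUFFIX = "\n\n### Response:\n\n"

def wrap_instruction(user_input: str) -> str:
    # The wrapped string is what gets tokenized and evaluated.
    return INSTRUCT_PREFIX + user_input + INSTRUCT_SUFFIX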
This is the relevant snippet: llama-cpp-python/examples/low_level_api_chatllama_cpp.py, lines 111 to 124 at commit 3d96ddf.
C++ confuses me sometimes
Definitely, that's also why I made the issue: I thought I could have state saving, but alas, it wasn't meant to be.
It's now implemented using
Good progress so far! I would like to include this in the examples; I just request that we follow the structure below:

import os
import argparse
from dataclasses import dataclass, field
from typing import List, Optional
# Based on https://github.com/ggerganov/llama.cpp/blob/master/examples/common.cpp
@dataclass
class GptParams:
    seed: int = -1
    n_threads: int = min(4, os.cpu_count() or 1)
    n_predict: int = 128
    repeat_last_n: int = 64
    n_parts: int = -1
    n_ctx: int = 512
    n_batch: int = 8
    n_keep: int = 0
    top_k: int = 40
    top_p: float = 0.95
    temp: float = 0.80
    repeat_penalty: float = 1.10
    model: str = "models/llama-7B/ggml-model.bin"
    prompt: str = ""
    input_prefix: str = ""
    antiprompt: List[str] = field(default_factory=list)
    memory_f16: bool = True
    random_prompt: bool = False
    use_color: bool = False
    interactive: bool = False
    embedding: bool = False
    interactive_start: bool = False
    instruct: bool = False
    ignore_eos: bool = False
    perplexity: bool = False
    use_mlock: bool = False
    mem_test: bool = False
    verbose_prompt: bool = False
def gpt_params_parse(argv, params: Optional[GptParams] = None):
    if params is None:
        params = GptParams()
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--seed", type=int, default=-1, help="")
    parser.add_argument("-t", "--threads", type=int, default=1, help="")
    parser.add_argument("-p", "--prompt", type=str, default="", help="")
    parser.add_argument("-f", "--file", type=str, default=None, help="")
    parser.add_argument("-c", "--context_size", type=int, default=512, help="")
    parser.add_argument("--memory_f32", action="store_true", help="")
    parser.add_argument("--top_p", type=float, default=0.9, help="")
    parser.add_argument("--temp", type=float, default=1.0, help="")
    parser.add_argument("--repeat_last_n", type=int, default=64, help="")
    parser.add_argument("--repeat_penalty", type=float, default=1.0, help="")
    parser.add_argument("-b", "--batch_size", type=int, default=8, help="")
    parser.add_argument("-m", "--model", type=str, help="")
    parser.add_argument(
        "-i", "--interactive", action="store_true", help="run in interactive mode"
    )
    parser.add_argument("--embedding", action="store_true", help="")
    parser.add_argument("--interactive-start", action="store_true", help="")
    parser.add_argument(
        "--interactive-first",
        action="store_true",
        help="run in interactive mode and wait for input right away",
    )
    parser.add_argument(
        "-ins",
        "--instruct",
        action="store_true",
        help="run in instruction mode (use with Alpaca models)",
    )
    parser.add_argument(
        "--color",
        action="store_true",
        help="colorise output to distinguish prompt and user input from generations",
    )
    parser.add_argument("--mlock", action="store_true")
    parser.add_argument("--mtest", action="store_true")
    parser.add_argument(
        "-r",
        "--reverse-prompt",
        type=str,
        default="",
        help="run in interactive mode and poll user input upon seeing PROMPT (can be\nspecified more than once for multiple prompts).",
    )
    parser.add_argument("--perplexity", action="store_true", help="")
    parser.add_argument("--ignore-eos", action="store_true", help="")
    parser.add_argument("--n_parts", type=int, default=-1, help="")
    parser.add_argument("--random-prompt", action="store_true", help="")
    parser.add_argument("--in-prefix", type=str, default="", help="")
    args = parser.parse_args(argv)
    return args

Ideally, place this into a common.py file.
It still has some extra-newline issues, so it's WIP.
It still needs packaging work so that you could do "python -m llama_cpp.examples." etc.
Can you move this common.py file to just the low_level_api folder and remove the __init__.py files?
Done!
@SagsMug thank you!
This commit adds a port of llama.cpp's main function as an example.
It's finally reached a stage where it's readable enough for general usage and learning.
There are some differences from the original main, since I wanted programmatic I/O.
Future work:
On the first, like the original, we just use a list and pop the first element (a rough sketch appears at the end of this note).
Python's deque doesn't support slicing, and implementing a custom class seemed out of scope for an example.
It's not the slowest part anyway, since we're waiting for llama most of the time, so it's not a high priority.
We can say that it's left as an exercise for the reader 😋
On the second, see #14
This also solves #7, unless we also want a higher-level interactive mode.
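(A rough sketch of the list-based window mentioned under future work; variable names are illustrative, not the example's actual code. A deque would make the pop O(1), but deques can't be sliced, and the sampling code wants a slice of the most recent tokens:)

n_ctx = 512
repeat_last_n = 64
last_n_tokens = [0] * n_ctx  # plain list used as a ring buffer

def remember(token: int) -> None:
    last_n_tokens.pop(0)        # O(n) on a list, but llama eval dominates runtime
    last_n_tokens.append(token)

remember(42)
recent = last_n_tokens[-repeat_last_n:]  # window used for the repeat penalty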