Roadmap (short-term) #457
Closed
ggerganov announced in Announcements
Replies: 3 comments 2 replies
-
Are these sorted top-priority first?
1 reply
-
Can we update the README with usage of the new perplexity tool since the 'main --perplexity' way stopped working?
1 reply
-
What happened? Will the project become active again later? u_u
0 replies
-
These will be the priorities for the next few days:

- Reduce inference memory usage via ggml scratch buffers, no hardcoded memory buffer sizes and support infinite interactive mode
  I know how to fix this and this is important since the GH issues are being flooded with complaints about seg faults and crashes
- Finalize SIMD accelerated quantization and merge `ggml` back in the parent repo:
  - `quantize_row_q4_0()`
  - `quantize_row_q4_1()`
  - `dequantize_row_q4_0()`
  - `dequantize_row_q4_1()`

  I suspect this could improve performance for prompt batch processing
- Deprecate `ggml_vec_mad_xxx()` routines and simplify `ggml_forward_mul_mat_xxx()`
  This should lead to some significant code reduction in `ggml.c`
- Separate the perplexity computation from `main.cpp` into standalone example program called `perplexity`
- Move `main.cpp` into a standalone example program and move `utils.h` / `utils.cpp` into `./examples` to be shared by all examples
- Add `llama_state` to allow parallel text generation sessions with a single model
  I will do this in a similar way it is done in `whisper.cpp`
- Extend `llama_state` to support loading individual model tensors. Needed for LoRA personalities support
- Add 2-bit integer quantization

When the above things are ready we will have a good foundation to start porting more models and create more example applications to demonstrate the usage of `ggml`.

New roadmap: #784