ggml : automatic selection of best CPU backend #10606
Conversation
I think it may be worth having a cmake flag to build all the common CPU backend variations in a single build, rather than having to build multiple times and combine the backend libraries.
Yes, I agree that would be better. TBH selecting the variants is a headache because there are so many options and each microarchitecture supports a different subset of them, so I didn't want to think too much about it. It might make more sense to build a variant for each microarchitecture, but there are going to be a lot of them.
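Until such a flag exists, a multi-variant build means running several builds and collecting the resulting backend libraries by hand. A rough sketch of that workflow (the cache variable and output-path names follow the options mentioned in this PR, but the exact invocation is illustrative, not a documented recipe):

```shell
# Illustrative only: build two CPU-backend variants in separate build
# trees, then gather the per-variant shared libraries into one directory
# so the loader can score them at runtime. Exact variable names and
# output paths may differ between llama.cpp versions.
cmake -B build-avx2   -DGGML_BACKEND_DL=ON -DGGML_AVX2=ON
cmake --build build-avx2   --target ggml-cpu

cmake -B build-avx512 -DGGML_BACKEND_DL=ON -DGGML_AVX512=ON
cmake --build build-avx512 --target ggml-cpu

mkdir -p dist
cp build-avx2/bin/libggml-cpu-*.so build-avx512/bin/libggml-cpu-*.so dist/
```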
Is there an advantage to building a separate binary for each microarchitecture over determining what features can be used at runtime?
That's not really an option for many reasons. It would probably require rewriting all the code in ASM and using a JIT compiler.
Compiling more variations of the CPU backend with different flags means more duplication of the same code, which would increase the total build size.
That's pretty much what this is already. The build time and build size are not really significant: the CPU backend builds very quickly and most variants are below 500 kB in size. E.g. these are the variants built in #10626:
Then all is good. Thanks for explaining :)
* ggml : automatic selection of best CPU backend
* amx : minor opt
* add GGML_AVX_VNNI to enable avx-vnni, fix checks
This is the way it works: all the libraries matching `libggml-cpu-*.so` (or `ggml-cpu-*.dll` on Windows) that export a `ggml_backend_score` function are checked, and the variant with the highest score is selected. The CPU backend implements this functionality for x86-64 and returns a score depending on the features included in the build that are supported on the running system.
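The scoring logic described above can be sketched as follows. This is not the actual ggml implementation; the `cpu_features` struct and `backend_score` function are hypothetical stand-ins showing the idea: a variant scores 0 if the host lacks any feature it was built with, and otherwise scores higher the more features it uses, so the loader can pick the best compatible variant.

```cpp
#include <cstdint>

// Hypothetical feature set; the real backend probes many more flags
// (FMA, F16C, AVX-VNNI, ...) via CPUID.
struct cpu_features {
    bool avx;
    bool avx2;
    bool avx512;
    bool amx;
};

// Return 0 if the host is missing a feature the variant was built with
// (the variant cannot run), otherwise a score that grows with the
// feature level so the most capable runnable variant wins.
int backend_score(const cpu_features & built_with, const cpu_features & host) {
    int score = 1; // baseline: plain x86-64 always runs
    if (built_with.avx)    { if (!host.avx)    return 0; score += 1; }
    if (built_with.avx2)   { if (!host.avx2)   return 0; score += 2; }
    if (built_with.avx512) { if (!host.avx512) return 0; score += 4; }
    if (built_with.amx)    { if (!host.amx)    return 0; score += 8; }
    return score;
}
```

On an AVX2 host, an AVX2 variant scores higher than a plain AVX variant, while an AVX512 variant scores 0 and is skipped.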
The `llama-server` Docker image has been updated to include variants for AVX, AVX2, AVX512 and AMX. Caveat: the AVX and AVX2 variants still require FMA and F16C, which will limit the number of processors supported. More variants may be needed to fully support some microarchitectures.