Allow building with BLAS/BLIS now that Ollama's runners are not pure native builds of llama.cpp anymore #8402
Description
Eval performance for llama3.2 regressed from ~18 tokens/s to ~8 tokens/s on a Ryzen Threadripper 1820X.
Up to version v0.5.1, I was able to build the official `llama-server` from llama.cpp and use it as part of an Ollama build that skips generation. I'm using AMD's AOCC compiler and AOCL (a BLIS-flavored implementation tuned for AMD cores) on Linux with `-march=znver1`.
I was building `llama-server` (with AOCC and AOCL configured) with:
```sh
cmake -G Ninja -B build \
  -DGGML_BLAS:BOOL=1 \
  -DGGML_BLAS_VENDOR=AOCL_mt \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DGGML_NATIVE:BOOL=0 \
  -DLLAMA_BUILD_TESTS:BOOL=0 \
  -DCMAKE_BUILD_TYPE:STRING=Release \
  -DGGML_AVX:BOOL=1 \
  -DGGML_AVX2:BOOL=1 \
  -DGGML_BUILD_EXAMPLES:BOOL=0 \
  -DBUILD_SHARED_LIBS:BOOL=0 \
  -DGGML_FMA:BOOL=1 \
  -DGGML_F16C:BOOL=1 \
  -DGGML_LTO:BOOL=1 \
  -DCMAKE_C_FLAGS:STRING="-march=znver1" \
  -DCMAKE_CXX_FLAGS:STRING="-march=znver1" \
  -DCMAKE_INSTALL_PREFIX:PATH=/root/llama.cpp/install \
  -DBLAS_INCLUDE_DIRS:PATH=/root/aocl/5.0.0/aocc/include
```
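After configuring, the build and install steps were the standard CMake ones; a minimal sketch, assuming the `llama-server` target name and the install prefix configured above:

```sh
# Build only the server binary, using all available cores
cmake --build build --target llama-server -j"$(nproc)"

# Install into the prefix given via CMAKE_INSTALL_PREFIX
cmake --install build
```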
But now Ollama no longer uses a pure build of llama.cpp. And to make things worse, it passes a `runner` argument that `llama-server` doesn't accept.
From my understanding, the runners are now Go applications that link against llama.cpp at build time.
How can I do a custom build of these Go runners that uses BLIS and lets me pass `-march=znver1` at build time?
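For reference, my assumption is that, since cgo honors the standard `CGO_CFLAGS`/`CGO_LDFLAGS` environment variables, something along these lines would be the mechanism. The library name and paths are guesses based on my AOCL install, not Ollama's documented build procedure:

```sh
# Point cgo at the AOCL headers/libraries and force the znver1 ISA
# (paths match my AOCL 5.0.0 install; adjust as needed)
export CGO_CFLAGS="-march=znver1 -I/root/aocl/5.0.0/aocc/include"
export CGO_LDFLAGS="-L/root/aocl/5.0.0/aocc/lib -lblis-mt"

# Build everything in the Ollama tree, runners included
go build ./...
```

But I don't know whether Ollama's build actually picks these up, or whether the BLAS backend even gets compiled in this way.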
p.s.: I'm not a Go developer :(