Allow building with BLAS/BLIS now that Ollama's runners are not pure native builds of llama.cpp anymore #8402

Open
@hmartinez82

Description

This causes an eval-speed regression from ~18 tok/s to ~8 tok/s for llama3.2 on a Ryzen Threadripper 1820X.

Up to version v0.5.1 I was able to build the official llama-server from llama.cpp and use it as part of an Ollama build that skips generation. I'm using AMD's AOCC compiler and AOCL (a BLIS-flavored implementation tuned for AMD cores) on Linux with -march=znver1.

I was building llama-server (with AOCC and AOCL configured in the environment) with:

cmake -G Ninja -B build \
 -DCMAKE_BUILD_TYPE:STRING=Release \
 -DCMAKE_C_COMPILER=clang \
 -DCMAKE_CXX_COMPILER=clang++ \
 -DCMAKE_C_FLAGS:STRING="-march=znver1" \
 -DCMAKE_CXX_FLAGS:STRING="-march=znver1" \
 -DCMAKE_INSTALL_PREFIX:PATH=/root/llama.cpp/install \
 -DGGML_BLAS:BOOL=1 \
 -DGGML_BLAS_VENDOR=AOCL_mt \
 -DBLAS_INCLUDE_DIRS:PATH=/root/aocl/5.0.0/aocc/include \
 -DGGML_NATIVE:BOOL=0 \
 -DGGML_AVX:BOOL=1 \
 -DGGML_AVX2:BOOL=1 \
 -DGGML_FMA:BOOL=1 \
 -DGGML_F16C:BOOL=1 \
 -DGGML_LTO:BOOL=1 \
 -DBUILD_SHARED_LIBS:BOOL=0 \
 -DLLAMA_BUILD_TESTS:BOOL=0 \
 -DGGML_BUILD_EXAMPLES:BOOL=0

But Ollama no longer uses a pure build of llama.cpp. To make things worse, it now passes a runner argument that the stock llama-server doesn't accept.

From my understanding, the runners are now Go applications that link against llama.cpp at build time.

How can I have a custom build of these Go runners that use BLIS and allow me to pass -march=znver1 at build time?
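In case it helps anyone experimenting with this: cgo honors the standard CGO_CFLAGS / CGO_CXXFLAGS / CGO_LDFLAGS environment variables, so one avenue to try (an untested assumption on my part; I don't know whether Ollama's build scripts preserve these variables, and the library name below is a guess at the AOCL multithreaded BLIS) would be exporting the target flags before building:

```shell
# Untested sketch. The CGO_* variable names are standard cgo environment
# variables; whether Ollama's build honors them, and the exact AOCL
# library name, are assumptions.
export CGO_CFLAGS="-march=znver1"
export CGO_CXXFLAGS="-march=znver1"
export CGO_LDFLAGS="-L/root/aocl/5.0.0/aocc/lib -lblis-mt"
go build ./...
```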

p.s.: I'm not a Go developer :(
