What is the issue?
DeepSeek V2, V2.5, and V2-Coder all crash with an OOM error when loading the 236B size. Other DeepSeek versions may as well; these three are all I've tested. Hardware is dual A6000s with 48 GB of VRAM each.
Error: llama runner process has terminated: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 882903040
llama_new_context_with_model: failed to allocate compute buffers
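The failure is in allocating the CUDA compute buffer (~842 MiB here) rather than the model weights themselves, so a smaller context window sometimes avoids that allocation. Below is a minimal sketch of how I'd try to reproduce/work around it through Ollama's `/api/generate` endpoint with a reduced `num_ctx`; the model tag `deepseek-v2:236b` and the exact `num_ctx` value are assumptions, not a confirmed fix.

```python
# Sketch: hit a local Ollama server with a reduced context window.
# Assumptions: Ollama is listening on the default port 11434 and the
# 236B model is available under the tag "deepseek-v2:236b".
import json
import urllib.request

payload = {
    "model": "deepseek-v2:236b",
    "prompt": "Hello",
    "stream": False,
    # A smaller context window shrinks the compute buffer ggml tries to
    # allocate; whether it avoids this cudaMalloc failure is untested.
    "options": {"num_ctx": 2048},
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```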
OS
Linux
GPU
Nvidia
CPU
AMD
Ollama version
v0.4.5