What is the issue?
Hardware: 8× RTX 3090 GPUs
CPU: AMD EPYC 7402 24-Core Processor
CUDA 12.2
Driver version 535.183.06
Running on the ollama Docker image (0.5.4).
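For reference, a multi-GPU container like this is typically started with the stock invocation from Ollama's Docker instructions; the flags and image tag below are an assumption, not something captured in this report:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.5.4

The root@c3b740f86306 prompt further down suggests the commands that follow were run inside that container (e.g. via docker exec -it ollama bash).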
Using the command
ollama run qwq
and typing a few questions for testing, the model crashes after three questions.
bash output:
readlink -f /proc/1
Error: an error was encountered while running the model: CUDA error: an illegal memory access was encountered
current device: 5, in function ggml_backend_cuda_synchronize at llama/ggml-cuda/ggml-cuda.cu:2317
cudaStreamSynchronize(cuda_ctx->stream())
llama/ggml-cuda/ggml-cuda.cu:96: CUDA error
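Note that the failure is reported from cudaStreamSynchronize, so the kernel that actually performed the illegal access may have been launched earlier. One common way to localize this kind of asynchronous CUDA error, assuming the variable reaches the runner process, is to restart the container with synchronous kernel launches enabled and reproduce the crash:

docker run -d --gpus=all -e CUDA_LAUNCH_BLOCKING=1 -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.5.4

This makes each launch block until completion, so the error surfaces at the offending call site instead of at a later synchronize; it is a debugging aid only and slows inference noticeably.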
container log output:
goroutine 158 gp=0xc000443180 m=nil [GC worker (idle)]:
runtime.gopark(0x5b357c2acf3bbd?, 0x1?, 0x37?, 0xd?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00044af38 sp=0xc00044af18 pc=0x556f7a50092e
runtime.gcBgMarkWorker(0xc00015d0a0)
runtime/mgc.go:1412 +0xe9 fp=0xc00044afc8 sp=0xc00044af38 pc=0x556f7a4ae209
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00044afe0 sp=0xc00044afc8 pc=0x556f7a4ae0e5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00044afe8 sp=0xc00044afe0 pc=0x556f7a508561
created by runtime.gcBgMarkStartWorkers in goroutine 21
runtime/mgc.go:1328 +0x105
goroutine 159 gp=0xc000443340 m=nil [GC worker (idle)]:
runtime.gopark(0x5b357c2acf1b00?, 0x1?, 0x15?, 0x62?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00044b738 sp=0xc00044b718 pc=0x556f7a50092e
runtime.gcBgMarkWorker(0xc00015d0a0)
runtime/mgc.go:1412 +0xe9 fp=0xc00044b7c8 sp=0xc00044b738 pc=0x556f7a4ae209
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00044b7e0 sp=0xc00044b7c8 pc=0x556f7a4ae0e5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00044b7e8 sp=0xc00044b7e0 pc=0x556f7a508561
created by runtime.gcBgMarkStartWorkers in goroutine 21
runtime/mgc.go:1328 +0x105
goroutine 161 gp=0xc0004436c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
runtime/proc.go:424 +0xce fp=0xc00044bda8 sp=0xc00044bd88 pc=0x556f7a50092e
runtime.netpollblock(0x556f7a53c158?, 0x7a499186?, 0x6f?)
runtime/netpoll.go:575 +0xf7 fp=0xc00044bde0 sp=0xc00044bda8 pc=0x556f7a4c5697
internal/poll.runtime_pollWait(0x7f9443577e38, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc00044be00 sp=0xc00044bde0 pc=0x556f7a4ffc25
internal/poll.(*pollDesc).wait(0xc000232e80?, 0xc0002045e1?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00044be28 sp=0xc00044be00 pc=0x556f7a555a67
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000232e80, {0xc0002045e1, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x27a fp=0xc00044bec0 sp=0xc00044be28 pc=0x556f7a5565ba
net.(*netFD).Read(0xc000232e80, {0xc0002045e1?, 0x0?, 0x0?})
net/fd_posix.go:55 +0x25 fp=0xc00044bf08 sp=0xc00044bec0 pc=0x556f7a5ce885
net.(*conn).Read(0xc000246000, {0xc0002045e1?, 0x0?, 0x0?})
net/net.go:189 +0x45 fp=0xc00044bf50 sp=0xc00044bf08 pc=0x556f7a5d8285
net.(*TCPConn).Read(0x0?, {0xc0002045e1?, 0x0?, 0x0?})
:1 +0x25 fp=0xc00044bf80 sp=0xc00044bf50 pc=0x556f7a5e5325
net/http.(*connReader).backgroundRead(0xc0002045d0)
net/http/server.go:690 +0x37 fp=0xc00044bfc8 sp=0xc00044bf80 pc=0x556f7a706077
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc00044bfe0 sp=0xc00044bfc8 pc=0x556f7a705fa5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00044bfe8 sp=0xc00044bfe0 pc=0x556f7a508561
created by net/http.(*connReader).startBackgroundRead in goroutine 58
net/http/server.go:686 +0xb6
BTW:
The ollama instance occupies about 43 GB of VRAM according to nvidia-smi, but ollama ps reports the memory usage incorrectly:
root@c3b740f86306:/# ollama ps
NAME          ID              SIZE      PROCESSOR    UNTIL
qwq:latest    46407beda5c0    125 GB    100% GPU     19 minutes from now
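One way to cross-check the discrepancy, assuming a standard nvidia-smi installation, is to query per-device memory and sum it across the eight cards; this query is a suggestion, not something run in the original report:

nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv

The total of memory.used should land near the ~43 GB observed above, versus the 125 GB that ollama ps prints (whether that 125 GB figure is an estimate or a measurement is not clear from this thread).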
OS: Linux
GPU: Nvidia
CPU: AMD
Ollama version: 0.5.4
Full server log will help in debugging.

Thank you. I noticed that when I run other models, the ollama ps command also shows incorrect VRAM usage, and similar crashes occur. It might be a hardware or driver issue, so I will close the issue for now. I will update the GPU server and perform the same validation on the new server while collecting complete logs. If the issue persists, I will create a new issue.
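For the complete logs mentioned above, assuming the stock Docker setup with a container named ollama, the server log can be captured from the container's output, and more verbose logging can be enabled by adding OLLAMA_DEBUG=1 when starting the container; the container name and environment variable here are assumptions, not details taken from this thread:

docker logs ollama > ollama-server.log 2>&1
docker run -d --gpus=all -e OLLAMA_DEBUG=1 -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.5.4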