What is the issue?
Hardware: 8× RTX 3090 GPUs
CPU: AMD EPYC 7402 24-Core Processor
CUDA 12.2
Driver version 535.183.06
Running on the ollama Docker image (0.5.4).
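For reference, a multi-GPU container like this is typically started with the stock invocation from Ollama's Docker instructions; the flags and image tag below are an assumption, not something captured in this report:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.5.4

The root@c3b740f86306 prompt further down suggests the commands that follow were run inside that container (e.g. via docker exec -it ollama bash).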
Using the command
ollama run qwq
and typing a few questions for testing, the model crashes after three questions.
bash output:
readlink -f /proc/1
Error: an error was encountered while running the model: CUDA error: an illegal memory access was encountered
current device: 5, in function ggml_backend_cuda_synchronize at llama/ggml-cuda/ggml-cuda.cu:2317
cudaStreamSynchronize(cuda_ctx->stream())
llama/ggml-cuda/ggml-cuda.cu:96: CUDA error
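Note that the failure is reported from cudaStreamSynchronize, so the kernel that actually performed the illegal access may have been launched earlier. One common way to localize this kind of asynchronous CUDA error, assuming the variable reaches the runner process, is to restart the container with synchronous kernel launches enabled and reproduce the crash:

docker run -d --gpus=all -e CUDA_LAUNCH_BLOCKING=1 -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.5.4

This makes each launch block until completion, so the error surfaces at the offending call site instead of at a later synchronize; it is a debugging aid only and slows inference noticeably.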
container log output:
goroutine 158 gp=0xc000443180 m=nil [GC worker (idle)]:
runtime.gopark(0x5b357c2acf3bbd?, 0x1?, 0x37?, 0xd?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00044af38 sp=0xc00044af18 pc=0x556f7a50092e
runtime.gcBgMarkWorker(0xc00015d0a0)
runtime/mgc.go:1412 +0xe9 fp=0xc00044afc8 sp=0xc00044af38 pc=0x556f7a4ae209
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00044afe0 sp=0xc00044afc8 pc=0x556f7a4ae0e5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00044afe8 sp=0xc00044afe0 pc=0x556f7a508561
created by runtime.gcBgMarkStartWorkers in goroutine 21
runtime/mgc.go:1328 +0x105
goroutine 159 gp=0xc000443340 m=nil [GC worker (idle)]:
runtime.gopark(0x5b357c2acf1b00?, 0x1?, 0x15?, 0x62?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00044b738 sp=0xc00044b718 pc=0x556f7a50092e
runtime.gcBgMarkWorker(0xc00015d0a0)
runtime/mgc.go:1412 +0xe9 fp=0xc00044b7c8 sp=0xc00044b738 pc=0x556f7a4ae209
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00044b7e0 sp=0xc00044b7c8 pc=0x556f7a4ae0e5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00044b7e8 sp=0xc00044b7e0 pc=0x556f7a508561
created by runtime.gcBgMarkStartWorkers in goroutine 21
runtime/mgc.go:1328 +0x105
goroutine 161 gp=0xc0004436c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
runtime/proc.go:424 +0xce fp=0xc00044bda8 sp=0xc00044bd88 pc=0x556f7a50092e
runtime.netpollblock(0x556f7a53c158?, 0x7a499186?, 0x6f?)
runtime/netpoll.go:575 +0xf7 fp=0xc00044bde0 sp=0xc00044bda8 pc=0x556f7a4c5697
internal/poll.runtime_pollWait(0x7f9443577e38, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc00044be00 sp=0xc00044bde0 pc=0x556f7a4ffc25
internal/poll.(*pollDesc).wait(0xc000232e80?, 0xc0002045e1?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00044be28 sp=0xc00044be00 pc=0x556f7a555a67
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000232e80, {0xc0002045e1, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x27a fp=0xc00044bec0 sp=0xc00044be28 pc=0x556f7a5565ba
net.(*netFD).Read(0xc000232e80, {0xc0002045e1?, 0x0?, 0x0?})
net/fd_posix.go:55 +0x25 fp=0xc00044bf08 sp=0xc00044bec0 pc=0x556f7a5ce885
net.(*conn).Read(0xc000246000, {0xc0002045e1?, 0x0?, 0x0?})
net/net.go:189 +0x45 fp=0xc00044bf50 sp=0xc00044bf08 pc=0x556f7a5d8285
net.(*TCPConn).Read(0x0?, {0xc0002045e1?, 0x0?, 0x0?})
:1 +0x25 fp=0xc00044bf80 sp=0xc00044bf50 pc=0x556f7a5e5325
net/http.(*connReader).backgroundRead(0xc0002045d0)
net/http/server.go:690 +0x37 fp=0xc00044bfc8 sp=0xc00044bf80 pc=0x556f7a706077
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc00044bfe0 sp=0xc00044bfc8 pc=0x556f7a705fa5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00044bfe8 sp=0xc00044bfe0 pc=0x556f7a508561
created by net/http.(*connReader).startBackgroundRead in goroutine 58
net/http/server.go:686 +0xb6
BTW:
The ollama instance occupies about 43 GB of VRAM according to nvidia-smi, but ollama ps reports the memory usage incorrectly:
root@c3b740f86306:/# ollama ps
NAME          ID              SIZE      PROCESSOR    UNTIL
qwq:latest    46407beda5c0    125 GB    100% GPU     19 minutes from now
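One way to cross-check the discrepancy, assuming a standard nvidia-smi installation, is to query per-device memory and sum it across the eight cards; this query is a suggestion, not something run in the original report:

nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv

The total of memory.used should land near the ~43 GB observed above, versus the 125 GB that ollama ps prints (whether that 125 GB figure is an estimate or a measurement is not clear from this thread).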
OS: Linux
GPU: Nvidia
CPU: AMD
Ollama version: 0.5.4
Full server log will help in debugging.

Thank you. I noticed that when I run other models, the ollama ps command also shows incorrect VRAM usage, and similar crashes occur. It might be a hardware or driver issue, so I will close the issue for now. I will update the GPU server and perform the same validation on the new server while collecting complete logs. If the issue persists, I will create a new issue.
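For the complete logs mentioned above, assuming the stock Docker setup with a container named ollama, the server log can be captured from the container's output, and more verbose logging can be enabled by adding OLLAMA_DEBUG=1 when starting the container; the container name and environment variable here are assumptions, not details taken from this thread:

docker logs ollama > ollama-server.log 2>&1
docker run -d --gpus=all -e OLLAMA_DEBUG=1 -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.5.4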