Rate limit download speed on pulling new models #2006
Yes, definitely! Same here - when I download models, everyone in the office gets really slow internet. How about …
Yeah, same here! I assume it's using multiple threads to download multiple chunks at the same time, as it seems a lot more lag-inducing than either …
I would do the same as …
I think this is coming. I saw either a branch or a pull request from one of the maintainers to provide rate limiting.
I would LOVE to see this implemented. It reliably and repeatedly kills my connection on anything larger than a 13b model. I think it's the sustained speed when it happens (I have a 1G/1G connection, and downloads get up to 115MB/s).
Behavior here will be improved by #2221; working on getting that unblocked now.
We want to be able to define an arbitrary download speed limit. It'd be great if #2221 could address that somehow.
I've had this issue every day since I started using Ollama a few days ago. In my case (Pop!_OS 22.04 LTS), the machine freezes randomly and eventually freezes completely, leaving no choice but to hard reset.
I agree. This is definitely a good idea, as pulling down the 7b 39GB model without rate limiting is very antisocial network behaviour. I did install and play with the trickle command but couldn't figure out how to use it with the `ollama run` command, as it isn't that process that needs limiting.
@simmonsm I think trickle wouldn't work anyway, since Go doesn't use libc (and trickle uses LD_PRELOAD for its magic).
Fair enough. In the meantime I'm using a VM connected via a virtual traffic-shaping network switch.
You might be able to accomplish this with a Docker container.
I'm downloading a bunch of Llama 3 models at the moment, and last night my upstairs neighbor, with whom I'm sharing a 300/100 fiber connection, asked for help because he couldn't use the internet anymore. Indeed, I ran a speedtest on another machine connected over Ethernet, and the bandwidth left was 1.6Mbit/s download, with a whopping ~1000ms of ping latency. My Ollama instance is running on macOS as a native app. I appreciate that Ollama maximizes bandwidth to download large models as quickly as possible, but the default behavior does not run with sane parameters at all. My suggestion as a simple solution that can be implemented quickly: …
Then later, sure - an adaptive algorithm can try to optimize the concurrent connection count based on latency and throughput. But it might never work that great on shared and mobile connections, where the available bandwidth and latency vary based on external parameters. I hope this can be addressed shortly.
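Several comments in this thread converge on the same simple mitigation: cap the number of concurrent chunk downloads. Below is a minimal sketch of such a cap using a buffered channel as a semaphore; the chunk count, the cap of 4, and the `fetchChunk` stand-in are illustrative assumptions, not Ollama's actual downloader code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchChunk stands in for an HTTP range request that downloads one
// chunk of a model blob (hypothetical; not Ollama's real code).
func fetchChunk(id int) {
	fmt.Println("downloading chunk", id)
	time.Sleep(100 * time.Millisecond)
}

func main() {
	const maxConcurrent = 4 // a saner fixed default than 64
	sem := make(chan struct{}, maxConcurrent)

	var wg sync.WaitGroup
	for chunk := 0; chunk < 16; chunk++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire one of maxConcurrent slots
			defer func() { <-sem }() // release the slot when done
			fetchChunk(id)
		}(chunk)
	}
	wg.Wait()
}
```

The buffered channel gives the fixed cap asked for here; the adaptive tuning mentioned above could later adjust `maxConcurrent` at runtime.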
Been hit by this issue when downloading large models; it literally hogs the entire device. Would be nice to be able to limit the rate in some way, à la BitTorrent clients.
Same issue. The downloader uses up the entire bandwidth (the internet becomes unusable) and eventually crashes because of the timeouts. A rate limiter seems like a critically important feature.
Same issue. I'm killing my (not exactly mine, but I have to deal with it...) router using ollama pull :&lt;
Love Ollama, but this is murdering the end-user experience for me. I'm having to ctrl+c just to post this comment. I have a 2Gb/s connection too, so it's not a limited-bandwidth issue. It's simply downloading too many chunks simultaneously, deprioritising internet bandwidth for every other process.
Yes, same here. I can only download models at night; the machine is unusable.
Had the same problem at work: not only was my computer getting slow, but so was the internet for all my colleagues.
+1
My Linux box (i5) got reliably stuck every single time I pulled a model... so +1 for this. Two solutions that did help me limp on for now: …
I use this horrible "workaround" to not consume the whole internet bandwidth, so I can still work on my other machine while pulling a model: …
This negotiates the link speed of my network interface down to 10Mbit. Yes, the pulling machine itself is not usable either, and sometimes the download is interrupted because (and I'm not kidding you!) there is not enough bandwidth left for DNS. But at least others on my local network are not angry anymore.
I'm preparing a patch and will submit a PR to address this soon. |
My solution for now, works fine. This docker-tc setup can also simulate packet loss 😂

```yaml
version: '3'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - 11434:11434
    restart: unless-stopped
    labels:
      - "com.docker-tc.enabled=1"
      - "com.docker-tc.limit=30mbit"
  docker-tc:
    image: lukaszlach/docker-tc
    cap_add:
      - NET_ADMIN
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/docker-tc:/var/docker-tc
```
…connections
The Ollama server now downloads models using a single connection. This change addresses the root cause of issue ollama#2006 by following best practices instead of relying on workarounds. Users have been reporting problems associated with model downloads since January 2024, describing issues such as "hogging the entire device", "reliably and repeatedly kills my connection", "freezes completely leaving no choice but to hard reset", "when I download models, everyone in the office gets a really slow internet", and "when downloading large models, it feels like my home network is being DDoSed." The environment variable `OLLAMA_DOWNLOAD_CONN` can be set to control the number of concurrent connections, with a maximum value of 64 (the previous default, an aggressive value that is unsafe in some conditions). The new default value is 1, ensuring each Ollama download is given the same priority as other network activities. An entry in the FAQ describes how to use `OLLAMA_DOWNLOAD_CONN` for different use cases. Changes include updates to the `envconfig/config.go`, `cmd/cmd.go`, `server/download.go`, and `docs/faq.md` files.
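For reference, here is a sketch of how the `OLLAMA_DOWNLOAD_CONN` setting described above might be parsed, based only on the behavior stated in the commit message (default 1, maximum 64); the function name and clamping details are assumptions, not the actual patch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// downloadConns returns the number of concurrent download connections,
// defaulting to 1 and clamping values to the stated range [1, 64].
func downloadConns() int {
	const (
		defaultConns = 1  // new default: one connection per download
		maxConns     = 64 // previous default, kept as the upper bound
	)
	n, err := strconv.Atoi(os.Getenv("OLLAMA_DOWNLOAD_CONN"))
	if err != nil || n < 1 {
		return defaultConns // unset or invalid values fall back to 1
	}
	if n > maxConns {
		return maxConns
	}
	return n
}

func main() {
	fmt.Println("concurrent download connections:", downloadConns())
}
```

Under this scheme, setting `OLLAMA_DOWNLOAD_CONN=64` would restore the old behavior for users on high-bandwidth links, while everyone else gets the safe default.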
@joelanman I agree with you. From the same location, I was able to download a 3b model but not the 405b model. For the 3b model, the 3GB were downloaded without any connection issues, whereas with the 405b model, the download kept stopping about every 200MB.
This reliably crashes my router and causes it to restart; it's too fast.
Same here. I literally can't even search Google while it's downloading something. I can download multi-gigabyte files in my web browser or in macOS Homebrew and still use the internet just fine, but when Ollama is downloading a model, my entire internet becomes unusable. It doesn't work well for Ollama either: the download speed starts at around 6MB/s and then just keeps gradually reducing, down to like 1MB/s, then 500KB/s, then 350KB/s, getting slower and slower, so Ollama is practically sabotaging itself with the way it downloads files.
I wonder how to get some progress going on this issue: the description of how badly it affects the user experience should bump its priority to critical in my opinion, and I already submitted a PR with a fix a month ago. After testing Ollama on a high-bandwidth VPS, I appreciated that its aggressive strategy led to 300MB/s (megabytes) downloads, so I get the point of keeping the many-concurrent-connections capability available, which is still the case with my patch. How do we move forward from here?
Hi @supercurio (hello François), your patch is here:
Hi @igorschlum 😌 I solved it for my individual use case already, but I'm hoping to use Ollama as the LLM runtime for the app I'm developing at the moment. It's a non-starter until this issue is solved, sadly.
Is it fixed by this?
It'll help in many cases, but it's probably still too high for some domestic broadband users with low bandwidth. ¯\_(ツ)_/¯
@Fluffkin I think it would work for any type of connection, because currently Ollama is downloading 64 files simultaneously and leaving very little bandwidth for other computers. With this modification, it will use only one download at a time.
@igorschlum no, it's changed from 64 to 16, so that's still a lot of connections.
My 2c: unless there's a way for customers to "request" more speed, the default shouldn't hurt low-end users. I'd take Git's example here. Although its downloads aren't split (like this tool's are), when churning through a repo - even on a 20-core machine - it doesn't spin up an obscene number of processes: only 4. So yes, we shouldn't cut down from 64 to 1 just to accommodate everyone, but settling for a saner default of 4 threads sounds like a good middle ground here... If/when there's a feature to request more threads, customers with more resources can always go back to 64 concurrent threads!
Reporting from Türkiye, I am unable to run …
@mrtysn what version of Ollama are you using? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads.
@igorschlum Apologies for the lack of details. However, I believe I did ~90% of my model pulls while on version 0.3.8; the Homebrew formula was updated to 0.3.9 about 2 days ago.
@mrtysn I installed Go on my Mac and was able to build Ollama from source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then you could change the 4 simultaneous downloads to 1 to see if this fixes the issue you're facing.
@igorschlum it's 16, which is still very high.
@joelanman Sorry, I expected it to be 4 :-)
I've installed the prerequisites. Unfortunately, however, I will not be able to devote more time to this issue with my current workload. Scheduling model downloads for the night has been an okay workaround so far, and I currently have most of the models that I would like to utilize. Nevertheless, I still think a CLI flag with reasonable defaults to control the number of concurrent download threads would be helpful for all Ollama users, especially new users downloading models for the first time or users from regions with slower internet speeds.
Hello all, I appreciate everyone's input in this thread, and I do hope this eventually gets solved. When I initially created this issue as a place to discuss and outline a solution to the network saturation problem, I assumed it would be welcomed with open arms. In the time since, the software has grown bigger and more complex, it's no longer a simple fix, and it doesn't appear to be a priority. With that in mind, life must go on. I'm withdrawing my offer to implement this, as I no longer have the resources to write and test a feature that isn't a concern of the developers. I don't think anybody's going to worry about it, but I wanted to be clear about my intentions. I look forward to seeing this issue fixed some day.
The issue was fixed 2 weeks ago in 0.3.7...
I'm still having the issue.
It would be best if the number of threads was configurable. Most of these issues can be mitigated by setting the number of concurrent downloads to 1.
I'm still having this issue. I'm from Argentina, and 16 concurrent connections kills my internet connection, hangs my machine, and downloads over 1GB never finish. Any idea how to rate limit this on Windows?
Used wondershare to slow down the connection. Looks like it is working...
@jimbothegrey what might wondershare be? All I'm finding is a file converter.
Still experiencing this on a pretty quick connection; it knocks all my devices off.
Is there interest in implementing a rate limiter in the `pull` command? I'm open to working on this; this is the syntax I have in mind for now: `ollama pull modelname --someflagname 1024` (this would limit the download to 1024 kbps). I took a look at the code in `server/download.go`, and I think I can do this with `x/time/rate` applied to the `downloadChunk` method of the blob downloader.
This feature, or something like it that accomplishes the same thing, would be quite useful for me. Ollama is able to saturate my network faster than BitTorrent or anything else I've tried.
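A sketch of the approach proposed above: wrap each chunk's response body in a reader that waits on a shared `golang.org/x/time/rate` limiter, so all chunks together stay under the flag's budget. Only `x/time/rate` and `downloadChunk` come from the proposal; the wrapper type, its field names, and the limiter wiring are illustrative assumptions.

```go
package main

import (
	"context"
	"io"
	"strings"

	"golang.org/x/time/rate"
)

// rateLimitedReader wraps an io.Reader and waits on a shared
// rate.Limiter before handing bytes through, so every concurrent
// chunk download draws from one global bandwidth budget.
type rateLimitedReader struct {
	ctx     context.Context
	r       io.Reader
	limiter *rate.Limiter
}

func (rr *rateLimitedReader) Read(p []byte) (int, error) {
	// Cap each read at the burst size so WaitN never requests
	// more tokens than the bucket can ever hold.
	if len(p) > rr.limiter.Burst() {
		p = p[:rr.limiter.Burst()]
	}
	n, err := rr.r.Read(p)
	if n > 0 {
		if werr := rr.limiter.WaitN(rr.ctx, n); werr != nil {
			return n, werr
		}
	}
	return n, err
}

func main() {
	// A flag value of 1024 would become the shared budget
	// (treated as KB/s here purely for illustration).
	bytesPerSec := 1024 * 1024
	limiter := rate.NewLimiter(rate.Limit(bytesPerSec), bytesPerSec)

	// Stand-in for one chunk's HTTP response body; inside Ollama this
	// wrapping would happen in something like downloadChunk.
	body := strings.NewReader("chunk bytes would stream through here")
	rr := &rateLimitedReader{ctx: context.Background(), r: body, limiter: limiter}
	_, _ = io.Copy(io.Discard, rr)
}
```

Because the limiter is shared rather than per-chunk, adding more concurrent connections cannot exceed the configured budget, which is what distinguishes this from simply lowering the connection count.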