
Rate limit download speed on pulling new models #2006

Open
donuts-are-good opened this issue Jan 15, 2024 · 58 comments
Assignees
Labels
networking Issues relating to ollama pull and push

Comments

@donuts-are-good

Is there interest in implementing a rate limiter in the pull command? I'm open to working on this; this is the syntax I have in mind for now:

ollama pull modelname --someflagname 1024 <-- this would limit to 1024 kbps

I took a look at the code in server/download.go, and I think I can do this with the x/time/rate applied to the downloadChunk method of the blob downloader.

This feature, or something like it, would be quite useful for me. Ollama is able to saturate my network faster than BitTorrent or anything else I've tried.

@tkafka

tkafka commented Jan 23, 2024

Yes, definitely! Same here: when I download models, everyone in the office gets really slow internet.

How about a --rate-limit flag?

@jukofyork

Yeah, same here!

I'm finding ollama pull is really killing my connection, and I have to limit myself to just using it at night now...

I assume it's using multiple threads to download multiple chunks at the same time, as it seems a lot more lag-inducing than either wget or curl. If so, it might be good to have control over those parameters too.

@escaroda

I would do the same as wget:

‘--limit-rate=amount’

Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the ‘k’ suffix, or megabytes with the ‘m’ suffix. For example, ‘--limit-rate=20k’ will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don’t want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, ‘--limit-rate=2.5k’ is a legal value.

Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don’t be surprised if limiting the rate doesn’t work well with very small files.

@easp
Contributor

easp commented Feb 1, 2024

I think this is coming. I saw either a branch or a pull request to provide rate limiting by one of the maintainers.

@akulbe

akulbe commented Feb 16, 2024

I would LOVE to see this implemented. It reliably and repeatedly kills my connection on anything larger than a 13b model. I think it's the sustained speed (I have a 1G/1G connection, and downloads get up to 115MB/s) when it happens.

@BruceMacD
Contributor

Behavior here will be improved by #2221; working on getting that unblocked now.

@donuts-are-good
Author

We want to define an arbitrary download speed limit. It'd be great if #2221 could address that somehow.

@pablo-01

I've had this issue every day since I started using Ollama a few days ago. In my case (Pop!_OS 22.04 LTS), the system freezes randomly and eventually locks up completely, leaving no choice but to hard reset.

@simmonsm

I agree. This is definitely a good idea, as, for instance, pulling down the 7b 39Gb model without rate limiting is very antisocial network behaviour. I did install and play with the trickle command, but couldn't figure out how to use it with ollama run, as it isn't that process that needs limiting.

@fermuch

fermuch commented Mar 20, 2024

@simmonsm I think trickle wouldn't work anyway, since Go doesn't use libc (and trickle uses LD_PRELOAD for its magic).

@simmonsm

@simmonsm I think trickle wouldn't work anyway, since Go doesn't use libc (and trickle uses LD_PRELOAD for its magic).

Fair enough. In the meantime I'm using a VM connected via a virtual traffic shaping network switch.

@LagSlug

LagSlug commented Apr 14, 2024

You might be able to accomplish this with a docker container

https://stackoverflow.com/questions/25497523/how-can-i-rate-limit-network-traffic-on-a-docker-container

@supercurio

I'm downloading a bunch of Llama 3 models at the moment, and last night my upstairs neighbor, with whom I'm sharing a 300/100 fiber connection, asked for help because he couldn't use the internet anymore. Indeed, I ran a speedtest on another machine connected over Ethernet, and the bandwidth left was 1.6Mbit/s download, with a whopping ~1000ms of ping latency.

My Ollama instance is running on macOS as a native app.
For now I found a workaround for my neighbor using Xcode's Network Link Conditioner, but I'm still essentially unable to browse the web on my primary machine when pulling models.

I appreciate that Ollama maximizes bandwidth to download large models as quickly as possible, but the default behavior does not use sane parameters at all.

My suggestion as a simple solution that can be implemented quickly:

  • run by default with conservative settings to be a good network citizen: 2, maybe 3 concurrent connections, not more than that.
  • offer a more aggressive preset as a command-line argument when downloading via ollama pull or ollama run.
  • expose a custom "max concurrent download connections" parameter on the command line and API.

Then later, sure, an adaptive algorithm could try to optimize the concurrent connection count based on latency and throughput. But it might never work that well on shared and mobile connections, where the available bandwidth and latency vary based on external factors.
This GitHub issue suggests a rate limit, which would be helpful as well, but selecting an appropriate number of concurrent connections should do the trick just fine without resorting to manual tuning.
If Ollama is competing with something else for bandwidth, like a neighbor trying to watch Netflix, it should respect TCP's congestion control mechanisms rather than gaming them to grab all the bandwidth.

I hope this can be addressed shortly.

@mcraveiro

I've been hit by this issue when downloading large models; it literally hogs the entire device. It would be nice to be able to limit the rate in some way, à la BitTorrent clients.

@strangehelix

Same issue. The downloader uses up the entire bandwidth (the internet becomes unusable) and eventually crashes because of timeouts. A rate limiter seems like a critically important feature.

@FeyNyXx

FeyNyXx commented May 28, 2024

Same issue. I'm killing my (not exactly mine, but I have to deal with it...) router using ollama pull :<

@metamec

metamec commented May 30, 2024

Love Ollama, but this is murdering the end-user experience for me. I'm having to ctrl+c just to post this comment. I have a 2Gb/s connection too, so it's not a limited-bandwidth issue. It's simply downloading too many chunks simultaneously, deprioritising internet bandwidth for every other process on the system. (Just realised it's network-wide, not system-wide. When downloading large models, it feels like my home network is being DDoSed.)

@mcraveiro

Yes, same here, I can only download models at night. Machine is unusable.

@MihailCosmin

I had the same problem at work: not only did my computer get slow, but so did the internet for all my colleagues.
I had to find a way to rate-limit the download speed. Older solutions (wondershaper, trickle, tc) did not work for me; the only one that worked was FireQOS, in case anyone else needs it.

@LutzFassl

+1

@robins

robins commented Jul 6, 2024

My Linux box (i5) got reliably stuck every single time I pulled a model... so +1 for the --rate-limit feature.

Two workarounds that helped me limp along for now:

  1. As soon as I started the fetch, I used iotop to change the ionice priority (using i) to idle. That made the issue completely go away: although the downloads were still fast, the Linux system was quite usable. However, this was still frustrating, since one had to type in the PIDs when setting ionice for them (and there were a few)!

  2. Since Ollama spun up multiple downloads, and ionice needs to be run for each process, it ended up being far simpler to just get the parent PID and then set ionice for each of the child threads, each time I was downloading a model:

# grab the ollama PID (the [o] trick excludes the grep process itself),
# then set idle I/O priority on every thread of that process
pid=$(ps -ef | grep "[o]llama run" | awk '{print $2}')
sudo ionice -c3 -p $(ps -T -p "$pid" | awk 'NR>1 {print $2}' | tr '\n' ' ')

@treibholz

I use this horrible "workaround" to avoid consuming the whole internet bandwidth, so I can still work on my other machine while pulling a model:

$ sudo ethtool -s eth0  autoneg on speed 10 duplex full

This renegotiates the link speed of my network interface down to 10Mbit.

Yes, the pulling machine itself is not usable either, and sometimes the download is interrupted because (and I'm not kidding you!) there is not enough bandwidth left for DNS. But at least others on my local network are not angry anymore.

@supercurio

I'm preparing a patch and will submit a PR to address this soon.

@Netzvamp

My solution for now; it works fine. This docker-tc can also simulate packet loss 😂

version: '3'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - 11434:11434
    restart: unless-stopped
    labels:
      - "com.docker-tc.enabled=1"
      - "com.docker-tc.limit=30mbit"

  docker-tc:
    image: lukaszlach/docker-tc
    cap_add:
      - NET_ADMIN
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/docker-tc:/var/docker-tc

supercurio added a commit to supercurio/ollama that referenced this issue Jul 13, 2024
…connections

The Ollama server now downloads models using a single connection. This change
addresses the root cause of issue ollama#2006 by following best practices instead of
relying on workarounds. Users have been reporting problems associated with
model downloads since January 2024, describing issues such as "hogging the
entire device", "reliably and repeatedly kills my connection", "freezes
completely leaving no choice but to hard reset", "when I download models,
everyone in the office gets a really slow internet", and "when downloading
large models, it feels like my home network is being DDoSed."

The environment variable `OLLAMA_DOWNLOAD_CONN` can be set to control the
number of concurrent connections with a maximum value of 64 (the previous
default, an aggressive value - unsafe in some conditions). The new default
value is 1, ensuring each Ollama download is given the same priority as other
network activities.

An entry in the FAQ describes how to use `OLLAMA_DOWNLOAD_CONN` for different
use cases. This patch comes with a safe and unproblematic default value.

Changes include updates to the `envconfig/config.go`, `cmd/cmd.go`,
`server/download.go`, and `docs/faq.md` files.
supercurio added a commit to supercurio/ollama that referenced this issue Jul 13, 2024
…ONN setting

(commit message identical to the previous commit)
@igorschlum

@joelanman I agree with you. From the same location, I was able to download a 3K model but not the 405b model. For the 3K model, the 3GB downloaded without any connection issues, whereas with the 405b model the download kept stopping after about every 200MB.

@ShayBox

ShayBox commented Aug 17, 2024

This reliably crashes my router and causes it to restart; it's too fast.

@numbermaniac

I'm having to ctrl+c just to post this comment.

Same here. I literally can't even search Google while it's downloading something. I can download files that are multiple gigabytes in my web browser or in macOS Homebrew and still use the internet just fine, but when Ollama is downloading a model, my entire internet becomes unusable.

It doesn't even work well for Ollama itself, because the download speed starts at around 6MB/s and then keeps gradually dropping: down to 1MB/s, then 500KB/s, then 350KB/s, slower and slower. Ollama is practically sabotaging itself with the way it downloads files.

@supercurio

I wonder how to get some progress on this issue: the descriptions of how badly it affects the user experience should bump its priority to critical, in my opinion, and I already submitted a PR with a fix a month ago.

After testing Ollama on a high-bandwidth VPS, I appreciated that its aggressive strategy led to 300MB/s (megabytes) downloads, so I get the point of keeping the many-concurrent-connections capability available, which my patch does.

How do we move forward from here?
Routers crashing, people unable to use their computers: that's not the expected result of using any kind of software.

@igorschlum

Hi @supercurio (bonjour François), your patch is here:
#5683
I know there were more important issues during those weeks, like CUDA fixes, memory, and function calling.
Ollama can easily run without this patch, but if it's just a matter of approving your patch, @jmorganca could decide to do it.

@supercurio

supercurio commented Aug 19, 2024

Salut @igorschlum 😌
All of Ollama's core functionality is important, that's for sure.
Downloading model(s) is still the first action every Ollama user will take.

I've solved it for my individual use case already, but I'm hoping to use Ollama as the LLM runtime for the app I'm developing at the moment. Sadly, that's a non-starter until this issue is solved.
Fortunately, llamafile provides a good enough alternative in that case.

@joelanman

Is it fixed by this?

@Fluffkin

Fluffkin commented Aug 20, 2024

It'll help in many cases, but it's probably still too high for some domestic broadband users with low bandwidth. ¯\_(ツ)_/¯

@igorschlum

@Fluffkin I think it would work for any type of connection, because currently Ollama downloads 64 files simultaneously, leaving very little bandwidth for other computers. With this modification it will use only one download at a time.

@joelanman

@igorschlum no, it's changed from 64 to 16, so still a lot of connections

@robins

robins commented Aug 20, 2024

My 2c: unless there's a way for users to request more speed, the default shouldn't hurt low-end users.

I'd take Git's example here. Granted, its downloads are not split into chunks like this tool's, but when churning through a repo, even on a 20-core machine, it doesn't spin up an obscene number of processes: only 4.

So no, we shouldn't cut down from 64 to 1 just to accommodate everyone, but settling on a saner default of 4 threads sounds like a good middle ground here... If/when there's a feature to request more threads, users with more resources can always go back to 64 concurrent connections!

@mrtysn

mrtysn commented Sep 2, 2024

Reporting from Türkiye, I am unable to run ollama pull during the day due to it causing nearly all other connections on my shared Wi-Fi network to almost come to a halt. A download speed rate limit would be greatly appreciated.

@igorschlum

@mrtysn what version of Ollama are you using, and on which OS? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads.

@mrtysn

mrtysn commented Sep 2, 2024

@mrtysn what version of Ollama are you using, and on which OS? I agree with @supercurio that a parameter could be added to set the number of concurrent downloads.

@igorschlum Apologies for the lack of details.

  • I am on a MacBook Pro M2 Max with Sonoma 14.6.1.
  • My Ollama is installed from Homebrew and is currently on version 0.3.9.

However, I believe I did ~90% of my model pulls while on version 0.3.8; the Homebrew formula was updated to 0.3.9 ~2 days ago.

[screenshot attachment]

@igorschlum

@mrtysn I installed Go on my Mac and was able to build Ollama from source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then you could change the 4 simultaneous downloads to 1 to see if it fixes the issue you're facing.

@joelanman

@igorschlum it's 16, which is still very high

@igorschlum

igorschlum commented Sep 2, 2024

@joelanman Sorry, I assumed it might be 4 :-)

@mrtysn

mrtysn commented Sep 3, 2024

@mrtysn I installed Go on my Mac and was able to build Ollama from the source. If you'd like, I can create a tutorial on how to do this from scratch on a Mac. Then, you could change the 4 simultaneous downloads to 1 to see if this fixes the issue you're facing.

I've installed the prerequisites. Unfortunately, however, I won't be able to give this issue more attention with my current workload. Scheduling model downloads for the night has been an okay workaround so far, and I currently have most of the models I'd like to use.

Nevertheless, I still think a CLI flag with reasonable defaults to control the number of concurrent download threads would be helpful for all Ollama users, especially new users downloading models for the first time or users in regions with slower internet speeds.

@donuts-are-good
Author

Hello all,

I appreciate everyone's input in this thread, and I do hope this eventually gets solved. When I initially created this issue as a place to discuss and outline a solution to the network saturation problem, I assumed it would be welcomed with open arms. In the time since, the software has grown bigger and more complex; it's no longer a simple fix, and it doesn't appear to be a priority.

With that in mind, life must go on. I'm withdrawing my offer to implement this, as I no longer have the resources to write and test a feature that isn't a concern of the developers. I don't think anybody's going to worry about it, but I wanted to be clear about my intentions. I look forward to seeing this issue fixed some day.

@ShayBox

ShayBox commented Sep 5, 2024

The issue was fixed 2 weeks ago in 0.3.7...

@mdlmarkham

I'm still having the issue.

@devrandom

It would be best if the number of threads were configurable. Most of these issues can be mitigated by setting the number of concurrent downloads to 1.

@augusto-rehfeldt

augusto-rehfeldt commented Dec 18, 2024

I'm still having this issue. I'm in Argentina, and 16 concurrent connections kill my internet connection, hang my machine, and downloads over 1GB never finish.

Any idea how to rate-limit this on Windows?

@jimbothegrey

I used wondershare to slow down the connection. Looks like it is working....

@mrtysn

mrtysn commented Jan 10, 2025

used wondershare to slow down the connection. looks like it is working....

@jimbothegrey what might wondershare be? all I'm finding is a file converter

@TiddlyWiddly

Still experiencing this on a pretty quick connection; it knocks all my devices off.
