
fix: solve network disruption during downloads, add OLLAMA_DOWNLOAD_CONN setting #5683

Closed
wants to merge 2 commits into from

Conversation

supercurio

The process of managing bandwidth for model downloads has been an ongoing journey.

The situation has left the Ollama server with unsafe network-concurrency defaults, causing problems for many users and for people sharing the same network, whether or not they realize Ollama is the origin of their troubles.
In the associated issue, users describe the problems caused and their creative mitigations at length.
Fortunately, the root cause is simple: 64 concurrent connections, an extremely aggressive value guaranteed to challenge any network congestion algorithm. The fix is equally straightforward: default to 1 concurrent connection per model download.
This PR addresses the root cause while adding the ability to configure download network concurrency, if required, via the `OLLAMA_DOWNLOAD_CONN` setting.
This PR deliberately avoids complex, ineffective, or hard-to-configure workarounds, like dynamic concurrency adjustments or manual bandwidth limiting.
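The "1 concurrent connection per model download" default boils down to bounding the number of in-flight range requests. A minimal sketch of that bounded-concurrency pattern (function and variable names are hypothetical, not the actual `server/download.go` code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchParts simulates downloading numParts chunks of a model blob while
// never running more than maxConn fetches at once. This only illustrates
// the counting-semaphore pattern; the real logic lives in server/download.go.
func fetchParts(numParts, maxConn int) (fetched, peak int) {
	sem := make(chan struct{}, maxConn) // counting semaphore: one slot per connection
	var wg sync.WaitGroup
	var inFlight, maxInFlight, done int64
	for part := 0; part < numParts; part++ {
		wg.Add(1)
		sem <- struct{}{} // block until a connection slot frees up
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this part finishes
			n := atomic.AddInt64(&inFlight, 1)
			// record the highest concurrency observed
			for {
				m := atomic.LoadInt64(&maxInFlight)
				if n <= m || atomic.CompareAndSwapInt64(&maxInFlight, m, n) {
					break
				}
			}
			atomic.AddInt64(&done, 1) // stand-in for the actual HTTP range request
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return int(done), int(atomic.LoadInt64(&maxInFlight))
}

func main() {
	n, peak := fetchParts(8, 1)
	fmt.Println(n, peak) // all 8 parts fetched, never more than 1 in flight
}
```

With `maxConn = 1` the loop cannot launch the next fetch until the previous one releases its slot, so downloads proceed sequentially; raising the limit widens the bound without changing the structure.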

From the associated commit:
The Ollama server now downloads models using a single connection. This change addresses the root cause of issue #2006 by following best practices instead of relying on workarounds. Users have been reporting problems associated with model downloads since January 2024, describing issues such as "hogging the entire device", "reliably and repeatedly kills my connection", "freezes completely leaving no choice but to hard reset", "when I download models, everyone in the office gets a really slow internet", and "when downloading large models, it feels like my home network is being DDoSed."

The environment variable `OLLAMA_DOWNLOAD_CONN` can be set to control the number of concurrent connections, with a maximum value of 64 (the previous default, an aggressive value that is unsafe in some conditions). The new default value is 1, ensuring each Ollama download is given the same priority as other network activities.

An entry in the FAQ describes how to use `OLLAMA_DOWNLOAD_CONN` for different use cases. This patch comes with a safe and unproblematic default value.

Changes include updates to the envconfig/config.go, cmd/cmd.go, server/download.go, and docs/faq.md files.

docs/faq.md
`ollama serve` instead of `ollama server`

Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com>
@donuts-are-good

lgtm

@erkinalp
Contributor

doesn't implement the whole of #2006, though

@supercurio
Author

supercurio commented Jul 24, 2024

doesn't implement the whole of #2006, though

Correct, and that's intentional.
#2006 describes a problem but suggests a solution that is far from optimal and only partial: a bandwidth cap will not work under variable bandwidth conditions such as mobile networks, unless it is set well below the expected minimum, making every download unnecessarily slow.
Since the root cause was not known initially, it was still a reasonable suggestion.

The root cause of the problems reported in #2006 is a wildly excessive default of 64 simultaneous connections.
To give context, the official build of the ubiquitous aria2 download utility, which supports HTTP and BitTorrent, hardcodes its maximum at only 16.

This fix solves the root cause of the problem while still offering configurability via the `OLLAMA_DOWNLOAD_CONN` variable.

It would be useful to be able to change runtime parameters like this one, model parallelism, or debug status via command-line parameters and API calls (like the one for pulling models); however, that is out of scope for this fix.

```diff
@@ -215,6 +219,23 @@ func LoadConfig() {
 	}
 }
 
+	if dlp := clean("OLLAMA_DOWNLOAD_CONN"); dlp != "" {
+		const minDownloadConnections = 1
+		const maxDownloadConnections = 64
```

Suggested change:

```diff
-const maxDownloadConnections = 64
+const maxDownloadConnections = 1000
```

Some ollama users have really really fast (multi-gigabit) networks, let them download many parts at once

@mchiang0610
Member

Thank you so much for contributing this. We have made some improvements to how network connections are handled when downloading models, lowering the number of connections.

Anyone still seeing consistent problems? Closing this for now, but please feel free to reopen.

@Maltz42

Maltz42 commented Nov 27, 2024

Thank you so much for contributing this. We have made some improvements to how network connections are handled when downloading models, lowering the number of connections.

Anyone still seeing consistent problems? Closing this for now, but please feel free to reopen.

I still experience 10-15% packet loss on a gigabit fiber connection during model pulls using v0.4.5.

IMO, one stream is almost always the appropriate number of streams to default to, but I do appreciate the config option allowing more. This commit seems like the right solution to me, over any hard-coded value >1.

@Maltz42

Maltz42 commented Dec 2, 2024

@mchiang0610 - Can we re-open this? I'm still seeing that fairly heavy packet loss in 0.4.7 as well.

@Maltz42

Maltz42 commented Dec 10, 2024

Still a problem in 0.5.1 - I'd really like to see this one merged.
