fix: solve network disruption during downloads, add OLLAMA_DOWNLOAD_CONN setting #5683
Conversation
fix: solve network disruption during downloads, add OLLAMA_DOWNLOAD_CONN setting

The Ollama server now downloads models using a single connection. This change addresses the root cause of issue ollama#2006 by following best practices instead of relying on workarounds. Users have been reporting problems associated with model downloads since January 2024, describing issues such as "hogging the entire device", "reliably and repeatedly kills my connection", "freezes completely leaving no choice but to hard reset", "when I download models, everyone in the office gets a really slow internet", and "when downloading large models, it feels like my home network is being DDoSed."

The environment variable `OLLAMA_DOWNLOAD_CONN` can be set to control the number of concurrent connections, with a maximum value of 64 (the previous default, an aggressive value that is unsafe in some conditions). The new default value is 1, ensuring each Ollama download is given the same priority as other network activities. An entry in the FAQ describes how to use `OLLAMA_DOWNLOAD_CONN` for different use cases. This patch comes with a safe and unproblematic default value.

Changes include updates to the `envconfig/config.go`, `cmd/cmd.go`, `server/download.go`, and `docs/faq.md` files.
`ollama serve` instead of `ollama server`

Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com>
lgtm

doesn't implement the whole of #2006, though
Correct, and that's intentional. The root cause of the problems reported in #2006 is a wildly excessive default of 64 simultaneous connections. This fix solves the root cause of the problem while still offering configurability via the `OLLAMA_DOWNLOAD_CONN` variable. It would be useful to be able to change runtime parameters like this one, model parallelism, and debug status via command-line parameters and API calls (like the one for pulling models); however, that is out of scope for this fix.
```diff
@@ -215,6 +219,23 @@ func LoadConfig() {
 		}
 	}
 
+	if dlp := clean("OLLAMA_DOWNLOAD_CONN"); dlp != "" {
+		const minDownloadConnections = 1
+		const maxDownloadConnections = 64
```
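The hunk is cut off at this point in the page. As an illustration only, here is a minimal sketch of the parse-and-clamp step the rest of the hunk presumably performs; the `DownloadConnections` variable, the function name, and the logging are assumptions, not the verbatim patch:

```go
// Sketch of the validation OLLAMA_DOWNLOAD_CONN likely receives.
// DownloadConnections is a hypothetical package-level setting; the
// actual patch may store and report the value differently.
package envconfig

import (
	"log/slog"
	"strconv"
)

// DownloadConnections holds the number of connections used per model
// download; 1 is the safe default this PR proposes.
var DownloadConnections = 1

func loadDownloadConnections(dlp string) {
	const minDownloadConnections = 1
	const maxDownloadConnections = 64

	n, err := strconv.Atoi(dlp)
	if err != nil {
		slog.Error("invalid OLLAMA_DOWNLOAD_CONN, keeping default", "value", dlp)
		return
	}
	// Clamp out-of-range values into [1, 64] rather than rejecting them.
	if n < minDownloadConnections {
		n = minDownloadConnections
	}
	if n > maxDownloadConnections {
		n = maxDownloadConnections
	}
	DownloadConnections = n
}
```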
```diff
-			const maxDownloadConnections = 64
+			const maxDownloadConnections = 1000
```
Some ollama users have really, really fast (multi-gigabit) networks; let them download many parts at once.
Thank you so much for contributing this. We have made some improvements to how network connections are handled when downloading models, including lowering the number of connections. Anyone still seeing consistent problems? Closing this for now, but please feel free to reopen.
I still experience 10-15% packet loss on a gigabit fiber connection during model pulls on v0.4.5. IMO, one stream is almost always the appropriate number of streams to default to, but I do appreciate the config option allowing more. This commit seems like the right solution to me, over any hard-coded value >1.
@mchiang0610 - Can we re-open this? I'm still seeing that fairly heavy packet loss in 0.4.7 as well.
Still a problem in 0.5.1 - I'd really like to see this one merged.
Managing bandwidth for model downloads has been an ongoing journey.

The situation left the Ollama server with unsafe network concurrency defaults, causing problems for many users and for people sharing the same network, whether they realize Ollama is the origin of their troubles or not.

In the associated issue, users describe at length the problems caused and their creative mitigations.

Fortunately, the root cause is simple: 64 concurrent connections per download, an extremely aggressive value guaranteed to challenge any network congestion algorithm. The fix is equally straightforward: default to one concurrent connection per model download.

This PR addresses the root cause while adding the ability to configure download concurrency when required, via the `OLLAMA_DOWNLOAD_CONN` setting. It deliberately avoids complex, ineffective, or hard-to-configure workarounds such as dynamic concurrency adjustments or manual bandwidth limiting.
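To make the mechanism concrete, here is a hedged sketch of what "N concurrent connections per download" means. This is not Ollama's actual downloader: the URL, sizes, and `fetchRange` helper are illustrative assumptions, and only the `SetLimit` call corresponds conceptually to what `OLLAMA_DOWNLOAD_CONN` controls.

```go
// Sketch: a blob is split into byte ranges, each fetched over its own
// HTTP connection; the errgroup limit caps how many run at once.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"

	"golang.org/x/sync/errgroup"
)

// fetchRange downloads bytes [start, end] of url over one HTTP connection.
func fetchRange(ctx context.Context, url string, start, end int64) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// A real downloader writes each range to its part of the model file.
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	const url = "https://example.com/model.bin" // placeholder blob URL
	const size = int64(1 << 30)                 // pretend 1 GiB model
	const partSize = int64(64 << 20)            // 64 MiB parts

	connections := 1 // the safe default this PR proposes; 64 was the old default
	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(connections) // cap in-flight connections, like OLLAMA_DOWNLOAD_CONN

	for off := int64(0); off < size; off += partSize {
		start, end := off, off+partSize-1
		if end > size-1 {
			end = size - 1
		}
		g.Go(func() error { return fetchRange(ctx, url, start, end) })
	}
	if err := g.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "download failed:", err)
	}
}
```

With a limit of 1, the download behaves like any single TCP stream and shares bandwidth fairly with other traffic; raising the limit trades that fairness for throughput on very fast links.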
From the associated commit:
The Ollama server now downloads models using a single connection. This change addresses the root cause of issue #2006 by following best practices instead of relying on workarounds. Users have been reporting problems associated with model downloads since January 2024, describing issues such as "hogging the entire device", "reliably and repeatedly kills my connection", "freezes completely leaving no choice but to hard reset", "when I download models, everyone in the office gets a really slow internet", and "when downloading large models, it feels like my home network is being DDoSed."
The environment variable `OLLAMA_DOWNLOAD_CONN` can be set to control the number of concurrent connections, with a maximum value of 64 (the previous default, an aggressive value that is unsafe in some conditions). The new default value is 1, ensuring each Ollama download is given the same priority as other network activities. An entry in the FAQ describes how to use `OLLAMA_DOWNLOAD_CONN` for different use cases. This patch comes with a safe and unproblematic default value.

Changes include updates to the `envconfig/config.go`, `cmd/cmd.go`, `server/download.go`, and `docs/faq.md` files.