
server: reduce max connections used in download #6347

Merged 2 commits into main on Aug 13, 2024

Conversation

bmizerany
Contributor

The previous value of 64 was far too high and unnecessary: it blew well past the point of diminishing returns. This is a more reasonable number for most normal cases. For users on cloud servers with excellent network quality, downloads will still be very fast without hitting our CDN limits. For users with relatively poor network quality, this will keep them from saturating their network and causing other issues.

@bmizerany
Contributor Author

bmizerany commented Aug 13, 2024

Preemptively, regarding configuration via some new environment variable: it seems best to wait and see, to avoid adding support burden for another configuration parameter that few users, if any, will need.

@MaxJa4

MaxJa4 commented Aug 13, 2024

Happy to test this to gather some quantitative metrics if needed (different servers/locations and local machines). In that case, just ping me.
Thanks for taking a look at the issue :) Curious to see how far 4 connections can go regarding performance.

server/download.go (review comment, outdated, resolved)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
@bmizerany
Contributor Author

> Happy to test this to gather some quantitative metrics if needed (different servers/locations and local machines). In that case, just ping me. Thanks for taking a look at the issue :) Curious to see how far 4 connections can go regarding performance.

Please! We would love any feedback/reports here.

@bmizerany bmizerany merged commit 8e1050f into main Aug 13, 2024
12 checks passed
@bmizerany bmizerany deleted the bmizerany/lowerdownloadnumparts branch August 13, 2024 23:47
@MaxJa4

MaxJa4 commented Aug 15, 2024

Already merged, and sorry for the late reply, but since I'm curious about it myself, here's a quick, non-scientific benchmark on different systems regardless.
cc @jmorganca @bmizerany


"Before" = Ollama 0.3.6
"After" = Ollama Main including this PR (8200c37)
Using llama3.1:70b and mistral-large as test models.
Used docker images for testing.

Test results:

Domestic internet uplink (250 Mbit/s)

Goal: Testing for the underlying issue of remaining bandwidth for other applications.

Remaining bandwidth for other applications: ~3 Mbit/s before, ~13 Mbit/s after.
Full download speed was reached in both cases.

Server internet uplink (mixed speeds)

Goal: Testing for max. download speed. Some fluctuations expected between runs.
Note: The "speedtest" data is actual, real download speed to the fastest nearby speed test server (not Cloudflare).

Cloud provider 1 (8 vCPUs, identical machine hardware)

| Region | Speedtest | Max. before | Max. after |
|---|---|---|---|
| Germany | 1.2 GB/s | 0.67 GB/s | 0.50 GB/s |
| Finland | 1.6 GB/s | 0.68 GB/s | 0.70 GB/s |
| Finland\* | 1.3 GB/s | 0.55 GB/s | 0.58 GB/s |
| US West | 1.0 GB/s | 0.72 GB/s | 0.65 GB/s |
| US East | 1.1 GB/s | 0.70 GB/s | 0.69 GB/s |
| Singapore | 1.2 GB/s | 0.90 GB/s | 0.41 GB/s |

* larger machine with 16 vCPUs

Cloud provider 2 (9 vCPUs)

| Region | Speedtest | Max. before | Max. after |
|---|---|---|---|
| Sweden | 2.0 GB/s | 0.60 GB/s | 0.43 GB/s |
| Sweden\* | 1.9 GB/s | 1.1 GB/s | 0.46 GB/s |

* larger machine with 36 vCPUs


Overall, with few exceptions, only setups reaching speeds above roughly 600 MB/s see slower downloads with the reduced connection count of 16 compared to the previous 64; beyond that point, results likely depend heavily on the internet connection and hardware configuration.
Since the goal was to reduce the burden on slower (private/domestic) internet connections, it seems like a good compromise.
Thanks for the "fix" :)

@bmizerany
Contributor Author

@MaxJa4 This is very helpful, thank you!!

We'll continue to keep an eye on how it affects others and make changes as needed.

Again, this is super helpful. Thank you so much.

deep93333 pushed a commit to deep93333/ollama that referenced this pull request Sep 9, 2024
4 participants