Improved push and pull with upload manager and download manager #18353
Conversation
It doesn't change the design, really. I think we can move to code review.
Agreed.
@@ -731,7 +735,8 @@ func NewDaemon(config *Config, registryService *registry.Service) (daemon *Daemo
	return nil, err
}

distributionPool := distribution.NewPool()
d.downloadManager = xfer.NewLayerDownloadManager(d.layerStore, 5)
Maybe 5 should be a const. Also why 5?
Yes, this should probably be defined in a constant.
The number 5 is arbitrary and just seemed like a sane number of simultaneous transfers. It could be made tunable if people think it's worth the effort.
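For illustration, pulling the literal out into a named constant could look like the snippet below. maxDownloadConcurrency is a hypothetical name, not an identifier from this PR; the call simply mirrors the xfer.NewLayerDownloadManager line in the diff above.

```go
// Hypothetical sketch only: give the magic number a name so the intent
// (maximum simultaneous layer downloads) is visible at the call site.
const maxDownloadConcurrency = 5

d.downloadManager = xfer.NewLayerDownloadManager(d.layerStore, maxDownloadConcurrency)
```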
Force-pushed from 08d2bb8 to 0d0c39e.
I did some benchmarks comparing this PR with the existing code on master. The numbers were almost exactly the same. I pulled golang:latest repeatedly from the Hub, and from a local registry. Using the Hub, the average pull time with this patch was 27.55 seconds, versus 27.74 with master. Using a local registry, the average pull time with this patch was 20.21 seconds, versus 20.01 seconds with master. In both cases, the runtime is dominated by registering the layers, which can't be parallelized.

The patched version could be somewhat faster in some cases, because it prioritizes downloading the bottom layers so registration can start earlier. My internet connection is fast enough that this doesn't show up much in my benchmark. With a slower connection, I think it could make a big difference.

This patch wraps downloads and registrations with an
Force-pushed from 0d0c39e to f649b31.
After some experimentation, I've found that changing the download concurrency limit from 5 to 3 helps with slower links, so I've made this change. I used Network Link Conditioner to simulate a 40 Mbps network connection. Pulling wordpress:latest with the latest version of the PR takes 47.06s, vs 52.16s on master. Pulling golang:latest takes 62.42s, vs 66.19s on master. The differences here come from fewer simultaneous downloads allowing the base layers to complete first, so they can be unpacked and registered while other layers are still downloading.
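For illustration only, the effect of a lower limit can be shown with a generic bounded-concurrency pattern. This is a standalone sketch, not the xfer package's actual scheduler, and the layer names are made up.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// downloadLayer stands in for a real layer download; purely illustrative.
func downloadLayer(id string) {
	time.Sleep(100 * time.Millisecond)
	fmt.Println("downloaded", id)
}

func main() {
	// Layers queued bottom-to-top; the base layer is submitted first.
	layers := []string{"base", "mid1", "mid2", "top1", "top2", "top3"}

	// A buffered channel used as a semaphore: at most maxConcurrent downloads
	// run at once, so earlier (base) layers tend to finish sooner and can be
	// unpacked and registered while later layers are still downloading.
	const maxConcurrent = 3
	sem := make(chan struct{}, maxConcurrent)

	var wg sync.WaitGroup
	for _, id := range layers {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot (blocks when maxConcurrent are in flight)
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			downloadLayer(id)
		}(id)
	}
	wg.Wait()
}
```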
logrus.Errorf("Failed to remove temp file: %s", tmpFile.Name())
}

return nil, 0, err
Will this cause a retry?
Yes. The thinking is that if we receive something that doesn't match the digest, it could have been corrupted in transit, for example by a misbehaving proxy, or a broken server in a load balancing rotation. Retrying has a chance of getting the expected content.
There is also an argument to be made that if we don't get the right data the first time, we aren't likely to get it after a retry either.
I'm not sure what makes most sense in this case.
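To make the trade-off concrete, a rough sketch of verify-then-retry might look like the following. The function names, retry count, and backoff here are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

// fetch stands in for downloading a layer; here it just returns fixed bytes.
func fetch() ([]byte, error) {
	return []byte("layer-data"), nil
}

// fetchVerified retries a few times if the content does not match the
// expected digest, on the theory that the corruption may be transient
// (e.g. a misbehaving proxy or one bad server behind a load balancer).
func fetchVerified(expectedHex string, attempts int) ([]byte, error) {
	for i := 0; i < attempts; i++ {
		data, err := fetch()
		if err == nil {
			sum := sha256.Sum256(data)
			if hex.EncodeToString(sum[:]) == expectedHex {
				return data, nil
			}
		}
		time.Sleep(time.Duration(i+1) * 100 * time.Millisecond) // simple backoff
	}
	return nil, errors.New("giving up after repeated digest mismatches")
}

func main() {
	want := sha256.Sum256([]byte("layer-data"))
	data, err := fetchVerified(hex.EncodeToString(want[:]), 3)
	fmt.Println(len(data), err)
}
```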
Force-pushed from 4d4d073 to 3d4f50e.
I've added another commit that cleans up the progress monitoring. It unifies the channel-based ProgressReader. If people feel this is the right direction, I'll squash the commits before merge.
Force-pushed from 3d4f50e to 8059796.
Updated the new commit after some discussion with @tonistiigi. Now the common
Force-pushed from 8059796 to 317e99c.
I think the previous implementation of progress was cleaner. It was so nice to have progress without the crappy streamformatter.
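For context, a channel-based progress reader in the spirit of what's being discussed might look roughly like this. It is a self-contained sketch; the Progress type and field names are assumptions rather than the real progress package's API.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Progress is a simplified update message for this sketch.
type Progress struct {
	ID      string
	Current int64
	Total   int64
}

// progressReader wraps an io.Reader and sends an update on a channel for
// every chunk read, decoupling progress reporting from output formatting.
type progressReader struct {
	in      io.Reader
	updates chan<- Progress
	id      string
	total   int64
	read    int64
}

func (r *progressReader) Read(p []byte) (int, error) {
	n, err := r.in.Read(p)
	r.read += int64(n)
	r.updates <- Progress{ID: r.id, Current: r.read, Total: r.total}
	return n, err
}

func main() {
	updates := make(chan Progress, 16)
	src := strings.NewReader("some layer data")
	pr := &progressReader{in: src, updates: updates, id: "abc123", total: int64(src.Len())}

	go func() {
		io.Copy(io.Discard, pr) // consume the stream, emitting progress updates
		close(updates)
	}()

	for u := range updates {
		fmt.Printf("%s: %d/%d bytes\n", u.ID, u.Current, u.Total)
	}
}
```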
Force-pushed from 77e3f2a to 7a37cee.
Force-pushed from 02567ea to d92e7ad.
Thanks! Docs LGTM. Ping @moxiegirl @SvenDowideit for review.
I hear this is expedited work. I'll LGTM this and create a carry PR.
@moxiegirl thanks
Improved push and pull with upload manager and download manager
Concurrent uploads which share layers worked correctly as of moby#18353, but unfortunately moby#18785 caused a regression. That PR removed the logic that shares digests between different push sessions, overlooking the case where one session was waiting for another session to upload a layer.

This commit adds back the ability to propagate this digest information, using the distribution.Descriptor type, because this is what is received from stats and uploads, and also what is ultimately needed for building the manifest.

Surprisingly, there was no test covering this case. This commit adds one; it fails without the fix. See recent comments on moby#9132.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
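As a toy illustration of the idea, one session could publish the finished layer's descriptor for a waiting session to reuse. Everything below is hypothetical scaffolding (including a local Descriptor stand-in rather than the real distribution.Descriptor); only the general notion of propagating the descriptor comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Descriptor loosely mirrors the shape of distribution.Descriptor; it is a
// local stand-in so this sketch stays self-contained.
type Descriptor struct {
	MediaType string
	Size      int64
	Digest    string
}

func main() {
	var wg sync.WaitGroup

	// One push session uploads the layer and publishes its descriptor;
	// another session waiting on the same layer reuses it for its manifest.
	descCh := make(chan Descriptor, 1)

	wg.Add(2)
	go func() { // session A: performs the actual upload
		defer wg.Done()
		descCh <- Descriptor{MediaType: "application/octet-stream", Size: 1234, Digest: "sha256:..."}
	}()
	go func() { // session B: waits for A, then builds its manifest entry
		defer wg.Done()
		desc := <-descCh
		fmt.Println("manifest layer entry:", desc.Digest, desc.Size)
	}()
	wg.Wait()
}
```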
This feature has completely broken docker for my use case - I can no longer upload from my home computer, as large layers compete for limited bandwidth and time each other out. Is there a way to disable this behavior at the command line?
@dts Without more information and a proper analysis, it's hard to say whether this PR is the root cause of your particular issue. Reporting issues on merged PRs is typically an ineffective venue for solving your problem, so I'd recommend submitting a new issue or reaching out to the mailing list for help. Submit the issue and reference this PR as a possible cause, but please keep the description of the problem separate from the perceived cause, as the cause may turn out to be something else.
@dts No problem. Make sure to fill in that issue with as many details as possible. At a minimum, it should include the docker version and the exact output when the error is encountered. From there, I'd recommend including logs from the daemon in debug mode, as they will provide clues about how many simultaneous streams are starting and their actual scope.
@stevvooe It's more of a feature request for me: I'd like to reduce the concurrency of sending images. Does having my docker version and output text help with that?
@dts It's preferable to understand the root cause of the issue rather than adding another feature that may or may not address your problem. This PR is not really a feature, and a change to its behavior would be a new feature rather than a fix.
@stevvooe Makes sense. Context has been added to the new issue; thanks again for your help!
Same commit message as the fix above. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com> (cherry picked from commit 5c99eeb)
This commit adds a transfer manager which deduplicates and schedules
transfers, and also an upload manager and download manager that build on
top of the transfer manager to provide high-level interfaces for uploads
and downloads. The push and pull code is modified to use these building
blocks.
Some benefits of the changes:

- Shared layer transfers are only cancelled if all pushes or pulls using them are cancelled.
- The progress reporting code no longer depends on Docker-specific output conventions (i.e. streamformatter), which will make it easier to split out.
This commit also includes unit tests for the new distribution/xfer
package. The tests cover 87.8% of the statements in the package.
fixes #18498
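To make the deduplication idea concrete, here is a self-contained sketch of joining concurrent requests for the same transfer key. It illustrates the general technique only, not the distribution/xfer implementation, which also handles progress multiplexing, cancellation counting, and cleanup; the type and function names below are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// transfer holds one in-flight download shared by every caller that asks
// for the same key.
type transfer struct {
	done   chan struct{}
	result string
}

type transferManager struct {
	mu       sync.Mutex
	inflight map[string]*transfer
}

func newTransferManager() *transferManager {
	return &transferManager{inflight: make(map[string]*transfer)}
}

// get returns the existing transfer for key, or starts a new one.
func (tm *transferManager) get(key string, run func() string) *transfer {
	tm.mu.Lock()
	defer tm.mu.Unlock()
	if t, ok := tm.inflight[key]; ok {
		return t // join the transfer already in progress
	}
	t := &transfer{done: make(chan struct{})}
	tm.inflight[key] = t
	go func() {
		t.result = run()
		close(t.done) // signal all waiters that the transfer finished
	}()
	return t
}

func main() {
	tm := newTransferManager()
	var wg sync.WaitGroup
	// Three concurrent "pulls" of the same layer share a single transfer.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			t := tm.get("sha256:abc", func() string { return "layer bytes" })
			<-t.done
			fmt.Println("got", t.result)
		}()
	}
	wg.Wait()
}
```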