kubectl apply should retry stale resource version. #15493
Comments
/cc @jayunit100 fyi.
@jackgr What is the mismatch here? If it is just resourceVersion, kubectl should retry. It should only bail on a meaningful conflict. We're working on eliminating flaky tests (and flaky parts of the system). Also, that log message isn't very useful.
@bgrant0607 per the discussion in #16543, we can't determine the correct behavior when the resource version changes without more information, so it's best to catch the version mismatch at the client and bail, and let the user replace the object with --force if they want. We can also make the error message less verbose.
@jackgr I don't see any mention of resourceVersion in #16543. It shouldn't be serialized in the annotation, because the user shouldn't specify it in their config. The resourceVersion-based precondition is intended to guard against concurrent read-modify-write operations. kubectl apply should be able to retry the entire diff&patch in the case of this type of conflict. That said, it's not obvious to me from the log message what the actual problem was in this case.
I'm sorry, my mistake. You're right that resourceVersion is not in the annotation, and that we should retry on resourceVersion failure. I was confusing this with the API version change issue discussed in #16543. This is actually a much simpler issue, in the sense that it's an optimistic lock conflict. If we encounter a lock conflict, then we should fetch a fresh copy of the object and try again to apply the new configuration supplied by the user.
Does this fit your expectation of how it should be handled?
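To make that flow concrete, here is a minimal Go sketch of the retry loop being described, assuming hypothetical get/diff/patch callbacks rather than kubectl's real internals: on a conflict, re-fetch the object, recompute the patch against the fresh copy, and try again up to a bounded number of attempts.

```go
// Minimal sketch of the retry flow described above; the getLive, makePatch,
// and sendPatch callbacks are hypothetical stand-ins, not kubectl internals.
package example

import (
	"errors"
	"fmt"
)

// ErrConflict stands in for an optimistic-lock (stale resourceVersion) error.
var ErrConflict = errors.New("conflict: stale resourceVersion")

const maxApplyRetries = 5

func applyWithRetry(
	getLive func() ([]byte, error), // GET the live object (fresh resourceVersion)
	makePatch func(live []byte) ([]byte, error), // diff the desired config against the live object
	sendPatch func(patch []byte) error, // PATCH the object on the server
) error {
	for i := 0; i < maxApplyRetries; i++ {
		live, err := getLive()
		if err != nil {
			return err
		}
		patch, err := makePatch(live)
		if err != nil {
			return err
		}
		err = sendPatch(patch)
		if err == nil {
			return nil // applied cleanly
		}
		if !errors.Is(err, ErrConflict) {
			return err // a real failure, not an optimistic-lock conflict
		}
		// Someone wrote the object between our GET and PATCH; start over
		// with a fresh copy instead of failing the apply.
	}
	return fmt.Errorf("apply still conflicting after %d attempts", maxApplyRetries)
}
```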
This is causing test flake:
@jackgr yes, your understanding is correct. Do you have time to fix this? We're trying to get all our tests fixed. Additionally, this sounds like a legit system bug, not just a flaky test problem.
@lavalamp Maybe. Swamped right now, but can fit it in given enough notice. When is the fix needed?
@jackgr asap. We'll find someone else.
This should just be a generic GET/Update retry loop, e.g.: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replication/replication_controller_utils.go#L53
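For reference, the same get/modify/update pattern expressed with today's client-go would look roughly like the sketch below; the RC-scaling body is only an illustrative example, not the helper linked above.

```go
// Sketch of a generic get/modify/update retry loop using client-go's
// retry.RetryOnConflict; the RC-scaling body is only an illustrative example.
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func setRCReplicas(ctx context.Context, c kubernetes.Interface, ns, name string, replicas int32) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-GET on every attempt so the update is built on the latest resourceVersion.
		rc, err := c.CoreV1().ReplicationControllers(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		rc.Spec.Replicas = &replicas
		_, err = c.CoreV1().ReplicationControllers(ns).Update(ctx, rc, metav1.UpdateOptions{})
		return err // RetryOnConflict retries only when this is a Conflict error
	})
}
```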
Go for it. Jack Greenfield | Google
Chao, do you have cycles to take a look?
I thought Patch is retried in the server. I'll investigate.
Patch is retried at the server side five times before an error is returned: https://github.com/kubernetes/kubernetes/blob/master/pkg/apiserver/resthandler.go#L92, so I expect this issue to occur very rarely. In fact, it has passed the last ~400 kubernetes-gce-e2e tests. If we want to make this test less flaky, I'd recommend increasing the number of retries at the server side. What do you guys think?
It'll fail if an actual conflicting change was made, though. Do we know …
This happened last week: #15493 (comment)
@lavalamp you are right. I thought an actual conflicting change would result in a different error message, but it doesn't. I'll add the logs.
I only see 3 reports of failures since October, so I'm going to say no, it doesn't need to be fixed in 1.2.
@bgrant0607 any progress on this now? I've returned to the kubectl apply issues, and hope I can fix all apply-related issues before the 1.3.0 release.
I'll try to check the flake rate again. IMO if the flake rate is non-negligible, we should increase the retry limits of Patch or, more aggressively, just remove the limit. https://github.com/kubernetes/kubernetes/blob/master/pkg/apiserver/resthandler.go#L92
Infinite retry still scares me, mostly because we may have failure modes …
@adohe I don't think anyone has worked on this. Though the original failing test/logs weren't recorded, it appears to be a failure of "should apply a new configuration to an existing RC".

@caesarxuchao There is a retry loop in the Patch handler in the apiserver, but if resourceVersion is specified, the retry loop will always fail in the case of a conflict. (We should detect that and bail out, but I don't think we do.) In the case of such a conflict, apply should diff again, create another patch, and try again.

I agree with @smarterclayton that it shouldn't retry forever. We should use the same retry limits we use for other commands.
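On the "detect that and bail out" point, a hypothetical sketch of such a check (not the actual resthandler code): if the submitted patch pins metadata.resourceVersion, no amount of server-side retrying can resolve the conflict, so it should be returned to the client immediately.

```go
// Hypothetical sketch of the bail-out check mentioned above, not the real
// apiserver resthandler code: a patch that pins metadata.resourceVersion can
// never win a server-side retry, so the conflict should surface to the client.
package example

import "encoding/json"

func conflictIsRetriable(patchJSON []byte) (bool, error) {
	var p struct {
		Metadata struct {
			ResourceVersion string `json:"resourceVersion"`
		} `json:"metadata"`
	}
	if err := json.Unmarshal(patchJSON, &p); err != nil {
		return false, err
	}
	// Retrying only helps when the patch does not pin a specific version;
	// otherwise every retry re-applies the same stale precondition and fails.
	return p.Metadata.ResourceVersion == "", nil
}
```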
@smarterclayton we have an infinite loop in GuaranteedUpdate (https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/etcd/etcd_helper.go#L610), maybe we should apply a retry limit there as well?
Agree. Just note that the test flake doesn't specify resourceVersion.
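As a rough sketch of what a retry cap on a GuaranteedUpdate-style compare-and-swap loop could look like (KVStore and Object below are hypothetical stand-ins, not the real etcd helper types):

```go
// Sketch of a compare-and-swap update loop with a retry cap instead of an
// infinite loop; KVStore and Object are hypothetical stand-ins for the
// storage layer, not the real etcd helper types.
package example

import "fmt"

type Object map[string]interface{}

type KVStore interface {
	// Get returns the stored object and the version it was read at.
	Get(key string) (obj Object, version uint64, err error)
	// CompareAndSwap writes obj only if the stored version still equals expected.
	CompareAndSwap(key string, expected uint64, obj Object) (swapped bool, err error)
}

const maxUpdateRetries = 5

func boundedUpdate(store KVStore, key string, tryUpdate func(Object) (Object, error)) error {
	for i := 0; i < maxUpdateRetries; i++ {
		old, version, err := store.Get(key)
		if err != nil {
			return err
		}
		updated, err := tryUpdate(old)
		if err != nil {
			return err
		}
		swapped, err := store.CompareAndSwap(key, version, updated)
		if err != nil {
			return err
		}
		if swapped {
			return nil
		}
		// Lost the race: someone else wrote a newer version; re-read and retry.
	}
	return fmt.Errorf("gave up updating %q after %d conflicting writes", key, maxUpdateRetries)
}
```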
@adohe Since this is causing flaky test failures, it is something we should focus on. Is this something I should pick up?
FYI: @adohe has a PR out for this under review.
Making this a P3 (no release block) since it looks like we haven't seen many instances since Feb 2016.
Automatic merge from submit-queue

kubectl apply retry stale resource version

```release-note
kubectl apply: retry applying a patch if a version conflict error is encountered
```

fixes #15493

@pwittrock I just got my original implementation back, ptal.
From an e2e test failure:
The command should retry resource version mismatches like scale and rollingupdate.
cc @kubernetes/kubectl