
kubectl apply should retry stale resource version. #15493

Closed
j3ffml opened this issue Oct 12, 2015 · 34 comments · Fixed by #26557
Labels
area/app-lifecycle area/kubectl kind/flake Categorizes issue or PR as related to a flaky test. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@j3ffml
Contributor

j3ffml commented Oct 12, 2015

From an e2e test failure:

Error from server: error when applying patch:
    {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"kind\":\"ReplicationController\",\"apiVersion\":\"v1\",\"metadata\":{\"name\":\"redis-master\",\"creationTimestamp\":null,\"labels\":{\"app\":\"redis\",\"kubectl.kubernetes.io/apply-test\":\"ADDED\",\"role\":\"master\"}},\"spec\":{\"replicas\":1,\"selector\":{\"app\":\"redis\",\"kubectl.kubernetes.io/apply-test\":\"ADDED\",\"role\":\"master\"},\"template\":{\"metadata\":{\"creationTimestamp\":null,\"labels\":{\"app\":\"redis\",\"kubectl.kubernetes.io/apply-test\":\"ADDED\",\"role\":\"master\"}},\"spec\":{\"containers\":[{\"name\":\"redis-master\",\"image\":\"redis\",\"ports\":[{\"name\":\"redis-server\",\"containerPort\":6379}],\"resources\":{}}]}}},\"status\":{\"replicas\":0}}"},"creationTimestamp":null,"labels":{"kubectl.kubernetes.io/apply-test":"ADDED"}},"spec":{"selector":{"kubectl.kubernetes.io/apply-test":"ADDED"},"template":{"metadata":{"labels":{"kubectl.kubernetes.io/apply-test":"ADDED"}}}}}
    to:
    &{0xc208226190 0xc2080cbea0 e2e-tests-kubectl-ztbsd redis-master STDIN 0xc208568ff0 0xc2083aa960 1387}
    for: "STDIN": replicationControllers "redis-master" cannot be updated: the object has been modified; please apply your changes to the latest version and try again

The command should retry on resource version mismatches, as scale and rolling-update do.

cc @kubernetes/kubectl

@timothysc
Member

/cc @jayunit100 fyi.

@jayunit100
Member

From the kubectl apply perspective, isn't the failure intentional (i.e., an optimistic mutation)? Or maybe I'm missing something. I would think it's the user's job to retry?

@jackgr
Contributor

jackgr commented Jan 27, 2016

The desired behavior that we settled on previously is to detect version mismatches at the client and report an error. The user can force replacement using --force. See #16543, #16569 and #17237.

@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. area/app-lifecycle labels Jan 28, 2016
@bgrant0607
Member

@jackgr What is the mismatch here? If it is just resourceVersion, kubectl should retry. It should only bail on a meaningful conflict.

We're working on eliminating flaky tests (and flaky parts of the system).

Also, that log message isn't very useful.

@bgrant0607 bgrant0607 added this to the next-candidate milestone Jan 28, 2016
@jackgr
Contributor

jackgr commented Jan 28, 2016

@bgrant0607 per discussion on #16543, we can't determine the correct behavior when the resource version changes without more information, so best to catch version mismatch at the client and bail, and let the user replace the object with --force if they want. We can also make the error message less verbose.

@bgrant0607
Member

@jackgr I don't see any mention of resourceVersion in #16543. It shouldn't be serialized in the annotation, because the user shouldn't specify it in their config. The resourceVersion-based precondition is intended to guard against concurrent read-modify-write operations. kubectl apply should be able to retry the entire diff&patch in the case of this type of conflict.

That said, it's not obvious to me from the log message what the actual problem was in this case.

@jackgr
Contributor

jackgr commented Jan 28, 2016

I'm sorry, my mistake. You're right that resourceVersion is not in the annotation, and that we should retry on resourceVersion failure.

I was confusing this with the api version change issue discussed in #16543. This is actually a much simpler issue, in the sense that it's an optimistic lock conflict.

If we encounter a lock conflict, then we should fetch a fresh copy of the object, and try again to apply the new configuration supplied by the user.

  • If we don't get a merge conflict when we resubmit, then all's well.
  • If we get a merge conflict because the object has changed out from under the user, then we report it in those terms, and the user can decide what to do (e.g., replace using --force).

Does this fit your expectation of how it should be handled?
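
For illustration, the flow described above might look roughly like the following (a minimal sketch only, with hypothetical getLive and diffAndPatch helpers standing in for fetching the live object and computing/submitting the patch; it is not the actual kubectl code):

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for an apiserver 409 Conflict (stale resourceVersion).
var errConflict = errors.New("conflict: object has been modified")

// applyWithRetry sketches the flow described above: fetch the live object,
// compute and submit a patch; on a version conflict, refetch and recompute;
// after maxRetries, surface the error so the user can decide (e.g. --force).
func applyWithRetry(getLive func() ([]byte, error), diffAndPatch func(live []byte) error, maxRetries int) error {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		live, err := getLive()
		if err != nil {
			return err
		}
		lastErr = diffAndPatch(live)
		if lastErr == nil {
			return nil // patch applied cleanly
		}
		if !errors.Is(lastErr, errConflict) {
			return lastErr // genuine merge conflict or other error: report it, do not retry
		}
		// stale resourceVersion: loop, refetch, and recompute the patch
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, lastErr)
}

func main() {
	// Fake client calls: the first patch attempt hits a conflict, the second succeeds.
	attempts := 0
	err := applyWithRetry(
		func() ([]byte, error) { return []byte(`{"kind":"ReplicationController"}`), nil },
		func([]byte) error {
			attempts++
			if attempts == 1 {
				return errConflict
			}
			return nil
		},
		5,
	)
	fmt.Println("apply result:", err, "after", attempts, "patch attempts")
}
```

The key point is that only conflict errors trigger a refetch-and-recompute; any other failure, including a genuine merge conflict, is reported to the user immediately.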

@bprashanth
Contributor

This is causing test flake:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/20534/kubernetes-pull-build-test-e2e-gce/28891
http://kubekins.dls.corp.google.com:8081/job/kubernetes-pull-build-test-e2e-gce/28891/

    to:
    &{0xc20827c190 0xc2080cbf80 e2e-tests-kubectl-o39wt redis-master STDIN 0xc208504690 0xc208562780 4508}
    for: "STDIN": replicationControllers "redis-master" cannot be updated: the object has been modified; please apply your changes to the latest version and try again


not to have occurred

@bprashanth bprashanth added the kind/flake Categorizes issue or PR as related to a flaky test. label Feb 18, 2016
@bgrant0607 bgrant0607 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Feb 18, 2016
@lavalamp
Member

@jackgr yes, your understanding is correct. Do you have time to fix this? We're trying to get all our tests fixed. Additionally this sounds like a legit system bug, not just a flaky test problem.

@jackgr
Contributor

jackgr commented Feb 19, 2016

@lavalamp Maybe. Swamped right now, but can fit it in given enough notice. When is the fix needed?

@lavalamp
Member

@jackgr asap. We'll find someone else.

@bprashanth
Contributor

This should just be a generic GET/Update retry loop, e.g.:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replication/replication_controller_utils.go#L53
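
For reference, such a generic GET/Update retry loop can be expressed with today's client-go conflict-retry helper; the sketch below is only an illustration (namespace, name, and replica count are placeholders, and it assumes a cluster reachable through the default kubeconfig):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/retry"
)

func main() {
	// Load the default kubeconfig (~/.kube/config); assumes a reachable cluster.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	rcs := kubernetes.NewForConfigOrDie(config).CoreV1().ReplicationControllers("default")

	// Generic GET/Update retry loop: refetch on every attempt so the update is
	// always made against the latest resourceVersion; RetryOnConflict retries
	// only when the apiserver returns a 409 Conflict.
	err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
		rc, getErr := rcs.Get(context.TODO(), "redis-master", metav1.GetOptions{})
		if getErr != nil {
			return getErr
		}
		replicas := int32(2) // placeholder mutation
		rc.Spec.Replicas = &replicas
		_, updateErr := rcs.Update(context.TODO(), rc, metav1.UpdateOptions{})
		return updateErr
	})
	fmt.Println("update result:", err)
}
```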

@jackgr
Contributor

jackgr commented Feb 19, 2016

Go for it.


@lavalamp
Member

Chao, do you have cycles to take a look?

@caesarxuchao
Member

I thought Patch was retried on the server. I'll investigate.

@caesarxuchao
Member

Patch is retried server-side five times before an error is returned: https://github.com/kubernetes/kubernetes/blob/master/pkg/apiserver/resthandler.go#L92, so I expect this issue to occur very rarely. In fact, it has passed the last ~400 kubernetes-gce-e2e tests.

If we want to make this test less flaky, I'd recommend increasing the number of retries on the server side. What do you guys think?

@lavalamp
Member

It'll fail if an actual conflicting change was made, though. Do we know what the nature of the conflict was? Maybe first we need to add some log messages to figure out what the conflict was next time it occurs.


@bprashanth
Contributor

so I expect this issue to occur very rarely. In fact, it has passed the last ~400 kubernetes-gce-e2e tests.

This happened last week: #15493 (comment)

@caesarxuchao
Member

@lavalamp you are right. I thought an actual conflicting change would result in a different error message, but it doesn't. I'll add the logs.

@bgrant0607
Member

I only see 3 reports of failures since October, so I'm going to say, no, it doesn't need to be fixed in 1.2.

@adohe-zz

@bgrant0607 any progress on this now? I have returned to the kubectl apply issues and hope I can fix all apply-related issues before the 1.3.0 release.

@caesarxuchao
Member

I'll try to check the flake rate again. IMO if the flake rate is non-negligible, we should increase the retry limits of Patch, or more aggressively, just remove the limit. https://github.com/kubernetes/kubernetes/blob/master/pkg/apiserver/resthandler.go#L92

@smarterclayton
Contributor

Infinite retry still scares me, mostly because we may have failure modes that could race for a very long time.


@bgrant0607
Member

@adohe I don't think anyone has worked on this.

Though the original failing test/logs weren't recorded, it appears to be a failure of "should apply a new configuration to an existing RC":
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/kubectl.go#L541

@caesarxuchao There is a retry loop in the Patch handler in apiserver, but if resourceVersion is specified, the retry loop will always fail in the case of a conflict. (We should detect that and bail out, but I don't think we do.)

In the case of such a conflict, apply should diff again, create another patch, and try again.

I agree with @smarterclayton that it shouldn't retry forever. We should use the same retry limits we use for other commands.

@caesarxuchao
Member

@smarterclayton we have an infinite loop in GuaranteedUpdate (https://github.com/kubernetes/kubernetes/blob/master/pkg/storage/etcd/etcd_helper.go#L610), maybe we should apply a retry limit there as well?
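
As an illustration of capping such a loop, a bounded read-modify-write helper might look like this (a hypothetical sketch, not the actual GuaranteedUpdate code; tryUpdate stands in for one optimistic update attempt):

```go
package main

import "fmt"

// cappedUpdateLoop is a hypothetical sketch of putting a retry limit on an
// otherwise unbounded optimistic-update loop. tryUpdate performs one
// read-modify-write attempt and reports whether it lost the
// optimistic-concurrency race.
func cappedUpdateLoop(tryUpdate func() (conflicted bool, err error), limit int) error {
	for attempt := 1; attempt <= limit; attempt++ {
		conflicted, err := tryUpdate()
		if err != nil {
			return err
		}
		if !conflicted {
			return nil // the write landed against the version we read
		}
		// someone else wrote in between; re-read and try again
	}
	return fmt.Errorf("still conflicting after %d attempts; giving up", limit)
}

func main() {
	attempts := 0
	err := cappedUpdateLoop(func() (bool, error) {
		attempts++
		return attempts < 3, nil // lose the race twice, then succeed
	}, 5)
	fmt.Println("result:", err, "after", attempts, "attempts")
}
```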

@caesarxuchao
Member

There is a retry loop in the Patch handler in apiserver, but if resourceVersion is specified, the retry loop will always fail in the case of a conflict. (We should detect that and bail out, but I don't think we do.)
In the case of such a conflict, apply should diff again, create another patch, and try again.

Agree. Just note that the test flake doesn't specify resourceVersion.

@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels May 25, 2016
@pwittrock
Member

@adohe Since this is causing flaky test failures, it is something we should focus on. Is this something I should pick up?

@bgrant0607 bgrant0607 assigned pwittrock and unassigned bgrant0607 May 31, 2016
@bgrant0607 bgrant0607 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels May 31, 2016
@pwittrock
Member

FYI: @adohe has a PR out for this under review

@pwittrock pwittrock added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jun 8, 2016
@pwittrock
Member

Making this a P3 (not release-blocking) since it looks like we haven't seen many instances since Feb 2016.

k8s-github-robot pushed a commit that referenced this issue Jun 13, 2016
Automatic merge from submit-queue

kubectl apply retry stale resource version

```release-note
kubectl apply: retry applying a patch if a version conflict error is encountered
```


fixes #15493 
@pwittrock I just got my original implementation back, ptal.