Specific error message on failed rolling update issued by older kubectl against 1.4 master #32751
Conversation
It would be good to make these changes but not if it means we don't make the next cherry-pick.
```go
if e.QualifiedResource.Resource != "replicationcontrollers" {
	return nil, err
}
*msg = fmt.Sprintf("Note: if you are using \"kubectl rolling-update\" and your kubectl version is older than v1.4.0, your rolling-update is probably failed. Your pods are successfully updated, but the name of the RC is its original name plus a hash value. If you want to rename the RC, please use a kubectl newer than v1.4.0, then run rolling-update to change the name. : %s", *msg)
```
Link to a document instead of explaining the entire problem right here?
This is really really nasty. Is this the best we can do? I'd like a mini-post-mortem on how we got this far with an incompatible change...
I think the 'mini-postmortem' would mostly contain the statement that the problem appeared because none of the people working on this feature knew anything about kubectl, and it wasn't caught by our upgrade tests because we have never had any that were passing...
I knew it from the beginning... I stated the backward incompatibility in #29971. I was under the impression `kubectl rolling-update` was being deprecated in favor of Deployments, and we were not looking at the upgrade tests, so it went unnoticed.
```go
if !strings.Contains(userAgent, "kubectl") {
	return nil, err
}
// example userAgent string: kubectl-1.3/v1.3.8 (linux/amd64) kubernetes/e328d5b
```
Move this parsing into its own function.
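A rough sketch of what pulling the parsing into its own helper might look like, reusing the `strings` and `version` helpers that already appear in the quoted hunks (the function names and the `version.Parse` call here are assumptions for illustration, not the code that was actually merged):

```go
// isOldKubectl reports whether the request's User-Agent identifies a kubectl
// client older than v1.4.0 (sketch only; names are assumptions).
//
// Example User-Agent: kubectl-1.3/v1.3.8 (linux/amd64) kubernetes/e328d5b
func isOldKubectl(userAgent string) bool {
	if !strings.Contains(userAgent, "kubectl") {
		// Not kubectl at all (e.g. client-go or curl), so no warning is needed.
		return false
	}
	// The first whitespace-separated field looks like "kubectl-1.3/v1.3.8";
	// the part after the "/" is the client version.
	parts := strings.Split(strings.Fields(userAgent)[0], "/")
	if len(parts) != 2 {
		return false
	}
	kubectlVersion, versionErr := version.Parse(parts[1])
	if versionErr != nil {
		return false
	}
	return kubectlVersion.LT(version.MustParse("v1.4.0"))
}
```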
```go
if versionErr != nil {
	return nil, err
}
if kubectlVersion.GTE(version.MustParse("v1.4.0")) {
```
Please add a TODO to remove this after 1.6.
So we have to be compatible for 3 versions? Is this documented?
Todo added.
Force-pushed from d680695 to 9d1bb90.
@lavalamp comments addressed.
Force-pushed from 9d1bb90 to 0534a3a.
I was just wild-guessing. I mean, really, we should NEVER be incompatible.
Force-pushed from 0534a3a to 9b30f67.
```
@@ -188,6 +189,9 @@ binary | sha256 hash

**No notable changes for this release**

## Known issues

* master newer than v1.4.0 has backward compatible issues with older kubectl. Specifically, if "kubectl rolling-update" does not change the name of the replication controller (RC), the command will fail to create the new RC and report the RC already exists. When this happens, the pods are successfully updated, but the name of the RC is its original name plus a hash value. If you want to rename the RC, please download a kubectl newer than v1.4.0, then run rolling-update to change the name.
```
TODO: I'll paste this to #32332 when the PR is approved.
I'm going to help out with the wording here-- I've started something better below. Also, please start a section specifically about the behavior changes the garbage collector introduces and put this there.
`kubectl` 1.3's rolling-update command is compatible with Kubernetes 1.4 and higher **only if** you specify a new replication controller name. You will need to update to kubectl 1.4 or higher to use the rolling update command against a 1.4 cluster if you want to keep the original name, or you'll have to do two rolling updates.
If you do happen to use kubectl 1.3's rolling update against a 1.4 cluster, it will fail, usually with an error message that will direct you here. If you saw that error, then don't worry, the operation succeeded except for the part where the new replication controller is renamed back to the old name. You can just do another rolling update to change the name back: look for a replication controller that has the original name plus a random suffix.
Unfortunately, there is a second possible failure mode: <describe failure mode>
If this happens to you, then <describe fix>
Force-pushed from 9b30f67 to 23f401b.
@lavalamp could you take another look? Thanks.
```go
if e.QualifiedResource.Resource != "replicationcontrollers" {
	return nil, err
}
*msg = fmt.Sprintf("Note: if you are using \"kubectl rolling-update\" and your kubectl version is older than v1.4.0, your rolling-update is probably failed, though the pods are correctly updated. Please see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#known-backward-compatibility-issues for a workaround. : %s", *msg)
```
s/rolling-update is probably/rolling-update has probably/
The link is not specific enough, it needs to link not to the rolling changelog but to a specific heading within it.
Done.
```
@@ -216,6 +217,9 @@ binary | sha256 hash

**No notable changes for this release**

## Known issues

* master newer than v1.4.0 has backward compatible issues with older kubectl. Specifically, if "kubectl rolling-update" does not change the name of the replication controller (RC), the command will fail to create the new RC and report the RC already exists. When this happens, the pods are successfully updated, but the name of the RC is its original name plus a hash value. If you want to rename the RC, please download a kubectl newer than v1.4.0, then run rolling-update to change the name.
```
Suggest making a section or bullet point here to let people know it's specific to `kubectl rolling-update`, so they can skip it if they're not using it. Something like:

### `kubectl rolling-update`

blah blah

or

* `kubectl rolling-update`: blah blah
Done.
Force-pushed from e992784 to 380a32e.
Force-pushed from 380a32e to c4ea205.
@lavalamp PTAL. Thanks.
```
### kubectl delete

If you use an old version kubectl to delete a replication controller or replicaset, then after the delete command has returned, the replication controller or the replicaset will continue to exist in the key-value store for a short period of time (<1s). You probably will not notice any difference if you use kubectl manually, but you might notice it if you are using kubectl in a script.
```
What should people do to fix their scripts?
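One way scripts could cope, sketched here as a small Go program (the `stillExists` callback is hypothetical; a shell script would do the analogous thing by retrying `kubectl get`): poll until the old object is really gone before reusing its name.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForDeletion polls until the object is gone or the timeout expires.
// stillExists is a hypothetical callback: in a real script it would wrap
// "kubectl get" (or a client-go Get) and report whether the RC/RS is still
// present in the key-value store.
func waitForDeletion(stillExists func() (bool, error), timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		exists, err := stillExists()
		if err != nil {
			return err
		}
		if !exists {
			return nil // gone for real; safe to reuse the name
		}
		time.Sleep(200 * time.Millisecond)
	}
	return errors.New("object still exists after timeout")
}

func main() {
	// Stub that pretends the object disappears after the second check.
	checks := 0
	err := waitForDeletion(func() (bool, error) {
		checks++
		return checks < 2, nil
	}, 5*time.Second)
	fmt.Println("waited, err =", err)
}
```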
OK, the contents look pretty good now, but we need to fix the grammar :) Maybe we should merge and fix in a followup.
@pwittrock @eparis this doesn't fix kubectl 1.3 but it does make the failure mode much better. Please cherrypick.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e build/test passed for commit c4ea205.
Automatic merge from submit-queue
```go
if !isOldKubectl(userAgent) {
	return nil, err
}
if e.QualifiedResource.Resource != "replicationcontrollers" {
```
Can't it happen for replica sets as well?
OK, this new review thingy is strange - I see @lavalamp's reply at the bottom.
I don't know - I thought that replica set is just a copy of replication controller + new selector.
`kubectl rolling-update` only works with rc, so we don't worry about rs.
```go
if e.QualifiedResource.Resource != "replicationcontrollers" {
	return nil, err
}
*msg = fmt.Sprintf("Note: if you are using \"kubectl rolling-update\" and your kubectl version is older than v1.4.0, your rolling-update has probably failed, though the pods are correctly updated. Please see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#kubectl-rolling-update for a workaround. : %s", *msg)
```
Shouldn't we check if the error is actually a name collision, not e.g. internal server error?
It's checked a few lines before.
```go
if !kubeerr.IsAlreadyExists(err) {
	return nil, err
}
```
Oh, OK - I was looking only at your changes.
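So, read together, the guard discussed in this thread stacks up roughly like this (a sketch assembled from the hunks quoted above; the wrapping function and its parameters are mine, while `kubeerr.IsAlreadyExists` and `isOldKubectl` are the identifiers from the quoted code):

```go
// maybeAddRollingUpdateHint sketches how the checks combine: only an
// "already exists" error, sent by a pre-1.4 kubectl, for a replication
// controller, gets the workaround note prepended to its message.
func maybeAddRollingUpdateHint(err error, userAgent string, resource string, msg *string) {
	if !kubeerr.IsAlreadyExists(err) {
		return // some other failure; leave the error untouched
	}
	if !isOldKubectl(userAgent) {
		return // kubectl >= 1.4 renames the RC correctly, no hint needed
	}
	if resource != "replicationcontrollers" {
		return // rolling-update only works on RCs
	}
	*msg = fmt.Sprintf("Note: if you are using \"kubectl rolling-update\" and your kubectl version is older than v1.4.0, your rolling-update has probably failed, though the pods are correctly updated. Please see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#kubectl-rolling-update for a workaround. : %s", *msg)
}
```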
There's also the step where Chao fixed kubectl and I must have been asleep.
I was under the impression that rolling-update doesn't work on RS, if …
Yeah, good point. @ceaserxuchao can you make a followup? Thanks!
…rolling-update

Automatic merge from submit-queue

Specific error message on failed rolling update issued by older kubectl against 1.4 master

Fix kubernetes#32706

`kubernetes-e2e-gke-1.4-1.3-kubectl-skew` (1.3 kubectl and 1.4 master) test suite failed with:

```
[k8s.io] Kubectl client [k8s.io] Kubectl rolling-update should support rolling-update to same image [Conformance]
...
Error from server: object is being deleted: replicationcontrollers "e2e-test-nginx-rc" already exists
error: exit status 1
not to have occurred
```

It's because the old RC had an orphanFinalizer, so it is not deleted from the key-value store immediately. In turn, the creation of the new RC of the same name failed.

In this failure, the RC and pods are updated, it's just that the RC is of different name, i.e., original name + a hash generated based on podTemplate. The error is confusing to user, but not that bad. So this PR just prints a warning message to instruct users how to work around.

1.4 kubectl rolling-update uses different logic so it's working.

@lavalamp @gmarek @janetkuo @pwittrock
cc @liggitt for the ctx changes.

(cherry picked from commit a39cee9)
[remove the CHANGELOG changes to make pick possible]
Commit found in the "release-1.4" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
…-collector-error

Automatic merge from submit-queue

Improve error message when kubectl rolling-update fails due to version skew

**What this PR does / why we need it**: follow up #32751, we should print the real error message first, the workaround later.

**Before:**

```console
$ kubectl rolling-update nginx --image=redis
Created nginx-6ee4372891ec51a97dfbf83ed0846886
Scaling up nginx-6ee4372891ec51a97dfbf83ed0846886 from 0 to 1, scaling down nginx from 1 to 0 (keep 1 pods available, don't exceed 2 pods)
Scaling nginx-6ee4372891ec51a97dfbf83ed0846886 up to 1
Scaling nginx down to 0
Update succeeded. Deleting old controller: nginx
Renaming nginx-6ee4372891ec51a97dfbf83ed0846886 to nginx
Error from server: Note: if you are using "kubectl rolling-update" and your kubectl version is older than v1.4.0, your rolling-update has probably failed, though the pods are correctly updated. Please see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#kubectl-rolling-update for a workaround. : object is being deleted: replicationcontrollers "nginx" already exists
```

**After:** (see the error message)

```console
$ kubectl rolling-update nginx --image=redis
Created nginx-12b5782bcdff627fca46537e9e1045f8
Scaling up nginx-12b5782bcdff627fca46537e9e1045f8 from 0 to 1, scaling down nginx from 1 to 0 (keep 1 pods available, don't exceed 2 pods)
Scaling nginx-12b5782bcdff627fca46537e9e1045f8 up to 1
Scaling nginx down to 0
Update succeeded. Deleting old controller: nginx
Renaming nginx-12b5782bcdff627fca46537e9e1045f8 to nginx
Error from server: object is being deleted: replicationcontrollers "nginx" already exists: if you're using "kubectl rolling-update" with kubectl version older than v1.4.0, your rolling update has failed, though the pods are correctly updated. Please see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#kubectl-rolling-update for a workaround
```

**Release note**:

```release-note
NONE
```

Print the real error message first, the workaround later.

@lavalamp @gmarek
Fix #32706
`kubernetes-e2e-gke-1.4-1.3-kubectl-skew` (1.3 kubectl and 1.4 master) test suite failed with `Error from server: object is being deleted: replicationcontrollers "e2e-test-nginx-rc" already exists` (the full output is in the merge commit message above). It's because the old RC had an orphanFinalizer, so it is not deleted from the key-value store immediately. In turn, the creation of the new RC of the same name failed.
In this failure, the RC and pods are updated; it's just that the RC ends up with a different name, i.e., the original name plus a hash generated from the podTemplate. The error is confusing to the user, but not that bad, so this PR just prints a warning message instructing users how to work around it.
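To make that concrete, a tiny illustration of the naming scheme described above (the real kubectl hashes the pod template with its own function; `fnv` here is only for illustration):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// newRCName mimics the described behavior: the surviving RC keeps the
// original name plus a hash suffix derived from the pod template.
// (Illustrative only; kubectl's actual hash differs.)
func newRCName(originalName, podTemplate string) string {
	h := fnv.New32a()
	h.Write([]byte(podTemplate))
	return fmt.Sprintf("%s-%x", originalName, h.Sum32())
}

func main() {
	// Prints something like "nginx-<hash>"; the suffixes in this PR's console
	// output (e.g. nginx-6ee43728...) are longer, but follow the same
	// "original name plus hash" shape.
	fmt.Println(newRCName("nginx", "image: redis"))
}
```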
1.4 kubectl rolling-update uses different logic, so it works.
@lavalamp @gmarek @janetkuo @pwittrock
cc @liggitt for the ctx changes.