-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: Fix long-standing bug in stringSetToPointers #26331
Conversation
Someone with an unpatched controller-manager could try looking at AttachLoadBalancerToSubnets logs in CloudTrail. stringSetToPointers() is only called in exactly two points, the addition or removal of subnets during reconciliation, so if you keep your network layout always the same, you might never come across this. |
Thanks for fixing @therc |
@@ -90,7 +90,7 @@ func (s *AWSCloud) ensureLoadBalancer(namespacedName types.NamespacedName, loadB | |||
request := &elb.AttachLoadBalancerToSubnetsInput{} | |||
request.LoadBalancerName = aws.String(loadBalancerName) | |||
request.Subnets = stringSetToPointers(additions) | |||
glog.V(2).Info("Attaching load balancer to added subnets") | |||
glog.V(2).Infof("Attaching load balancer for %v to added subnets %v", request.LoadBalancerName, request.Subnets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the subnets output will be useless, because it is a []*string (right?). If so, can we just remove both logging statement changes. That should also make this easier to backport (if we just have the actual fix in aws_utils.go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, in my case I could see that there were six pointers instead of three and that the first three were null. But yeah, backporting is probably a bigger deal than more verbose logging in an obscure case.
Instead of N pointers, we were returning N null pointers, followed by the real thing. It's not clear why we didn't trip on this until now, maybe there is a new server-side check for empty subnetID strings.
Removed the logging, rebased and squashed. |
To make the case for backporting: if the nodes from a whole AZ disappear after e.g. a power loss, this bug can make the outage even worse. |
GCE e2e build/test passed for commit ca8699e. |
@therc Please create a cherry pick pull if you want to backport this into the 1.2 branch. |
@therc - as announced on the kubernetes-dev google group yesterday, we plan to cut 1.2.5 next week. If you'd like this change in that release, please create a cherry pick PR against the 1.2 branch asap. Thanks. |
@roberthbailey it took me almost an hour, but I think I tamed my git remotes and the cherry-pick script into doing what I needed: #27664 |
Sorry for the trouble making the cherry pick. Is your development environment configured as described in https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md? |
…31-upstream-release-1.2 Fix long-standing bug in aws.stringSetToPointers
Thanks a lot for the help with the cherry pick! I think the main issue is that I had the official repo as both the |
I merged #27664 so I'm going to remove the cherrypick candidate label. |
…ck-of-#26331-upstream-release-1.2 Fix long-standing bug in aws.stringSetToPointers
…ck-of-#26331-upstream-release-1.2 Fix long-standing bug in aws.stringSetToPointers
Instead of N pointers, we were returning N null pointers, followed by the real
thing. It's not clear why we didn't trip on this until now, maybe there is a
new server-side check for empty subnetID strings.
Also add better logging around this.
Please remove release-note tag, bump priority to P1 (or P0, since this can cause customer throttling if enough services need to be synced at the same time), and backport to 1.2/1.1. I don't think I can do any of those.
cc @justinsb