-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCE: Add Alpha feature "Network Tiers" for external L4 load balancers #51301
Conversation
@nicksardo @bowei I split the changes into multiple commits so that it'd be easier to review. If you prefer multiple PRs, let me know. |
adf763c
to
5b129fd
Compare
if _, exists := cas.addrsByRegionAndName[region][name]; !exists { | ||
return makeGoogleAPINotFoundError("") | ||
} | ||
delete(cas.addrsByRegionAndName[region], name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete(cas.reservedAddrs, addr.Address)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in the new commit.
if isNotFound(err) { | ||
return nil | ||
} | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: else if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
// reconcileFwdRuleNetworkTiers check the network tiers of existing forwarding | ||
// rule and delete the rule if the tier does not matched the desired tier. | ||
func reconcileFwdRuleNetworkTiers(s CloudForwardingRuleService, region, name, lbRef string, desiredNetTier NetworkTier) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the word reconcile
seems strange to me here because you're only deleting the forwarding rule/address if the tier needs changed.
I suggest wripping the two reconcile
funcs apart and introduce a higher level func which checks tier equality and deletes the resources if necessary.
func deleteWrongNetworkTieredResources(...){
fwdRuleOk, err := fwdRuleNetworkTierEquals(gce, region, name, lbRef, desiredNetTier)
if err!= nil {
...
}
if !fwdRuleOk {
err = s.DeleteRegionForwardingRule(name, region)
}
addrOk, err := addressNetworkTierEquals(gce, region, name, lbRef, desiredNetTier)
if err!= nil {
...
}
if !addrOk {
err = s.DeleteRegionAddress(name, region)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that the names are confusing. The function was initially intended to do more, but that didn't pan out.
Renamed the functions. I still keep the address and forwarding rule verification in two functions to make testing easier.
5b129fd
to
2afc5e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments. Thanks!
if _, exists := cas.addrsByRegionAndName[region][name]; !exists { | ||
return makeGoogleAPINotFoundError("") | ||
} | ||
delete(cas.addrsByRegionAndName[region], name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in the new commit.
if isNotFound(err) { | ||
return nil | ||
} | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
// reconcileFwdRuleNetworkTiers check the network tiers of existing forwarding | ||
// rule and delete the rule if the tier does not matched the desired tier. | ||
func reconcileFwdRuleNetworkTiers(s CloudForwardingRuleService, region, name, lbRef string, desiredNetTier NetworkTier) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that the names are confusing. The function was initially intended to do more, but that didn't pan out.
Renamed the functions. I still keep the address and forwarding rule verification in two functions to make testing easier.
addressObj.Address = existingIP | ||
} | ||
if err = s.ReserveRegionAddress(addressObj, region); err != nil { | ||
if !isHTTPErrorCode(err, http.StatusConflict) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or BadRequest
addressObj.Address = existingIP | ||
} | ||
if err = s.ReserveAlphaRegionAddress(addressObj, region); err != nil { | ||
if !isHTTPErrorCode(err, http.StatusConflict) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or BadRequest
Flakes caused by #51379 /retest |
/test pull-kubernetes-e2e-gce-bazel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and logic makes sense to me.
Couple minor comments.
// NetworkTierAnnotationKey is annotated on a Service object to indicate which | ||
// network tier a GCP LB should use. The valid values are "Standard" and | ||
// "Premium" (default). | ||
NetworkTierAnnotationKey = "cloud.google.com/network-tier" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to specify alpha
in the key? I forgot the reason... @nicksardo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find it but I think it's recommended not to use alpha
in the annotation anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I suggest double checking with thockin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @thockin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoke to @thockin offline. This is ok.
glog.Infof("EnsureLoadBalancer(%v(%v)): creating forwarding rule, IP %s", loadBalancerName, serviceName, ipAddressToUse) | ||
if err := gce.createForwardingRule(loadBalancerName, serviceName.String(), gce.region, ipAddressToUse, ports); err != nil { | ||
glog.Infof("EnsureLoadBalancer(%v(%v)): creating forwarding rule, IP %s (tier: %s)", loadBalancerName, serviceName, ipAddressToUse, netTier) | ||
if err := createForwardingRule(gce, loadBalancerName, serviceName.String(), gce.region, ipAddressToUse, gce.targetPoolURL(loadBalancerName), ports, netTier); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we still call targetPoolURL() in createForwardingRule()? Seems a bit unclean in this way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm, I see why it is hard, targetPoolURL desires some info from the gce struct but there is no obvious way for CloudForwardingRuleService
to expose those. And we need CloudForwardingRuleService
for mocking and testing.
Never mind :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I could fix the targetPool code too with more mocking, but I'd prefer leaving them alone for this change.
@@ -84,6 +84,11 @@ func makeBackendServiceDescription(nm types.NamespacedName, shared bool) string | |||
|
|||
// External Load Balancer | |||
|
|||
// makeServiceDescription is used to generate descriptions for forwarding rules and addresses. | |||
func makeServiceDescription(serviceName string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe good to name this function in a more consistent way among this file...We already have makeBackendServiceDescription
above.
But of course I couldn't think of a better name ATM :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we punt this to when we have more than two functions? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we punt this to when we have more than two functions? :-)
SGTM
addressObj.Address = existingIP | ||
} | ||
if err = s.ReserveAlphaRegionAddress(addressObj, region); err != nil { | ||
if !isHTTPErrorCode(err, http.StatusConflict) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be cleaner to check error after the switch? Seems like they are the exact same codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like checking the err code close to the call site, but sure I'll change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -85,3 +122,54 @@ func TestVerifyRequestedIP(t *testing.T) { | |||
}) | |||
} | |||
} | |||
|
|||
func TestCreateForwardingRule(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestCreateForwardingRuleWithTier
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
netTier := NetworkTierGCEValueToType(netTierStr) | ||
if netTier != desiredNetTier { | ||
glog.Errorf("verifyUserRequestedIP: requested static IP %q (name: %s) for LB %s has network tier %s, need %s.", requestedIP, existingAddress.Name, lbRef, netTier, desiredNetTier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit: s/need/needs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or does it makes sense to log and return the same error content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error will be included in the k8s event and probably should not have too many function-specific information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error will be included in the k8s event and probably should not have too many function-specific information.
Good point :)
// Check the current and the desired network tiers. If they do not match, | ||
// tear down the existing resources -- delete the forwarding rule, and | ||
// release the IP. | ||
netTier, err := gce.getServiceNetworkTier(apiService) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also gate network tier outside of getServiceNetworkTier()
? Like reconcileXXXNetworkTiers()
below, so that we don't trigger the logic at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that intentionally since getServiceNetworkTier()
should always return the default tier if the feature is disabled. If I could, I'd actually prefer gating it only at the gce API calls so that we can reuse the same flow when the feature graduates to Beta. Otherwise, the beta code will be completely new code which needs to be tested all over again. I think the current implementation provides sufficient "gating" but feel free to disagree :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm...I think I some how mixed the concepts of gating alpha/beta GCE API and gating the actual feature. IMHO:
- Gate alpha/beta API means no calls to them.
- Gate the feature means no triggers to the logic.
Given that the GCE provider gate is about gating API but not feature, I think it makes sense to do it this way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current implementation provides sufficient "gating" but feel free to disagree :-)
But yeah technically this is sufficient :)
glog.Errorf("EnsureLoadBalancer(%s): failed to get the desired network tier: %v", lbRefStr, err) | ||
return nil, err | ||
} | ||
glog.V(2).Infof("EnsureLoadBalancer(%s): desired network tier %q ", lbRefStr, netTier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User that doesn't enable network tier may be seeing this "network tier" log and they may think it is mistakenly enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bumped it to V(4) so that it doesn't show up in GKE clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for switching side...Actually I'd prefer to keep this in V(2) for better debuggability in case things go wrong in production. (I'd like to feature gate it, or maybe it just doesn't matter)
@@ -991,15 +987,28 @@ func (gce *GCECloud) getServiceNetworkTier(svc *v1.Service) (NetworkTier, error) | |||
return tier, nil | |||
} | |||
|
|||
// reconcileFwdRuleNetworkTiers check the network tiers of existing forwarding | |||
func (gce *GCECloud) deleteWrongNetworkTieredResources(lbName, lbRef string, desiredNetTier NetworkTier) error { | |||
if !gce.AlphaFeatureGate.Enabled(AlphaFeatureNetworkTiers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm..Seems like the feature gate check is moved inside functions. I think having it outiside may make it clear about what are involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restored it.
@MrHohn the new commit addressed (most) of your comments. |
Since you're making the BadRequest/StatusConflict fix in this PR, could you also change the fake service to return the both types. func (cas *FakeCloudAddressService) ReserveAlphaRegionAddress(addr *computealpha.Address, region string) error {
if addr.Address == "" {
addr.Address = fmt.Sprintf("1.2.3.%d", cas.count)
cas.count++
}
if addr.AddressType == "" {
addr.AddressType = string(schemeExternal)
}
if cas.reservedAddrs[addr.Address] {
msg := "IP in use"
// Depending on the type of address, different error messages may be returned.
switch lbScheme(addr.AddressType) {
case schemeExternal:
return makeGoogleAPIError(http.StatusBadRequest, msg)
default:
return makeGoogleAPIError(http.StatusConflict, msg)
}
}
if _, exists := cas.addrsByRegionAndName[region]; !exists {
cas.addrsByRegionAndName[region] = make(map[string]*computealpha.Address)
}
if _, exists := cas.addrsByRegionAndName[region][addr.Name]; exists {
return makeGoogleAPIError(http.StatusConflict, "name in use")
}
cas.addrsByRegionAndName[region][addr.Name] = addr
cas.reservedAddrs[addr.Address] = true
return nil
} |
@@ -132,3 +132,15 @@ func (gce *GCECloud) GetRegionAddressByIP(region, ipAddress string) (*compute.Ad | |||
} | |||
return nil, makeGoogleAPINotFoundError(fmt.Sprintf("Address with IP %q was not found in region %q", ipAddress, region)) | |||
} | |||
|
|||
// TODO(yujuhong): retire this function once Network Tiers becomes Beta in GCP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we open an issue and reference that (rather than by username). I know it's annoying, but it's better to not assign to a person.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
NetworkTierAnnotationStandard = "Standard" | ||
NetworkTierAnnotationPremium = "Premium" | ||
|
||
NetworkTierStandard NetworkTier = NetworkTierAnnotationStandard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why can't do directly?
NetworkTierStandard NetworkTier = "Standard"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer keeping the annotation key/values pure strings (not an internal NetworkTier
type) since it's essentially our API.
// GetServiceNetworkTier returns the network tier of GCP load balancer | ||
// which should be assembled, and an error if the specified tier is not | ||
// supported. | ||
func GetServiceNetworkTier(service *v1.Service) (NetworkTier, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golang usually don't add Get prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestion? I am bad with naming and the Get
prefix makes the intent of the function clear to me.
@@ -133,3 +133,15 @@ func (gce *GCECloud) DeleteRegionForwardingRule(name, region string) error { | |||
|
|||
return gce.waitForRegionOp(op, region, mc) | |||
} | |||
|
|||
// TODO(yujuhong): retire this function once Network Tiers becomes Beta in GCP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link to the same issue as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -426,7 +439,7 @@ func (gce *GCECloud) DeleteExternalTargetPoolAndChecks(name, region, clusterID s | |||
// verifyUserRequestedIP checks the user-provided IP to see whether it can be | |||
// used for the LB. It also returns whether the IP is considered owned by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify this comment as the function can now return (true, err != nil), seems confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I changed the function to never return (true, non-nil-err). If the err is non-nil, the boolean value should not be used anyway.
if existingTier == desiredNetTier { | ||
return nil | ||
} | ||
glog.V(2).Infof("%s: Network tiers do not match; existing forwarding rule: %q, desired: %q", logPrefix, existingTier, desiredNetTier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably just combine into one log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if existingTier == desiredNetTier { | ||
return nil | ||
} | ||
glog.V(2).Infof("%s: Network tiers do not match; existing address: %q, desired: %q", logPrefix, existingTier, desiredNetTier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
combine these log lines...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return isHTTPErrorCode(err, http.StatusForbidden) | ||
} | ||
|
||
func handleNetworkTierGetError(err error) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this logic not possible to just put in the GetAlphaRegionAddress, ... functions?
this is a super generic function name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's shared by two files -- gce_addresses.go and gce_forwardingrule.go.
I renamed it to handleAlphaNetworkTierGetError()
to make it slightly more explicit, and also added a TODO with a bug number to remove this.
bd13596
to
1a59ea2
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bowei, yujuhong Assign the PR to them by writing Associated issue: 51483 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
1a59ea2
to
447b78c
Compare
447b78c
to
c219d88
Compare
Rebased and addressed the nits. Re-applying lgtm based on #51301 (comment) |
This feature supports specifying what network tier (premium, standard) to use for the load balancer backing the Service (type=LoadBalancer).
c219d88
to
76945ad
Compare
Rebased again and verified that the rebased PR still passed the e2e test. |
Re-applied the lgtm. |
/test pull-kubernetes-kubemark-e2e-gce kubemark issue has already been fixed by a revert. Re-testing |
/test pull-kubernetes-kubemark-e2e-gce |
FWIW, the PR has already passed the kubemark test: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/51301/pull-kubernetes-kubemark-e2e-gce/49968/ |
@spxtr ^^ is this mean something wrong when reporting back (crier)? any idea how to recover? |
/test pull-kubernetes-kubemark-e2e-gce |
http://prow.k8s.io/?type=presubmit&job=pull-kubernetes-kubemark-e2e-gce I think the job is just backlogged... the test was from a different trigger earlier and you triggered it again and now it's queueing again |
Looks like this job is regularly taking 3+ hours to complete, including queue time. Welp. |
Thanks for checking. I'll check the queue next time before triggering.... |
Failed due to some infra flake: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/51301/pull-kubernetes-kubemark-e2e-gce/50032 /test pull-kubernetes-kubemark-e2e-gce |
/retest Review the full test history for this PR. |
/test pull-kubernetes-kubemark-e2e-gce |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
@yujuhong: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Special notes for your reviewer:
The PR has been manually tested in a GCE e2e cluster for the following conditions:
network-tier
is not enabled in gce.conf, network tier annotations are completely ignored by the controller.network-tier
is enabled in gce.conf:ensureExternalLoadBalancer()
returns an error, and controller emits an event.I uploaded an e2e test #51483. You're welcome to review that one too.
Release note: