-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCE: Reserve address for ILBs during sync #51055
Conversation
FYI @yujuhong. This PR isn't done, but showing you what the ILB address logic will look like. |
type addressManager struct { | ||
logPrefix string | ||
svc CloudAddressService | ||
lbName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make this name
or addrName
. The fact that the controller chooses to use the lbNamd
for all related resources seemed irrelevant to this manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make it clear that this manager would only manage
(promote, delete) the address with this name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
// If the retrieved address is not named with the loadbalancer name, then the controller does not own it. | ||
if addr.Name != am.lbName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably make this clear by adding a method. It'd be easier to change in the future too.
func (am *addressManager) isManagedAddress(addr *computebeta.Address) {
return addr.Name == am.Name
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
glog.V(3).Infof("%v: not attempting release of address %v.", am.logPrefix, am.targetIP) | ||
return nil | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are willing to spend an extra GET call, we can check whether the address is IN_USE. If yes, it's safe release it. If not, this IP may be lost forever. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we don't have to keep the tryRelease
field. Just GET
and return if not found.
err := am.svc.DeleteRegionAddress(am.lbName, am.region) | ||
if err != nil { | ||
if isNotFound(err) { | ||
glog.Warningf("%v: address %q was not found. Ignoring.", am.logPrefix, am.targetIP, am.lbName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove am.targetIP
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks
Subnetwork: am.subnetURL, | ||
} | ||
|
||
err := am.svc.ReserveBetaRegionAddress(newAddr, am.region) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (optional). I like less indents.
if err == nil {
return nil
}
if !isHTTPErrorCode(err, http.StatusConflict) {
return err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
computebeta "google.golang.org/api/compute/v0.beta" | ||
compute "google.golang.org/api/compute/v1" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider using github.com/stretchr/testify/assert
(for Errorf
) and github.com/stretchr/testify/require
(for Fatalf
). They tend to make the code more succinct .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a fan of testify/assert for personal projects, but haven't seen enormous use of it here in K8s. Happy to use it unless someone says otherwise.
GetRegionAddressByIP(region, ipAddress string) (*compute.Address, error) | ||
// TODO: Mock `DeleteRegionAddress(name, region string) endpoint | ||
// TODO: Mock Global endpoints | ||
DeleteRegionAddress(name, region string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops... I added the same function in another PR. More rebasing...
targetIP := determineTargetIP(svc, existingFwdRule) | ||
|
||
var addrMgr *addressManager | ||
if targetIP != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. The external LB ensures a static IP (i.e., creating a new one) even if targetIP == ""
because many other components (e.g., firewall, targetpool) needs a target IP. I guess this not the case for ILB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The firewall and target pool only use the IP for providing some extra info in the description. On first sync the description IP would be empty; but on second sync, it would be filled. I didn't see a need to perform the extra 2 API calls when the forwarding rule will just generate an IP itself, but I'll change the logic to have the description set right from the first sync.
|
||
// Reserving the address failed due to a conflict. The address manager just checked that no address | ||
// exists with the lbName, so it may belong to the user. | ||
addr, err := am.svc.GetBetaRegionAddressByIP(am.region, am.targetIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the controller change but didn't see where you verify the AddressType of user-owned address.
// The addrMgr does not exist if no targetIP was specified. This function was not deferred | ||
// because the code should continue to hold the targetIP if the sync fails for any reason. | ||
// At this point, the existing rule is known to exist; therefore, releasing the address is fine. | ||
if addrMgr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not read the entire ILB file, just want to ask whether you need to defer
the ReleaseAddress
to reduce the time the controller holds the static IP. Perhaps the IP quota is not a concern for internal IPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, we discussed this case offline. If the service sync fails for any reason, I thought you said we should hold onto the IP for the next sync, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should hold on to the IP if the forwarding rule is not using it. If the forwarding rule IS using the IP, it's safe to release the it since we can always get it back. This is what the external LB do today for optimization. If there is no need for ILB to do so, we don't have to add more complexity.
|
||
err := am.svc.ReserveBetaRegionAddress(newAddr, am.region) | ||
if err != nil { | ||
if !isHTTPErrorCode(err, http.StatusConflict) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is not correct. I just found out that if you try to reserve an address with an IP that's already static (but not using the same name), you get a 400 error.
Error 400: Invalid value for field 'resource.address': '1.2.3.4'. Specified IP address is already reserved., invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't think the current external LB controller handle some corner cases because of this....
E.g., if the fwd rule is using an IP that's not specified in the Service object, but is actually owned by the user (e.g., not using the lbName), the controller would just fail to ensure the IP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, I just tested this again and got 409. Wonder if there's an alpha vs beta difference
{
"kind": "compute#operation",
"id": "1599788789319664192",
"name": "operation-1503706287098-5579ce716c890-3262ae10-677fa2b7",
"operationType": "insert",
"targetLink": "https://www.googleapis.com/compute/beta/projects/XYZ/regions/us-central1/addresses/myip2",
"targetId": "5223319394054898240",
"status": "DONE",
"progress": 100,
"insertTime": "2017-08-25T17:11:27.558-07:00",
"startTime": "2017-08-25T17:11:28.371-07:00",
"endTime": "2017-08-25T17:11:31.465-07:00",
"error": {
"errors": [
{
"code": "IP_IN_USE_BY_ANOTHER_RESOURCE",
"message": "IP '10.0.0.10' is already being used by another resource. "
}
]
},
"httpErrorStatusCode": 409,
"httpErrorMessage": "CONFLICT",
"selfLink": "https://www.googleapis.com/compute/beta/projects/XYZ/regions/us-central1/operations/operation-1503706287098-5579ce716c890-3262ae10-677fa2b7",
"region": "https://www.googleapis.com/compute/beta/projects/XYZ/regions/us-central1"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested more and it's not about alpha/beta/v1. The function returns 409 if AddressType == INTERNAL. Otherwise, it always returns 400. (sigh...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should take this up with the appropriate team - it should be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall with mainly some nits.
FWIW, I tried with my GCP project and reserved internal address doesn't seem to take up any quota of static IP addresses.
} | ||
|
||
// HoldAddress will ensure that the IP is reserved with an address - either owned by the controller | ||
// or by a user. If the address is not the loadBalancerName, then it's assumed to be a user's address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/loadBalancerName/addressManager.name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: document that the returned string is the IP address reserved.
return addr.Address, nil | ||
} | ||
|
||
glog.V(3).Infof("%v: existing address %q has IP %q Type %q which does not match targeted IP %q Type %q. Attemping to delete.", am.logPrefix, addr.Name, addr.Address, addr.AddressType, am.targetIP, am.addressType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this V(2). Would be useful for debugging GKE issues.
// could be reserving another address; therefore, it would need to be deleted. In the normal | ||
// case of using a controller address, retrieving the address by name results in the fewest API | ||
// calls since it indicates whether a Delete is necessary before Reserve. | ||
glog.V(3).Infof("%v: attempting hold of IP %q with type %q", am.logPrefix, am.targetIP, am.addressType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why using V(3) for the majority of the logs? Just curious because I've seen mostly V(2) and V(4) logs in this package
} | ||
|
||
// Check that the address attributes are as required. | ||
if addr.AddressType != string(am.addressType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: factor this out to a function since the manager checks essentially the same thing in line 71. This ensures that they stay consistent. E.g.,
func (am *addressManager) validateAddress(addr *computebeta.Address) error {
...
}
loadBalancerName := "a111111111111111" | ||
forwardingRuleAddr := "1.1.1.1" | ||
loadBalancerIP := "" | ||
targetIP := getTargetIP(forwardingRuleAddr, loadBalancerIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using this function? AddressManager doesn't care what targetIP you choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (optional): the only difference between this test and the previous test is that the targetIP is non-empty. Could consider merging the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is moving the code after HoldAddress
to a separate function to reuse across tests.
assert.NotEmpty(t, ipToUse) | ||
|
||
addr, err = svc.GetBetaRegionAddress(loadBalancerName, region) | ||
assert.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use require.NoError(t, err)
assert.NoError(t, err) | ||
assert.NotEmpty(t, ipToUse) | ||
|
||
addr, err = svc.GetBetaRegionAddress(loadBalancerName, region) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use GetRegionAddress
in other tests, why using Beta here?
return &addr | ||
} | ||
|
||
func (cas *FakeCloudAddressService) Print() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anywhere?
This is good for debugging, but I would suggest returning a string
so that the caller can dump the information wherever it deems appropriate.
@@ -79,19 +79,29 @@ func (gce *GCECloud) ensureInternalLoadBalancer(clusterName, clusterID string, s | |||
return nil, err | |||
} | |||
|
|||
// Determine IP which will be used for this LB. If no forwarding rule has been established | |||
// or specified in the Service spec, then requestedIP = "". | |||
requestedIP := determineRequestedIP(svc, existingFwdRule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before reaching this point, I assume it's not possible to have an existing forwarding rule with the wrong type of address, e.g., using an EXTERNAL IP. I am relying on the similar logic in external LB.
In the future, if we ever combine external and internal LB code, we'd need to ensure we still check whether the forwarding rule has all the expected attributes (and delete it if not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could check to see if the address is RFC 1918? (totally breaks down for IPv6...)
return nil | ||
} | ||
|
||
func (am *addressManager) ensureAddressReservation() (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks correct, and I like the useful log messages.
I do wish we could care (a bit) less about the QPS can make the code simpler.
@yujuhong I asked around and found that there is a hard quota of 200. I'm going to stick with releasing the address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commits look good. Just minor nits.
|
||
if addr != nil { | ||
// If address exists, check if the address had the expected attributes. | ||
if (am.targetIP == "" || am.targetIP == addr.Address) && addr.AddressType == string(am.addressType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call validateAddress
here as well.
} | ||
|
||
func (am *addressManager) validateAddress(addr *computebeta.Address) error { | ||
if addr.AddressType != string(am.addressType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also check the IP if the targetIP is not empty.
f1c301c
to
e1d1a12
Compare
/retest |
/test pull-kubernetes-e2e-gce-gpu |
LGTM. |
|
||
glog.V(4).Infof("%v: successfully created address %q which reserved IP %q", am.logPrefix, addr.Name, addr.Address) | ||
return addr.Address, nil | ||
} else if !isHTTPErrorCode(err, http.StatusConflict) && !isHTTPErrorCode(err, http.StatusBadRequest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can also get rid of this indent
@@ -82,6 +84,16 @@ func (gce *GCECloud) ReserveAlphaRegionAddress(addr *computealpha.Address, regio | |||
return gce.waitForRegionOp(op, region, mc) | |||
} | |||
|
|||
// ReserveBetaRegionAddress creates a beta region address |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why the Beta keyword is in the middle? Shouldn't it be a prefix or suffix?
@@ -79,19 +79,29 @@ func (gce *GCECloud) ensureInternalLoadBalancer(clusterName, clusterID string, s | |||
return nil, err | |||
} | |||
|
|||
// Determine IP which will be used for this LB. If no forwarding rule has been established | |||
// or specified in the Service spec, then requestedIP = "". | |||
requestedIP := determineRequestedIP(svc, existingFwdRule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could check to see if the address is RFC 1918? (totally breaks down for IPv6...)
/lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bowei, nicksardo No associated issue. Update pull-request body to add a reference to an issue, or get approval with The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 51632, 51055, 51676, 51560, 50007) |
What this PR does / why we need it:
This PR adds the ability for the service controller to hold the ILB's IP during sync which may delete/recreate the forwarding rule.
Fixes: #47531
Release note: