Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCE: Reserve address for ILBs during sync #51055

Merged
merged 1 commit into from
Sep 1, 2017

Conversation

nicksardo
Copy link
Contributor

@nicksardo nicksardo commented Aug 21, 2017

What this PR does / why we need it:
This PR adds the ability for the service controller to hold the ILB's IP during sync which may delete/recreate the forwarding rule.

Fixes: #47531

Release note:

GCE: Internal load balancer IPs are now reserved during service sync to prevent losing the address to another service.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/network Categorizes an issue or PR as relevant to SIG Network. area/platform/gce labels Aug 21, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 21, 2017
@nicksardo
Copy link
Contributor Author

FYI @yujuhong. This PR isn't done, but showing you what the ILB address logic will look like.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 22, 2017
@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 22, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 24, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2017
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 25, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2017
@nicksardo
Copy link
Contributor Author

/unassign
/assign @yujuhong @bowei

@k8s-ci-robot k8s-ci-robot assigned bowei and yujuhong and unassigned nicksardo Aug 25, 2017
type addressManager struct {
logPrefix string
svc CloudAddressService
lbName string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this name or addrName. The fact that the controller chooses to use the lbNamd for all related resources seemed irrelevant to this manager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make it clear that this manager would only manage (promote, delete) the address with this name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// If the retrieved address is not named with the loadbalancer name, then the controller does not own it.
if addr.Name != am.lbName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably make this clear by adding a method. It'd be easier to change in the future too.

func (am *addressManager) isManagedAddress(addr *computebeta.Address) {
  return addr.Name == am.Name
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

glog.V(3).Infof("%v: not attempting release of address %v.", am.logPrefix, am.targetIP)
return nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are willing to spend an extra GET call, we can check whether the address is IN_USE. If yes, it's safe release it. If not, this IP may be lost forever. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we don't have to keep the tryRelease field. Just GET and return if not found.

err := am.svc.DeleteRegionAddress(am.lbName, am.region)
if err != nil {
if isNotFound(err) {
glog.Warningf("%v: address %q was not found. Ignoring.", am.logPrefix, am.targetIP, am.lbName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove am.targetIP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks

Subnetwork: am.subnetURL,
}

err := am.svc.ReserveBetaRegionAddress(newAddr, am.region)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (optional). I like less indents.

if err == nil {
    return nil
}
if !isHTTPErrorCode(err, http.StatusConflict) {
   return err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


computebeta "google.golang.org/api/compute/v0.beta"
compute "google.golang.org/api/compute/v1"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider using github.com/stretchr/testify/assert (for Errorf) and github.com/stretchr/testify/require (for Fatalf). They tend to make the code more succinct .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of testify/assert for personal projects, but haven't seen enormous use of it here in K8s. Happy to use it unless someone says otherwise.

GetRegionAddressByIP(region, ipAddress string) (*compute.Address, error)
// TODO: Mock `DeleteRegionAddress(name, region string) endpoint
// TODO: Mock Global endpoints
DeleteRegionAddress(name, region string) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops... I added the same function in another PR. More rebasing...

targetIP := determineTargetIP(svc, existingFwdRule)

var addrMgr *addressManager
if targetIP != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. The external LB ensures a static IP (i.e., creating a new one) even if targetIP == "" because many other components (e.g., firewall, targetpool) needs a target IP. I guess this not the case for ILB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The firewall and target pool only use the IP for providing some extra info in the description. On first sync the description IP would be empty; but on second sync, it would be filled. I didn't see a need to perform the extra 2 API calls when the forwarding rule will just generate an IP itself, but I'll change the logic to have the description set right from the first sync.


// Reserving the address failed due to a conflict. The address manager just checked that no address
// exists with the lbName, so it may belong to the user.
addr, err := am.svc.GetBetaRegionAddressByIP(am.region, am.targetIP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the controller change but didn't see where you verify the AddressType of user-owned address.

// The addrMgr does not exist if no targetIP was specified. This function was not deferred
// because the code should continue to hold the targetIP if the sync fails for any reason.
// At this point, the existing rule is known to exist; therefore, releasing the address is fine.
if addrMgr != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not read the entire ILB file, just want to ask whether you need to defer the ReleaseAddress to reduce the time the controller holds the static IP. Perhaps the IP quota is not a concern for internal IPs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, we discussed this case offline. If the service sync fails for any reason, I thought you said we should hold onto the IP for the next sync, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should hold on to the IP if the forwarding rule is not using it. If the forwarding rule IS using the IP, it's safe to release the it since we can always get it back. This is what the external LB do today for optimization. If there is no need for ILB to do so, we don't have to add more complexity.


err := am.svc.ReserveBetaRegionAddress(newAddr, am.region)
if err != nil {
if !isHTTPErrorCode(err, http.StatusConflict) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is not correct. I just found out that if you try to reserve an address with an IP that's already static (but not using the same name), you get a 400 error.

Error 400: Invalid value for field 'resource.address': '1.2.3.4'. Specified IP address is already reserved., invalid

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't think the current external LB controller handle some corner cases because of this....
E.g., if the fwd rule is using an IP that's not specified in the Service object, but is actually owned by the user (e.g., not using the lbName), the controller would just fail to ensure the IP.

Copy link
Contributor Author

@nicksardo nicksardo Aug 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, I just tested this again and got 409. Wonder if there's an alpha vs beta difference

{
 "kind": "compute#operation",
 "id": "1599788789319664192",
 "name": "operation-1503706287098-5579ce716c890-3262ae10-677fa2b7",
 "operationType": "insert",
 "targetLink": "https://www.googleapis.com/compute/beta/projects/XYZ/regions/us-central1/addresses/myip2",
 "targetId": "5223319394054898240",
 "status": "DONE",
 "progress": 100,
 "insertTime": "2017-08-25T17:11:27.558-07:00",
 "startTime": "2017-08-25T17:11:28.371-07:00",
 "endTime": "2017-08-25T17:11:31.465-07:00",
 "error": {
  "errors": [
   {
    "code": "IP_IN_USE_BY_ANOTHER_RESOURCE",
    "message": "IP '10.0.0.10' is already being used by another resource. "
   }
  ]
 },
 "httpErrorStatusCode": 409,
 "httpErrorMessage": "CONFLICT",
 "selfLink": "https://www.googleapis.com/compute/beta/projects/XYZ/regions/us-central1/operations/operation-1503706287098-5579ce716c890-3262ae10-677fa2b7",
 "region": "https://www.googleapis.com/compute/beta/projects/XYZ/regions/us-central1"
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested more and it's not about alpha/beta/v1. The function returns 409 if AddressType == INTERNAL. Otherwise, it always returns 400. (sigh...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should take this up with the appropriate team - it should be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling both

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 29, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 29, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 29, 2017
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 29, 2017
@freehan freehan added this to the v1.8 milestone Aug 29, 2017
Copy link
Contributor

@yujuhong yujuhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall with mainly some nits.

FWIW, I tried with my GCP project and reserved internal address doesn't seem to take up any quota of static IP addresses.

}

// HoldAddress will ensure that the IP is reserved with an address - either owned by the controller
// or by a user. If the address is not the loadBalancerName, then it's assumed to be a user's address.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/loadBalancerName/addressManager.name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: document that the returned string is the IP address reserved.

return addr.Address, nil
}

glog.V(3).Infof("%v: existing address %q has IP %q Type %q which does not match targeted IP %q Type %q. Attemping to delete.", am.logPrefix, addr.Name, addr.Address, addr.AddressType, am.targetIP, am.addressType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this V(2). Would be useful for debugging GKE issues.

// could be reserving another address; therefore, it would need to be deleted. In the normal
// case of using a controller address, retrieving the address by name results in the fewest API
// calls since it indicates whether a Delete is necessary before Reserve.
glog.V(3).Infof("%v: attempting hold of IP %q with type %q", am.logPrefix, am.targetIP, am.addressType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why using V(3) for the majority of the logs? Just curious because I've seen mostly V(2) and V(4) logs in this package

}

// Check that the address attributes are as required.
if addr.AddressType != string(am.addressType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: factor this out to a function since the manager checks essentially the same thing in line 71. This ensures that they stay consistent. E.g.,

func (am *addressManager) validateAddress(addr *computebeta.Address) error {
...
}

loadBalancerName := "a111111111111111"
forwardingRuleAddr := "1.1.1.1"
loadBalancerIP := ""
targetIP := getTargetIP(forwardingRuleAddr, loadBalancerIP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using this function? AddressManager doesn't care what targetIP you choose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (optional): the only difference between this test and the previous test is that the targetIP is non-empty. Could consider merging the two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is moving the code after HoldAddress to a separate function to reuse across tests.

assert.NotEmpty(t, ipToUse)

addr, err = svc.GetBetaRegionAddress(loadBalancerName, region)
assert.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use require.NoError(t, err)

assert.NoError(t, err)
assert.NotEmpty(t, ipToUse)

addr, err = svc.GetBetaRegionAddress(loadBalancerName, region)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use GetRegionAddress in other tests, why using Beta here?

return &addr
}

func (cas *FakeCloudAddressService) Print() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

This is good for debugging, but I would suggest returning a string so that the caller can dump the information wherever it deems appropriate.

@@ -79,19 +79,29 @@ func (gce *GCECloud) ensureInternalLoadBalancer(clusterName, clusterID string, s
return nil, err
}

// Determine IP which will be used for this LB. If no forwarding rule has been established
// or specified in the Service spec, then requestedIP = "".
requestedIP := determineRequestedIP(svc, existingFwdRule)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before reaching this point, I assume it's not possible to have an existing forwarding rule with the wrong type of address, e.g., using an EXTERNAL IP. I am relying on the similar logic in external LB.

In the future, if we ever combine external and internal LB code, we'd need to ensure we still check whether the forwarding rule has all the expected attributes (and delete it if not).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could check to see if the address is RFC 1918? (totally breaks down for IPv6...)

return nil
}

func (am *addressManager) ensureAddressReservation() (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks correct, and I like the useful log messages.

I do wish we could care (a bit) less about the QPS can make the code simpler.

@nicksardo
Copy link
Contributor Author

@yujuhong I asked around and found that there is a hard quota of 200. I'm going to stick with releasing the address.

Copy link
Contributor

@yujuhong yujuhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New commits look good. Just minor nits.


if addr != nil {
// If address exists, check if the address had the expected attributes.
if (am.targetIP == "" || am.targetIP == addr.Address) && addr.AddressType == string(am.addressType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call validateAddress here as well.

}

func (am *addressManager) validateAddress(addr *computebeta.Address) error {
if addr.AddressType != string(am.addressType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also check the IP if the targetIP is not empty.

@nicksardo nicksardo force-pushed the gce-ilb-ip branch 5 times, most recently from f1c301c to e1d1a12 Compare August 30, 2017 06:39
@nicksardo
Copy link
Contributor Author

/retest

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2017
@nicksardo
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce-gpu

@yujuhong
Copy link
Contributor

LGTM.


glog.V(4).Infof("%v: successfully created address %q which reserved IP %q", am.logPrefix, addr.Name, addr.Address)
return addr.Address, nil
} else if !isHTTPErrorCode(err, http.StatusConflict) && !isHTTPErrorCode(err, http.StatusBadRequest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can also get rid of this indent

@@ -82,6 +84,16 @@ func (gce *GCECloud) ReserveAlphaRegionAddress(addr *computealpha.Address, regio
return gce.waitForRegionOp(op, region, mc)
}

// ReserveBetaRegionAddress creates a beta region address
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why the Beta keyword is in the middle? Shouldn't it be a prefix or suffix?

@@ -79,19 +79,29 @@ func (gce *GCECloud) ensureInternalLoadBalancer(clusterName, clusterID string, s
return nil, err
}

// Determine IP which will be used for this LB. If no forwarding rule has been established
// or specified in the Service spec, then requestedIP = "".
requestedIP := determineRequestedIP(svc, existingFwdRule)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could check to see if the address is RFC 1918? (totally breaks down for IPv6...)

@bowei
Copy link
Member

bowei commented Aug 30, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 30, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bowei, nicksardo

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 30, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 51632, 51055, 51676, 51560, 50007)

@k8s-github-robot k8s-github-robot merged commit 324a3bf into kubernetes:master Sep 1, 2017
@nicksardo nicksardo deleted the gce-ilb-ip branch January 12, 2018 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/gce cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants