kubernetes.io/cluster/CLUSTER_NAME tag missing from EIPs, NAT GWs, S3 buckets and R53 zones #458
I haven't found upstream docs for the tag, but this suggests that the tag is for resources we expect Kubernetes to take over and manage for us. Is that true for all of the resources which currently lack this tag? I think we need to find upstream docs for the tag, and then make the argument for each resource individually.
@wking, who could we talk to who could shed some light on the appropriateness of adding the kubernetes.io/cluster/CLUSTER_X tag to objects not intended for Kubernetes usage? Really, what I'm looking for is "one true tag" so that I could search and see everything in AWS related to CLUSTER_X.
I think …
Yes. Any object created by the cluster during runtime (e.g. PVs, security groups, ELBs) doesn't get the tectonicClusterID tag.
So what do we need to update so that those get tagged? I guess other repos should be pulling the tag information out of the install-config we push into |
This (I think) is setting the tag for workers created by the machine-API operator. Do you know who's creating the untagged resources? If not, can you give me more information about them, e.g. what ELB is untagged?
@wking, for the PVs it's the Kubernetes storage-class provisioner. @ironcladlou, can you give some details on which code is creating the ELBs/security groups that weren't being found by the uninstaller?
AWS resources created by the cluster (e.g. some elastic load-balancers) are currently missing the tectonicClusterID tag [1], which makes cleanup difficult. This helper makes it easy for those external tools to set the appropriate tags without needing local logic to generate them. [1]: openshift#458 (comment)
To reproduce, create a service like:
Here's a starting point upstream for understanding how k8s creates the resources in a given cloud provider to manifest such a service.
Thanks for the code link :). It looks like folks creating services should be setting …
@wking, so I looked at the annotation and it does indeed work as advertised (it even tagged the security group associated with the load balancer as well). The problem is that end users are free to create the service load balancers, and they are free to leave them untagged. This of course complicates cleaning up the cluster for deprovisioning. Talking this over with some people, it was suggested that having an admission server catch the service object creations could then add the annotations as needed, but this functionality doesn't exist today. I also looked at the options for getting custom tagging working on the PVs/storage classes. I didn't find anything like what is possible with the service load balancers.
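For concreteness, here is a minimal client-go sketch of creating a LoadBalancer Service with the AWS cloud provider's additional-resource-tags annotation (assuming the annotation in question is `service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags`; the namespace, service name, selector, and tag value below are placeholders, not code from this repo):

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig; adjust the path as needed.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-lb",
			Namespace: "default",
			Annotations: map[string]string{
				// Extra tags for the cloud provider to apply to the ELB (and its
				// security group) that it creates for this Service.
				"service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags": "tectonicClusterID=CLUSTER_UUID",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "example"},
			Ports:    []corev1.ServicePort{{Port: 80}},
		},
	}

	created, err := client.CoreV1().Services(svc.Namespace).Create(context.TODO(), svc, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("created service %s/%s\n", created.Namespace, created.Name)
}
```

The catch, as noted above, is that nothing forces service authors to set this annotation, which is why an admission-time mutation was floated.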
This is not something we can solve completely via tagging. For example, someone with an AWS cluster might create GCE resources for some task. There's no way we're going to have our AWS-cleanup logic reap that for them. I think the solution to this sort of thing is to require users to spin down anything they've created on their own before they call our delete logic. Of course, the more we tag where our cleanup logic can find it, the less they'll have to manually spin down. So I'm in favor of comprehensive tagging. I just think getting user-created resources tagged is a nice-to-have and not a must-have. On the
It does sound reasonable to insert an early step that deletes all non-protected projects/namespaces. But I think this still leaves us exposed if the cluster itself is dysfunctional (i.e. the PR that we're doing CI against broke the cluster in some strange way).
I certainly don't have a comprehensive (nor non-comprehensive ;) list of the kinds of objects the cluster can create during normal lifecycle, but I would treat the lack of tagging as a bug and get that component to behave like the rest of the system.
I worry for the operations team that would have to look at UUID tags in the AWS console :). I suppose the names of the objects would help the 'browsing the AWS console' use case, but using the UUID here would certainly avoid the naming collisions.
So something to keep in mind is that very soon the cluster-ingress-operator is going to be enabled by default, which means that a LoadBalancer service is going to come out of the box with new AWS installs. This service will be untagged. Here's the service manifest. Questions:
#465 was my take for "please annotate, and get the annotations from
I'm new to this space, but surely someone has had to unwind a cluster on AWS before now, right?
Well, in the context of our new installer and operators, I guess not, since I've been the one discovering and reporting these issues. 😁 |
Broken clusters get deleted, often, so I'm hesitant to even bother with the effort of cleanup via unwinding over the Kube API. This doesn't seem like it will be reliable enough to solve the problem. Falling back to uninstall by OR'ing tags would be my preferred backup plan, though we will return to needing a way to pre-specify the cluster UUID to the installer; that should be minor. However, do we have a compelling reason why we cannot tag all resources with kubernetes.io/cluster/NAME? The comment on the tag definition here https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/tags.go#L34 indicates that it's just to differentiate multiple clusters in one account, in which case our use case here seems acceptable. This feels like the best available option: more resilient and simpler cleanup, fewer UUID concerns, and a lot less work than API-driven unwind.
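For reference, a tiny sketch of how that tag is shaped per the linked tags.go (the helper below is illustrative, not installer code; the "owned"/"shared" values describe whether the cluster's lifecycle controls the resource):

```go
package main

import "fmt"

// clusterTag builds the key/value pair the upstream AWS cloud provider uses to
// associate a resource with a cluster: the cluster name lives in the key, and
// the value records whether the resource is owned by or shared with the cluster.
func clusterTag(clusterName string, owned bool) (key, value string) {
	key = "kubernetes.io/cluster/" + clusterName
	if owned {
		return key, "owned"
	}
	return key, "shared"
}

func main() {
	k, v := clusterTag("CLUSTER_NAME", true)
	fmt.Printf("%s=%s\n", k, v) // kubernetes.io/cluster/CLUSTER_NAME=owned
}
```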
Why do we need a way to pre-specify cluster UUID? Currently the installer generates a UUID and pushes it into the cluster for operators to find and use. What does a pre-generated UUID give you beyond that?
We have concerns about the uniqueness scope of the cluster name. I'm ok pushing …
/cc @legionus @dmage @coreydaley This is something we need to consider when creating storage buckets for the registry on S3+GCS.
Hive runs the installer in a pod; it may fail and have partially created a cluster before we were able to push the metadata/UUID back to the API server. It is not idempotent, and if you just re-run it, you get resources recreated and a big mess to clean up. Per the last arch call, we have to clean up any extraneous resources for a cluster before we launch an install again. If we can provide the UUID, we no longer have a situation where the install fails before we can get the UUID (or the UUID upload itself fails); we always know what we need to clean up and can retry reliably. However, this is not required if we can tag everything with the kube cluster ID.
I think we should be safe to work on the assumption that cluster names are unique enough within one AWS account. We're going to have some safeguards to prevent problems in Hive, and it probably would be possible to have the installer try to prevent this as well by looking for similarly tagged resources in the account.
Depending on how robust you need to be, there are at least two ways to address this:
Looking for an existing cluster with a given name, not finding it, and then creating a cluster with that name is going to be racy. Is there a way to lock an AWS account against cluster creation to prevent two folks from trying to simultaneously create a.example.com and a.example.org in the same account? I'd much rather stick to UUIDs for this sort of thing, where the likelihood of collision is vanishingly small. Or use names and just tell users that they get to police collisions on their own and keep the pieces if they collide.
Earlier today, @abhinavdahiya was asking for specifics on whether the destroy leakage caused by missing tags was happening now. @ironcladlou says they're seeing it in CI for both openshift/cluster-ingress-operator#52 and openshift/cluster-dns-operator#38.
If we want to use the cluster name, we can add semantics like: the cluster ID is the cluster name plus a random suffix. We can program the DNS names and other user-oriented stuff to use only the cluster-name part.
Provided it is configured in the cloud provider so all post-install LBs, PVs, etc. get tagged with this as their cluster ID, clustername-random would be pretty good from my POV. It's possible it will end up clustername-random-random, as GenerateName may be used for some clusters created in Hive; we're not sure if Dedicated will need to use this yet or not, but either way it should be fine. This would give us the one cluster ID that can be used to fully deprovision. Is the tectonicClusterUUID necessary anymore with this in play?
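A sketch of the clustername-random idea (the suffix length and alphabet below are assumptions for illustration; Kubernetes uses a similar scheme for GenerateName):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// alphanums is a lowercase alphanumeric alphabet with ambiguous characters
// dropped; the exact alphabet is an assumption.
const alphanums = "bcdfghjklmnpqrstvwxz2456789"

// clusterID appends a short random suffix to the user-chosen cluster name so
// two clusters with the same name in one account still get distinct tags.
func clusterID(clusterName string, suffixLen int) string {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	suffix := make([]byte, suffixLen)
	for i := range suffix {
		suffix[i] = alphanums[r.Intn(len(alphanums))]
	}
	return clusterName + "-" + string(suffix)
}

func main() {
	fmt.Println(clusterID("mycluster", 5)) // e.g. "mycluster-x7k2q"
}
```

With a 27-character alphabet, 5 characters give roughly 14 million combinations, plenty to avoid accidental collisions within one account, though not a cryptographic guarantee.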
That could work, although 5 characters is not that big a space ;). What about …? As an example of name-uniqueness collisions in real life, @ironcladlou and @sallyom have both hit:
when switching regions within a single AWS account (and presumably not updating their cluster name). @sallyom said:
For the S3 issue specifically, I think we can just drop the bucket name. But it's an example of the same cluster name being used for two regions in the same account. |
It is not a cluster-name collision; see the commit for details: openshift/hive@32cde50
It will be fine :P We want enough uniqueness that a single user does not collide.
What has been decided on this? Do I update the uninstaller to key off of kubernetes.io/cluster, or is there another approach we should be taking with the uninstaller that can capture resources created inside and outside of the installer?
Agreed, appending a 5-char suffix should be more than sufficient for colliding cluster names in one account, which you shouldn't be doing at all. Can we move forward with this, @wking? If so, can we pass in the value, or do you want to generate it? Will it appear in the metadata file? Should it fully replace the tectonic UUID that is there today?
After discussing today with @abhinavdahiya, we think a reasonable approach would be to update the hive package to remove resources matching either the kubernetes.io/cluster/<name> tag or the tectonicClusterID tag. I've started in on converting the hive code from AND-ing the tags to OR-ing the tags, and have a few minor clean-up PRs in the pipe so far. Does this sound reasonable?
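A rough sketch of what OR-matching across the two tags could look like against the AWS Resource Groups Tagging API (this is not the hive implementation, just an illustration of the either-tag semantics; the cluster name, UUID, and region below are placeholders):

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/resourcegroupstaggingapi"
)

// findTaggedResources returns the ARNs of resources matching ANY of the given
// tag filters.  The tagging API ANDs filters within a single request, so we
// issue one request per filter and union the results to get OR semantics.
func findTaggedResources(svc *resourcegroupstaggingapi.ResourceGroupsTaggingAPI, filters []*resourcegroupstaggingapi.TagFilter) (map[string]bool, error) {
	arns := map[string]bool{}
	for _, filter := range filters {
		input := &resourcegroupstaggingapi.GetResourcesInput{
			TagFilters: []*resourcegroupstaggingapi.TagFilter{filter},
		}
		err := svc.GetResourcesPages(input, func(page *resourcegroupstaggingapi.GetResourcesOutput, lastPage bool) bool {
			for _, mapping := range page.ResourceTagMappingList {
				arns[aws.StringValue(mapping.ResourceARN)] = true
			}
			return true // keep paging
		})
		if err != nil {
			return nil, err
		}
	}
	return arns, nil
}

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	svc := resourcegroupstaggingapi.New(sess)

	filters := []*resourcegroupstaggingapi.TagFilter{
		// Leaving Values empty matches any value for the key.
		{Key: aws.String("kubernetes.io/cluster/CLUSTER_NAME")},
		{Key: aws.String("tectonicClusterID"), Values: []*string{aws.String("CLUSTER_UUID")}},
	}
	arns, err := findTaggedResources(svc, filters)
	if err != nil {
		panic(err)
	}
	for arn := range arns {
		fmt.Println(arn)
	}
}
```

Note the tagging API does not cover every resource type, so a real uninstaller still has to query some services directly; the sketch is only about matching either tag.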
Ok, openshift/hive#47 has a moderately-ugly (but hopefully working ;) OR implementation. Once that lands (if the approach looks sound), I can update the installer to delete resources matching either tag.
@wking, what do you think about calling hive destroy twice in the installer? One sweep for tectonicClusterID, another sweep for kubernetes.io/cluster/<name>?
They need to be simultaneous, in case a resource tagged one way blocks deletion of a resource tagged the other way. You could do that with goroutines here, but I prefer openshift/hive#47, because it allows for future optimization (the …).
…deletion We're having trouble getting our resources tagged consistently, with elastic IPs, NAT gateways, etc. missing kubernetes.io/cluster/<name> [1]. Meanwhile, the ingress operator will be creating elastic load balancers that lack tectonicClusterID [2]. This change deletes everything we can find with either tag while we sort out something cleaner. [1]: openshift#458 (comment) [2]: openshift#458 (comment)
Closing. This was solved by adding/changing the functionality of the uninstaller to receive a list of key=value pairs and attempt deletes on any matching tags: openshift/hive#47
Reopening, because I think openshift/hive#47 and #535 are more of a temporary stopgap than a permanent fix ;). We should figure out a consistent way to tag AWS resources and get back to a single approach. And we should figure out a way to set the user-specified tags on our resources too (#465 or similar).
I believe we finished making sure everything that can be tagged is tagged. So closing. |
Version
Platform (aws|libvirt|openshift):
aws
What happened?
After installation, I see some objects are tagged with kubernetes.io/cluster/CLUSTER_NAME and some aren't. Specifically, I've noticed that EIPs, NAT Gateways, S3 buckets, and Route53 zones are missing the above tag.
What you expected to happen?
I expect the tags applied to objects during installation to be consistent.
How to reproduce it (as minimally and precisely as possible)?