Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubernetes.io/cluster/CLUSTER_NAME tag missing from EIPs, NAT GWs, S3 buckets and R53 zones #458

Closed
joelddiaz opened this issue Oct 12, 2018 · 35 comments

Comments

@joelddiaz
Copy link

Version

$ openshift-install version
./bin/openshift-install v0.1.0
Terraform v0.11.8

Platform (aws|libvirt|openshift):

aws

What happened?

After installation, I see some objects are tagged with kubernetes.io/cluster/CLUSTER_NAME and some aren't. Specifically, I've noticed that EIPs, NAT Gateways, S3 buckets, and Route53 zones are missing the above tag.

What you expected to happen?

I expect the tags applied to objects during installation to be consistent.

How to reproduce it (as minimally and precisely as possible)?

1) install a cluster as documented
2) query the tags on any of the EIPs, NAT Gateways, S3 buckets, and Route53 zones that were created during installation.
  For example: aws ec2 describe-nat-gateways --region us-east-1 --nat-gateway-ids nat-00f483ee692b50d54 | jq -r '.NatGateways[0].Tags'
[
  {
    "Key": "tectonicClusterID",
    "Value": "17993b0f-7370-4ee1-9e28-209db9a321d9"
  }
]
@wking
Copy link
Member

wking commented Oct 12, 2018

I expect the tags applied to objects during installation to be consistent.

I haven't found upstream docs for the tag, but this suggests that tag is for resources we expect Kubernetes to take over and manage for us. Is that true for all of the resources which currently lack this tag? I think we need to find upstream docs for the tag, and then make the argument for each resource individually.

@joelddiaz
Copy link
Author

@wking who could we talk to that could shed some light on the appropriateness of adding the kubernetes.io/cluster/CLUSTER_X tag to objects not intended for kubernetes usage.

really, what i'm looking for is "one true tag" so that i could search and see everything in AWS related to CLUSTER_X.

@wking
Copy link
Member

wking commented Oct 12, 2018

really, what i'm looking for is "one true tag" so that i could search and see everything in AWS related to CLUSTER_X.

I think tectonicClusterID is that tag. Are there resources missing that tag?

@joelddiaz
Copy link
Author

joelddiaz commented Oct 15, 2018

yes. any object created by the cluster during runtime (eg PVs, security groups, ELBs) don't get the tectonicClusterID tag.

@wking
Copy link
Member

wking commented Oct 15, 2018

any object created by the cluster during runtime (eg PVs, security groups, ELBs) don't get the tectonicClusterID tag.

So what do we need to update so that those get tagged? I guess other repos should be pulling the tag information out of the install-config we push into cluster-config-v1 in the kube-system namespace? We probably want to push a helper function those external projects can vendor that combines UserTags with ClusterID (and anything else?) so they don't have to code that up themselves.

@wking
Copy link
Member

wking commented Oct 15, 2018

This (I think) is setting the tag for workers created by the machine-API operator. Do you know who's creating the untagged resources? If not, can you give me more information about them, e.g. what ELB is untagged?

@joelddiaz
Copy link
Author

@wking for the PVs, it's the kubernetes storage class provisioner

@ironcladlou can you give some details on which code is creating the ELBs/security groups that weren't being found by the uninstaller?

wking added a commit to wking/openshift-installer that referenced this issue Oct 15, 2018
AWS resources created by the cluster (e.g. some elastic
load-balancers) are currently missing the tectonicClusterID tag [1],
which makes cleanup difficult.  This helper makes it easy for those
external tools to set the appropriate tags without needing local logic
to generate them.

[1]: openshift#458 (comment)
@ironcladlou
Copy link
Contributor

To reproduce, create a service like:

kind: Service
apiVersion: v1
metadata:
  name: cloud-service
spec:
  type: LoadBalancer
  selector:
    app: some-app
  externalTrafficPolicy: Local
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: http

Here's a starting point upstream for understanding how k8s creates the resources in a given cloud provider to manifest such a service.

@wking
Copy link
Member

wking commented Oct 15, 2018

Thanks for the code link :). It looks like folks creating services should be setting service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags on load-balancer services like that if they want them tagged. For anything we control, I think that means we want to add our cluster tags (#465, or something similar after install-config is dropped from the cluster). For things that other folks inject into the cluster, I dunno. Something like the old worker scale-down request we used to have? I'm ok leaving it up to the cluster admins to scale down anything extra they've added before coming back and calling openshift-install destroy-cluster, as long as we do a good job cleaning up our own stuff.

@joelddiaz
Copy link
Author

@wking so i looked at the annotation and it does indeed work as advertised (it even tagged the security group associated with the load balancer as well). the problem is that end users are free to create the service load balancers, and they are free to leave them untagged. this of course complicates cleaning up the cluster for deprovisioning.

talking this over with some people it was suggested that having an admission server catch the service object creations could then add the annotations as needed. but this functionality doesn't exist today.

i also looked at the options for getting custom tagging working on the PV/storage classes. i didn't find anything like what is possible with the service load balancers.

@wking
Copy link
Member

wking commented Oct 16, 2018

... the problem is that end users are free to create the service load balancers, and they are free to leave them untagged. this of course complicates cleaning up the cluster for deprovisioning.

This is not something we can solve completely via tagging. For example, someone with an AWS cluster might create GCE resources for some task. There's no way we're going to have our AWS-cleanup logic reap that for them. I think the solution to this sort of thing is to require users to spin down anything they've created on their own before they call our delete logic. Of course, the more we tag where our cleanup logic can find it, the less they'll have to manually spin down. So I'm in favor of comprehensive tagging. I just think getting user-created resources tagged is a nice-to-have and not a must-have.

On the kubernetes.io/cluster/ front, it looks like an AWS-ism (although that's not a problem if we're only looking at AWS cleanup ;). The tag is from kubernetes/kubernetes#41695. ELB tagging came in via kubernetes/kubernetes#45932. Based on that diff, I'd guess that tagging any resources we create with kubernetes.io/cluster will be ok, but:

  • Kubernetes may create new EIPs, gateways, etc. without the tag, if it hasn't been specifically taught to apply the tag to that resource type yet. In this case, we're still back to "how do we reap these if the creator isn't tagging them?".

  • The docs for the tag specify uniqueness within an AZ (which is clearly imprecise). The stale design-proposal suggests that the uniqueness constraint is really "within a VPC", which makes more sense. But things like S3 buckets and R53 zones do not live inside VPCs. What would the uniqueness constraint be then? Requiring cluster names to be unique within an AWS account seems overly strict, but we could always switch to using the UUID ClusterID as our kubernetes.io/cluster/ value, and in that case the uniqueness scope would no longer matter.

@joelddiaz
Copy link
Author

This is not something we can solve completely via tagging. For example, someone with an AWS cluster might create GCE resources for some task. There's no way we're going to have our AWS-cleanup logic reap that for them. I think the solution to this sort of thing is to require users to spin down anything they've created on their own before they call our delete logic.

It does sound reasonable to insert a step to delete all non-protected projects/namespaces as an early step. But I think this does still leave us exposed if the cluster itself is dysfunctional (ie the PR that we're doing CI against broke the cluster in some strange way).

Kubernetes may create new EIPs, gateways, etc. without the tag, if it hasn't been specifically taught to apply the tag to that resource type yet. In this case, we're still back to "how do we reap these if the creator isn't tagging them?".

I certainly don't have a comprehensive (nor non-comprehensive ;) list of the kinds of objects the cluster can create during normal lifecycle, but I would treat the lack of tagging as a bug and get that component to behave like the rest of the system.

The docs for the tag specify uniqueness within an AZ (which is clearly imprecise). The stale design-proposal suggests that the uniqueness constraint is really "within a VPC", which makes more sense. But things like S3 buckets and R53 zones do not live inside VPCs. What would the uniqueness constraint be then? Requiring cluster names to be unique within an AWS account seems overly strict, but we could always switch to using the UUID ClusterID as our kubernetes.io/cluster/ value, and in that case the uniqueness scope would no longer matter.

I worry for the operations team that would have to look at UUID tags in the AWS console :). I suppose the names of the objects would help the 'browsing the AWS console' usecase, but using the UUID here would certainly avoid the naming collisions.

@ironcladlou
Copy link
Contributor

So something to keep in mind is that very soon the cluster-ingress-operator is going to be enabled by default, which means that a LoadBalancer service is going to come out of the box with new AWS installs. This service will be untagged. Here's the service manifest. Questions:

  1. Should the cluster-ingress-operator be actively annotating the service?
  2. If so, from where should the value be obtained (e.g. install-config.clusterID in kube-system/configmaps/cluster-config-v1?)
  3. Is there any precedent for this?

@wking
Copy link
Member

wking commented Oct 16, 2018

  1. Should the cluster-ingress-operator be actively annotating the service?
  2. If so, from where should the value be obtained (e.g. install-config.clusterID in kube-system/configmaps/cluster-config-v1?)

#465 was my take for "please annotate, and get the annotations from InstallConfig.Tags()". But see push-back there like this. I'm not sure what the current long- or short-term alternatives are.

  1. Is there any precedent for this?

I'm new to this space, but surely someone has had to unwind a cluster on AWS before now, right?

@ironcladlou
Copy link
Contributor

I'm new to this space, but surely someone has had to unwind a cluster on AWS before now, right?

Well, in the context of our new installer and operators, I guess not, since I've been the one discovering and reporting these issues. 😁

@dgoodwin
Copy link
Contributor

Broken clusters get deleted, often, so I'm hesitant to even bother with the effort of cleanup via unwinding over the Kube API. This doesn't seem like it will be reliable enough to solve the problem.

Falling back to uninstall by OR'ing tags would be my preferred backup plan, though we will return to needing a way to pre-specify cluster UUID to the installer, but that should be minor.

However do we have a compelling reason why we cannot tag all resources with the kubernetes.io/cluster/NAME? The comment on the tag definition here https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/tags.go#L34 indicates that it's just to differentiate multiple clusters in one account, in which case our use case here seems acceptable. This feels like best available option, more resilient and simple cleanup, less UUID concerns, a lot less work than API driven unwind.

@wking
Copy link
Member

wking commented Oct 17, 2018

Falling back to uninstall by OR'ing tags would be my preferred backup plan, though we will return to needing a way to pre-specify cluster UUID to the installer, but that should be minor.

Why do we need a way to pre-specify cluster UUID? Currently the installer generates a UUID and pushes it into the cluster for operators to find and use. What does a pre-generated UUID give you beyond that?

However do we have a compelling reason why we cannot tag all resources with the kubernetes.io/cluster/NAME?

We have concerns about the uniqueness scope of the cluster name. I'm ok pushing kubernetes.io/cluster/UUID everywhere, but @joelddiaz has concerns about debugging with human-opaque UUIDs. Maybe not so much that it blocks that approach though ;). And if folks want to just assume that cluster names are unique enough, I'm ok giving that a shot too. However, without something like #465, we're still missing a way to get expirationDate onto cluster resources, which is our current backstop for the installer failing to fully remove the cluster (openshift/release#1761).

@bparees
Copy link
Contributor

bparees commented Oct 17, 2018

/cc @legionus @dmage @coreydaley this is something we need to consider when creating storage buckets for the registry on s3+gcs.

@dgoodwin
Copy link
Contributor

Why do we need a way to pre-specify cluster UUID? Currently the installer generates a UUID and pushes it into the cluster for operators to find and use. What does a pre-generated UUID give you beyond that?

Hive runs the installer in a pod, it may fail and have partially created a cluster before we were able to push the metadata/UUID back to the API server. It is not idempotent and if you just re-run again, you get resources recreated and a big mess to clean up. Per last arch call we have to cleanup any extraneous resources for a cluster before we launch an install again. If we can provide the UUID, we no longer have a situation where the install fails before we can get the UUID, or the UUID upload itself fails, we always know what we need to clean up and can retry reliably. However this is not required if we can tag everything with the kube cluster ID.

We have concerns about the uniqueness scope of the cluster name. I'm ok pushing kubernetes.io/cluster/UUID everywhere, but @joelddiaz has concerns about debugging with human-opaque UUIDs. Maybe not so much that it blocks that approach though ;). And if folks want to just assume that cluster names are unique enough, I'm ok giving that a shot too. However, without something like #465, we're still missing a way to get expirationDate onto cluster resources, which is our current backstop for the installer failing to fully remove the cluster (openshift/release#1761).

I think we should be safe to work on the assumption that cluster names are unique enough within one AWS account. We're going to have some safeguards to prevent problems in Hive, and it probably would be possible to have the installer try to prevent this as well by looking for similarly tagged resources in the account.

@wking
Copy link
Member

wking commented Oct 17, 2018

If we can provide the UUID, we no longer have a situation where the install fails before we can get the UUID, or the UUID upload itself fails, we always know what we need to clean up and can retry reliably.

Depending on how robust you need to be, there are at least two ways to address this:

  • (Easy, brittle) Have the installer write to a directory you're going to keep (like this). Since pkg/asset/cluster: Pull metadata into the Cluster asset #454 we write the metadata.json you need to clean up even when the Terraform install fails. The hole here would be if the container completely crashed before the installer wrote metadata.json.

  • (Moderate, robust) Use install-config to spit out the InstallConfig with the cluster ID:

    $ openshift-install --dir wking install-config
    $ grep clusterID wking/install-config.yml 
    clusterID: 0410ff22-437d-4dbc-960b-0ffa90cfeb46

    Save that cluster ID somewhere. Then run the cluster subcommand to launch the cluster. Since Save assets in a state file on disk #388, the installer will pick back up where it left off, reusing it's previously-generated cluster ID (and an other previously-created resources) when it creates the cluster.

We're going to have some safeguards to prevent problems in Hive, and it probably would be possible to have the installer try to prevent this as well by looking for similarly tagged resources in the account.

Looking for an existing cluster with a given name, not finding it, and then creating a cluster with that name is going to be racy. Is there a way to lock an AWS account against cluster creation to prevent two folks from trying to simultaneously create a.example.com and a.example.org in the same account? I'd much rather stick to UUIDs for this sort of thing, where the likelihood of collision is vanishingly small. Or use names and just tell users that they get to police collisions on their own and keep the pieces if they collide.

@wking
Copy link
Member

wking commented Oct 17, 2018

Earlier today, @abhinavdahiya was asking for specifics on whether the destroy leakage caused by missing tags was happening now. @ironcladlou says they're seeing it in CI for both openshift/cluster-ingress-operator#52 and openshift/cluster-dns-operator#38.

@abhinavdahiya
Copy link
Contributor

If we want to use kubernetes.io/cluster/<cluster-name>, and still solve some of the @wking 's concern.

We can add sematics like the cluster-name is <human-readable>-<random 5 chars> like pods for a replica set.

We can program the dns names and other user oriented stuff use only the human-readable part?

@dgoodwin
Copy link
Contributor

dgoodwin commented Oct 18, 2018

Provided it is configured in the cluster provider so all post-install LBs PVs etc get tagged with this as their cluster ID, clustername-random would be pretty good from my POV. It's possible it will end up clustername-random-random as GenerateName may be used for some clusters created in Hive, we're not sure if Dedicated will need to use this yet or not, but either way it should be fine.

This would give us the one cluster ID that can be used to fully deprovision.

Is the tectonicClusterUUID necessary anymore with this in play?

@wking
Copy link
Member

wking commented Oct 18, 2018

We can add sematics like the cluster-name is <human-readable>-<random 5 chars> like pods for a replica set.

That could work, although 5 characters is not that big a space ;). What about <human-name>-<uuid> as the cluster ID? Although that would make the clustername-random-random that @dgoodwin points out pretty long ;).

As an example of name-uniqueness collisions in real life, @ironcladlou and @sallyom have both hit:

DEBUG error getting tags for bucket %!s(*string=0xc4207d0268): BucketRegionError: incorrect region, the bucket is not in 'us-east-2' region
       status code: 301, request id: , host id: , skipping...    

when switching regions within a single AWS account (and presumably not updating their cluster name). @sallyom said:

... it takes some time to clean up all remnants of a bucket in a region, so if you wait and try again, that error will clear (I'm not sure how long, you can google and see more info, it's a known issue)...

Is the tectonicClusterUUID necessary anymore with this in play?

kubernetes.io/cluster/ is an AWS-ism. Do other platforms need a cluster-resource tag? If so, should we start using kubernetes.io/cluster/ on them as well?

@wking
Copy link
Member

wking commented Oct 18, 2018

For the S3 issue specifically, I think we can just drop the bucket name. But it's an example of the same cluster name being used for two regions in the same account.

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Oct 18, 2018

As an example of name-uniqueness collisions in real life, @ironcladlou and @sallyom have both hit:

It is not cluster name collision, see the commit for details openshift/hive@32cde50

That could work, although 5 characters is not that big a space ;).

it will be fine :P we want enough uniqueness that single user does not collide.

@joelddiaz
Copy link
Author

joelddiaz commented Oct 18, 2018

What has been decided on this? Do I update the uninstaller to key off of kubernetes.io/cluster, or is there another approach we should be taking with the uninstaller that can capture resources created inside and outside of the installer?

@dgoodwin
Copy link
Contributor

Agreed appending a 5 char suffix should be more than sufficient for colliding cluster names in one account, which you shouldn't be doing at all. Can we move forward with this @wking ?

If so, can we pass in the value or do you want to generate it? Will it appear in the metadata file?

Should it fully replace tectonic UUID that is there today?

@wking
Copy link
Member

wking commented Oct 18, 2018

After discussing today with @abhinavdahiya, we think a reasonable approach would be to update the hive package to remove resources matching either the kubernetes.io/cluster/... tag or the tectonicClusterID tag. We're ok punting randomized name suffixes to future work. And we're ok punting "how to apply other user-specified tags" (#465) to potential future work. Not super elegant, but it should unstick the current deletion issues while we sort out a long-term plan.

I've started in on converting the hive code from AND-ing the tags to OR-ing the tags, and have a few minor clean-up PRs in the pipe so far. Does this sound reasonable?

@wking
Copy link
Member

wking commented Oct 19, 2018

Ok, openshift/hive#47 has a moderately-ugly (but hopefully working ;) OR implementation. Once that lands (if the approach looks sound), I can update the installer to delete resources matching either kubernetes.io/cluster/... or tectonicClusterID.

@abhinavdahiya
Copy link
Contributor

@wking what do you think about calling hive destroy twice in installer? One for tectonicclusterid, another sweep for kubernetes.io/cluster/clustername: owned

@wking
Copy link
Member

wking commented Oct 19, 2018

what do you think about calling hive destroy twice in installer? One for tectonicclusterid, another sweep for kubernetes.io/cluster/clustername: owned

They need to be simultaneous, in case a resource tagged one way blocks deletion of a resource tagged the other way. You could do that with goroutines here, but I prefer openshift/hive#47, because it allows for future optimization (the filterObjects resources could handle both AND and OR in a single goroutine).

wking added a commit to wking/openshift-installer that referenced this issue Oct 25, 2018
…deletion

We're having trouble getting our resources tagged consistently, with
elastic IPs, NAT gateways, etc. missing kubernetes.io/cluster/<name>
[1].  Meanwhile, the ingress operator will be creating elastic load
balancers that lack tectonicClusterID [2].  This change deletes
everything we can find with either tag while we sort out something
cleaner.

[1]: openshift#458 (comment)
[2]: openshift#458 (comment)
@joelddiaz
Copy link
Author

closing. this was solved by adding/changing the functionality of the uninstaller to receive a list of key=values and attempting deletes on any matching tags openshift/hive#47

@wking
Copy link
Member

wking commented Oct 25, 2018

Reopening, because I think openshift/hive#47 and #535 are more of a temporary stopgap than a permanent fix ;). We should figure out a consistent way to tag AWS resources and get back to a single approach. And we should figure out a way to set the user-specified tags on our resources too (#465 or similar).

@eparis
Copy link
Member

eparis commented Feb 19, 2019

I believe we finished making sure everything that can be tagged is tagged. So closing.

@eparis eparis closed this as completed Feb 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants