-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BoundPods should have field -- ListMeta, instead ObjectMeta. #2089
Conversation
LGTM |
We intentionally made it ObjectMeta because of annotations that might apply to the entire set of bound pods. |
There were several discussions in #1225 but I don't think it was closed whether it should be ListMeta or ObjectMeta. BoundPods is a single object with transactional semantics. Also, it would be returned as a resource from the |
If we think later we need annotations to apply the whole set of bound pods, why not introduce it to ListMeta itself, like SelfLink which exists in both ListMeta and ObjectMeta. I think Namespace, UID, and CreationTimestamp don't make sense to BoundPods. On a node, there are 4 different ways to get a BoundPod, and those BoundPods belong to different namespaces, and created at a different time. How can we aggregate them into one? I found this confusing when I expose BoundPods through Kubelet REST API for integration of cAdvisor and Heapster. |
I think the key though is that BoundPods is supposed to be an object, not a list. There are other objects which have lists of things in them - i.e. a pod has lists of containers. It just so happens that this object contains copies of other objects. So it might quack like a list but it doesn't look like a list. ----- Original Message -----
|
I missed that discussion--was ignoring the v1beta3 centithread due to too many cooks. IMO: if annotations is the only reason, we can add annotations to ListMeta. BoundPods may be an object, but it is also a list. If we explicitly don't want it to be a list, we should not give it an "Items" member to make that clear-- call it "pods" or something. But I don't see the harm in making it a list. Maybe it's a transactional thing, whatever-- but it has the semantics of a standing query from kubelet's point of view. |
Dawn's points about the namespace and joining from multiple config sources are good. The problem is that we are conflating different schemas/APIs:
One reason we split Kubelet shouldn't be returning Should Kubelet even be returning |
@bgrant0607 Why Kubelet ever return BoundPods? Heapster and cAdvisor needs to know what are supposed to run on a given node. It could retrieve the information from apiserver directly, but without cAdvisor and other prescheduled daemon information. Also Pod Cache doesn't carry "real" time data, sometimes the data are out-of-sync pretty bad. And when the cluster is getting larger, the issue could be worse here. The current approach I suggested to them is querying apiserver to have minion list, then query Kubelet about BoundPods and ContainerStatus. Those two set of data should be sync. I am ok to create a separate BoundPodList for Kubelet's API. |
I'm not arguing that Kubelet shouldn't have an API that allows this information to be returned, only that we shouldn't overload API types intended for a different purpose. |
So if we had: APIServer: /boundpods/ -> BoundPods (object) Down the road: /boundpods?label= -> BoundPodsList (list) Scheduler: writes BoundPods to etcd Kubelet /... -> BoundPodList (list) Does that make sense? ----- Original Message -----
|
I like this. If no one objects, I will introduce the new type. |
I'm very confused about the difference between these. What is it supposed to be? |
@smarterclayton @dchen1107 Yes, exactly. @lavalamp Speculation about requesting BoundPods from multiple nodes. |
IMO, it's very confusing to have both BoundPods and BoundPodList. I don't think it's a good idea to split this right now (YAGNI). Please add really good comments if you do add this split. |
Subsequent thought: When kubelet makes an event about a container in a pod, IMO it should make an event about a BoundPod, with a fieldPath locating the container. If BoundPods is not a list and is instead an object, then kubelet instead has to make an event about a BoundPods object, with a fieldPath first locating a particular BoundPod and then a container within that. This seems way less useful. So, IMO, it should be BoundPodList, and BoundPods should just not exist. IMO, the "host" is a property of the BoundPod and should not be in the BoundPods/BoundPodList. I'm really set against having both objects, and against having BoundPods as an object instead of as a list. |
How about this:
What I think people want out of a single object in etcd is something analogous to Omega's multi-scheduler-safe atomic placement. |
@bgrant0607 that plan is a lot easier to get on board with. There's some subtleties regarding updates but I think those are just details, no deal breakers. |
Doesn't that break atomicity of constraints? Storing the bound pod list or bound pods doesn't seem to be the problem - it's merely the exposure of them in a way that seems consistent. Whatever that object is that contains references to pods is still BoundPods, whatever name it is. I guess I don't see how changing the structure addresses the fact that it's still an object, whether it's references or embeddings. Dan, I'm having a hard time parsing the objection. I think you're opposed to something looking like a list but not being a list? Or having an object that looks like another list? The thing we call BoundPods has always seemed like a scheduled unit. The fact that a scheduled unit can look like a list because it composes multiple other things seems fundamental. Can you help me get to where you're coming from here?
|
Mainly, I need individual boundPod objects to be their own thing apart from a boundPodList. I do additionally think it's super confusing to break our object/list pattern, but that's only a consistency issue, not a correctness issue. List o' thoughts:
Counterarguments:
|
----- Original Message -----
Seeing the bound pod at a REST path could be valuable, but I would argue it's not the most important modeling for bound pods (described in #4 below). Can the self link be virtual and not have a REST path and still work in this context?
So to add to this list
I don't think a standing query correctly represents operation 4 - it's the AGO + the pods at a consistent state. Down the road, I could foresee wanting to see incremental deltas be possible.... but I'm also extremely hesitant to say that's important in a 1.0 release roadmap. I'd be willing to do the simple thing for clients (GET / WATCH a single resource) and then add requirements. Having GET return a list is certainly possible, but I much prefer that the GET / WATCH resource be a real object. Also, I suspect you might want to watch for all scheduling changes to a subset of nodes by label query, and returning a list of lists would be confusing to an end user.
Should the annotations represent a snapshot of the minion state available atomically for the client? If so, then there seems to be a virtual resource that is being described to clients. If not, you cannot watch or know for sure what your desired state from the server is. I don't think annotations on bound pod list are a deal breaker right now though.
Agreed, I didn't fully go down this path last night. |
Yeah, I think a fake self link is enough to unblock me temporarily. I think we also need to keep each boundPod's ResourceVersion matching the ResourceVersion of the pod it's sourced from. (Since we're not using etcd in a way that it can do this for us for this resource.) Otherwise, you won't be able to tell what version of a pod an event is about.
Yeah, another argument for storing host with the boundPod instead of the list. Can we agree on these points?
We should probably eventually circle back and reconcile Bindings & boundPods, I suspect we can get rid of one of those concepts. |
Can one of the admins verify this patch? |
OCPBUGS-42167: Bump k8s api to 1.30.5
…87095 OCPBUGS-42640: Revert kubernetes#2089 "OCPBUGS-42167: Bump k8s api to 1.30.5"
…s api to 1.30.5""
Fix #2088.