Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BoundPods should have field -- ListMeta, instead ObjectMeta. #2089

Closed
wants to merge 1 commit into from

Conversation

dchen1107
Copy link
Member

Fix #2088.

@lavalamp
Copy link
Member

LGTM

@smarterclayton
Copy link
Contributor

We intentionally made it ObjectMeta because of annotations that might apply to the entire set of bound pods.

@smarterclayton
Copy link
Contributor

There were several discussions in #1225 but I don't think it was closed whether it should be ListMeta or ObjectMeta. BoundPods is a single object with transactional semantics. Also, it would be returned as a resource from the /boundpods/<host> API to allow nodes to watch for changes to their boundpods.

@dchen1107
Copy link
Member Author

If we think later we need annotations to apply the whole set of bound pods, why not introduce it to ListMeta itself, like SelfLink which exists in both ListMeta and ObjectMeta. I think Namespace, UID, and CreationTimestamp don't make sense to BoundPods. On a node, there are 4 different ways to get a BoundPod, and those BoundPods belong to different namespaces, and created at a different time. How can we aggregate them into one?

I found this confusing when I expose BoundPods through Kubelet REST API for integration of cAdvisor and Heapster.

@smarterclayton
Copy link
Contributor

I think the key though is that BoundPods is supposed to be an object, not a list. There are other objects which have lists of things in them - i.e. a pod has lists of containers. It just so happens that this object contains copies of other objects. So it might quack like a list but it doesn't look like a list.

----- Original Message -----

If we think later we need annotations to apply the whole set of bound pods,
why not introduce it to ListMeta itself, like SelfLink which exists in both
ListMeta and ObjectMeta. I think Namespace, UID, and CreationTimestamp don't
make sense to BoundPods. On a node, there are 4 different ways to get a
BoundPod, and those BoundPods belong to different namespaces, and created at
a different time. How can we aggregate them into one?

I found this confusing when I expose BoundPods through Kubelet REST API for
integration of cAdvisor and Heapster.


Reply to this email directly or view it on GitHub:
#2089 (comment)

@lavalamp
Copy link
Member

I missed that discussion--was ignoring the v1beta3 centithread due to too many cooks. IMO: if annotations is the only reason, we can add annotations to ListMeta. BoundPods may be an object, but it is also a list.

If we explicitly don't want it to be a list, we should not give it an "Items" member to make that clear-- call it "pods" or something.

But I don't see the harm in making it a list. Maybe it's a transactional thing, whatever-- but it has the semantics of a standing query from kubelet's point of view.

@bgrant0607
Copy link
Member

Dawn's points about the namespace and joining from multiple config sources are good.

The problem is that we are conflating different schemas/APIs:

  • etcd object schema, for changing the set of bound pods atomically
  • apiserver API for Kubelets to watch their own bound pods, and to know the full set at a point in time
  • Kubelet API for serving information about the pods it is managing

One reason we split BoundPod from Pod is so that the information could diverge eventually. I think we need to apply the same reasoning here.

Kubelet shouldn't be returning BoundPods. We should create a new List type (e.g., BoundPodList) for Kubelet's API.

Should Kubelet even be returning BoundPod? I'd like Kubelet to be able to return Status for the pods, which may be targeted at schedulers and the minion/node controller in addition to providing information of interest to users. Are we ok with adding that to BoundPod, in keeping with the pattern of including both spec and status in all the objects?

@dchen1107
Copy link
Member Author

@bgrant0607 Why Kubelet ever return BoundPods? Heapster and cAdvisor needs to know what are supposed to run on a given node. It could retrieve the information from apiserver directly, but without cAdvisor and other prescheduled daemon information. Also Pod Cache doesn't carry "real" time data, sometimes the data are out-of-sync pretty bad. And when the cluster is getting larger, the issue could be worse here. The current approach I suggested to them is querying apiserver to have minion list, then query Kubelet about BoundPods and ContainerStatus. Those two set of data should be sync.

I am ok to create a separate BoundPodList for Kubelet's API.

@bgrant0607
Copy link
Member

I'm not arguing that Kubelet shouldn't have an API that allows this information to be returned, only that we shouldn't overload API types intended for a different purpose.

@smarterclayton
Copy link
Contributor

So if we had:

APIServer:

/boundpods/ -> BoundPods (object)

Down the road:

/boundpods?label= -> BoundPodsList (list)

Scheduler:

writes BoundPods to etcd

Kubelet

/... -> BoundPodList (list)

Does that make sense?

----- Original Message -----

I'm not arguing that Kubelet shouldn't have an API that allows this
information to be returned, only that we shouldn't overload API types
intended for a different purpose.


Reply to this email directly or view it on GitHub:
#2089 (comment)

@dchen1107
Copy link
Member Author

I like this. If no one objects, I will introduce the new type.

@lavalamp
Copy link
Member

APIServer:
/boundpods/ -> BoundPods (object)
Down the road:
/boundpods?label= -> BoundPodsList (list)

I'm very confused about the difference between these. What is it supposed to be?

@bgrant0607
Copy link
Member

@smarterclayton @dchen1107 Yes, exactly.

@lavalamp Speculation about requesting BoundPods from multiple nodes.

@lavalamp
Copy link
Member

IMO, it's very confusing to have both BoundPods and BoundPodList. I don't think it's a good idea to split this right now (YAGNI). Please add really good comments if you do add this split.

@lavalamp
Copy link
Member

Subsequent thought:

When kubelet makes an event about a container in a pod, IMO it should make an event about a BoundPod, with a fieldPath locating the container.

If BoundPods is not a list and is instead an object, then kubelet instead has to make an event about a BoundPods object, with a fieldPath first locating a particular BoundPod and then a container within that. This seems way less useful.

So, IMO, it should be BoundPodList, and BoundPods should just not exist. IMO, the "host" is a property of the BoundPod and should not be in the BoundPods/BoundPodList. I'm really set against having both objects, and against having BoundPods as an object instead of as a list.

@bgrant0607
Copy link
Member

How about this:

  1. In etcd, we have an object for each node that just lists references to pods assigned to that node, but not their full details, called something other than BoundPods
  2. apiserver provides an endpoint that does the join between that list and the corresponding pod specs, returning a BoundPodList that Kubelet can GET/WATCH
  3. kubelet serves a BoundPodList, which includes everything running on that node (including from file, etc.), and including status

What I think people want out of a single object in etcd is something analogous to Omega's multi-scheduler-safe atomic placement.

@lavalamp
Copy link
Member

lavalamp commented Nov 1, 2014

@bgrant0607 that plan is a lot easier to get on board with. There's some subtleties regarding updates but I think those are just details, no deal breakers.

@smarterclayton
Copy link
Contributor

Doesn't that break atomicity of constraints? Storing the bound pod list or bound pods doesn't seem to be the problem - it's merely the exposure of them in a way that seems consistent. Whatever that object is that contains references to pods is still BoundPods, whatever name it is. I guess I don't see how changing the structure addresses the fact that it's still an object, whether it's references or embeddings.

Dan, I'm having a hard time parsing the objection. I think you're opposed to something looking like a list but not being a list? Or having an object that looks like another list?

The thing we call BoundPods has always seemed like a scheduled unit. The fact that a scheduled unit can look like a list because it composes multiple other things seems fundamental. Can you help me get to where you're coming from here?

On Oct 31, 2014, at 8:20 PM, Daniel Smith notifications@github.com wrote:

@bgrant0607 that plan is a lot easier to get on board with. There's some subtleties regarding updates but I think those are just details, no deal breakers.


Reply to this email directly or view it on GitHub.

@lavalamp
Copy link
Member

lavalamp commented Nov 3, 2014

Mainly, I need individual boundPod objects to be their own thing apart from a boundPodList. I do additionally think it's super confusing to break our object/list pattern, but that's only a consistency issue, not a correctness issue.

List o' thoughts:

  1. Need to make an event referring to a particular boundPod, not to entire group of bound pods.
  2. Implies need for each boundPod to have a valid self link, and to exist at some REST path.
  3. IMO, "host" is a field that belongs in each boundPod, not in the group, even if it's a bit redundant-- because you should be able to have a pointer to a boundPod without having the entire list.

Counterarguments:

  1. Need to put annotations/labels on boundPod list.

    a. Why? Put them on the minion, not the boundPodList.

  2. Atomicity guarantee.

    a. There's no reason why we can't, under the hood in etcd, maintain the entire group of bound pods in a list like we do currently.
    b. Alternatively, the entries we store in the atomicity-guaranteeing object (AGO?) can be references to boundPod UID+ResourceVersion. Actually--should probably still cache the objects in the AGO, since etcd doesn't promise we can read an old version indefinitely. So I guess this is the same as 'a'.

@smarterclayton
Copy link
Contributor

----- Original Message -----

Mainly, I need individual boundPod objects to be their own thing apart from a
boundPodList. I do additionally think it's super confusing to break our
object/list pattern, but that's only a consistency issue, not a correctness
issue.

List o' thoughts:

  1. Need to make an event referring to a particular boundPod, not to entire
    group of bound pods.
  2. Implies need for each boundPod to have a valid self link, and to exist at
    some REST path.

Seeing the bound pod at a REST path could be valuable, but I would argue it's not the most important modeling for bound pods (described in #4 below). Can the self link be virtual and not have a REST path and still work in this context?

  1. IMO, "host" is a field that belongs in each boundPod, not in the group,
    even if it's a bit redundant-- because you should be able to have a pointer
    to a boundPod without having the entire list.

So to add to this list

  1. Kubelet / external integrator needs to be able to logically request the bound pods for a host and watch on them (at a minimum, the watch should be on an atomic resource)

I don't think a standing query correctly represents operation 4 - it's the AGO + the pods at a consistent state. Down the road, I could foresee wanting to see incremental deltas be possible.... but I'm also extremely hesitant to say that's important in a 1.0 release roadmap. I'd be willing to do the simple thing for clients (GET / WATCH a single resource) and then add requirements. Having GET return a list is certainly possible, but I much prefer that the GET / WATCH resource be a real object.

Also, I suspect you might want to watch for all scheduling changes to a subset of nodes by label query, and returning a list of lists would be confusing to an end user.

Counterarguments:

  1. Need to put annotations/labels on boundPod list.
    a. Why? Put them on the minion, not the boundPodList.

Should the annotations represent a snapshot of the minion state available atomically for the client? If so, then there seems to be a virtual resource that is being described to clients. If not, you cannot watch or know for sure what your desired state from the server is. I don't think annotations on bound pod list are a deal breaker right now though.

  1. Atomicity guarantee.
    a. There's no reason why we can't, under the hood in etcd, maintain the
    entire group of bound pods in a list like we do currently.

    b. Alternatively, the entries we store in the atomicity-guaranteeing object
    (AGO?) can be references to boundPod UID+ResourceVersion. Actually--should
    probably still cache the objects in the AGO, since etcd doesn't promise we
    can read an old version indefinitely. So I guess this is the same as 'a'.

Agreed, I didn't fully go down this path last night.

@lavalamp
Copy link
Member

lavalamp commented Nov 3, 2014

Can the self link be virtual and not have a REST path and still work in this context?

Yeah, I think a fake self link is enough to unblock me temporarily.

I think we also need to keep each boundPod's ResourceVersion matching the ResourceVersion of the pod it's sourced from. (Since we're not using etcd in a way that it can do this for us for this resource.) Otherwise, you won't be able to tell what version of a pod an event is about.

Also, I suspect you might want to watch for all scheduling changes to a subset of nodes by label query, and returning a list of lists would be confusing to an end user.

Yeah, another argument for storing host with the boundPod instead of the list.

Can we agree on these points?

  1. boundPod ought to (eventually) behave like a real object from the perspective of REST clients. It should at least answer to GET requests, possibly DELETEs. ResourceVersion & SelfLink should be set appropriately.
  2. boundPodList ought to maintain the current atomicity guarantee, for example by storing a cached copy of each boundPod of the correct version in the list.
  3. It can be decided later whether the source of truth for a boundPod is its entry in the boundPodList stored in etcd, or if it has its own etcd object with caches stored in the boundPodList. This is an implementation detail and shouldn't affect the REST interface.

We should probably eventually circle back and reconcile Bindings & boundPods, I suspect we can get rid of one of those concepts.

@kubernetes-bot
Copy link

Can one of the admins verify this patch?

@bgrant0607
Copy link
Member

This is completely obsolete now, superseded by #2483, #2794, and related issues.

@bgrant0607 bgrant0607 closed this Dec 17, 2014
@dchen1107 dchen1107 added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 4, 2015
dgoodwin pushed a commit to dgoodwin/kubernetes that referenced this pull request Oct 1, 2024
OCPBUGS-42167: Bump k8s api to 1.30.5
dgoodwin added a commit to dgoodwin/kubernetes that referenced this pull request Oct 1, 2024
bertinatto pushed a commit to bertinatto/kubernetes that referenced this pull request Oct 14, 2024
…87095

OCPBUGS-42640: Revert kubernetes#2089 "OCPBUGS-42167: Bump k8s api to 1.30.5"
bertinatto added a commit to bertinatto/kubernetes that referenced this pull request Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet area/kubelet-api sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BoundPods should have field -- ListMeta, instead ObjectMeta
5 participants