Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make API list operations consistent #2740

Closed
bgrant0607 opened this issue Dec 3, 2014 · 13 comments
Closed

Make API list operations consistent #2740

bgrant0607 opened this issue Dec 3, 2014 · 13 comments
Labels
area/api Indicates an issue on api area. area/usability priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@bgrant0607
Copy link
Member

Currently, registry-based API list calls return full objects, but other API list calls, such as on /operations, don't:

{
  "kind": "ServerOpList",
  "creationTimestamp": null,
  "apiVersion": "v1beta1",
  "items": [
    {
      "id": "15",
      "creationTimestamp": null
    },
    {
      "id": "16",
      "creationTimestamp": null
    },
    {
      "id": "17",
      "creationTimestamp": null
    },
  ]
}

All API list calls should have consistent behavior, one way or the other.

We could potentially support both behaviors, selected by URL parameter.

Probably we can only make this change for v1beta3 (#1519).

@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Dec 3, 2014
@bgrant0607 bgrant0607 self-assigned this Dec 3, 2014
@bgrant0607
Copy link
Member Author

Other List inconsistencies were pointed out in #2602.

@bgrant0607
Copy link
Member Author

Proposal: By default, list operations should not return the full object bodies, but full object bodies should be available by request, such as &recursive=true, or perhaps &children=true.

How list reports the children should be made consistent with other instances where we provide object references (#1490).

/cc @smarterclayton

@smarterclayton
Copy link
Contributor

Interesting - I can't think of a case when I wouldn't (as a client author) want children=true. I could certainly see the value of children=false in some cases.

----- Original Message -----

Proposal: By default, list operations should not return the full object
bodies, but full object bodies should be available by request, such as
&recursive=true, or perhaps &children=true.

How list reports the children should be made consistent with other instances
where we provide object references (#1490).

/cc @smarterclayton


Reply to this email directly or view it on GitHub:
#2740 (comment)

@derekwaynecarr
Copy link
Member

Internal usage scenarios seem to all desire the full content, but I could see a kubectl get command only wanting a subset. Maybe we can just provide a field mask of some kind as optional?

Sent from my iPhone

On Jan 12, 2015, at 4:14 PM, Clayton Coleman notifications@github.com wrote:

Interesting - I can't think of a case when I wouldn't (as a client author) want children=true. I could certainly see the value of children=false in some cases.

----- Original Message -----

Proposal: By default, list operations should not return the full object
bodies, but full object bodies should be available by request, such as
&recursive=true, or perhaps &children=true.

How list reports the children should be made consistent with other instances
where we provide object references (#1490).

/cc @smarterclayton


Reply to this email directly or view it on GitHub:
#2740 (comment)

Reply to this email directly or view it on GitHub.

@bgrant0607
Copy link
Member Author

Filtering by fields (#1459) is certainly an option (and would be useful, regardless), but my experience is that most clients/users don't bother explicitly filtering out stuff they don't need, which will waste a lot of resources in the apiserver. Maybe we can build some selection smarts into kubectl.

Returning the full bodies by default also seems inconsistent from an object-discovery point of view, and also seems problematic for scalability. The latter could perhaps be partly addressed by pagination (#2349) and not guaranteeing that all objects would be at the same resourceVersion.

@erictune
Copy link
Member

from an access control standpoint, it makes more sense to never return full
content, just names and other metadata. Everyone is used to this model:

  • ls and cat are different commands.
  • permission to list directories can be granted separate from permission
    to view file contents

In terms of writing policy, it will be cleaner to say that List is one
thing and Get is another, as opposed to List children=false is one
thing while Get and List children=true is another thing.

On Mon, Jan 12, 2015 at 2:59 PM, bgrant0607 notifications@github.com
wrote:

Filtering by fields (#1362
#1362) is
certainly an option (and would be useful, regardless), but my experience is
that most clients/users don't bother explicitly filtering out stuff they
don't need, which will waste a lot of resources in the apiserver. Maybe we
can build some selection smarts into kubectl.

Returning the full bodies by default also seems inconsistent from an
object-discovery point of view, and also seems problematic for scalability.
The latter could perhaps be partly addressed by pagination (#2349
#2349) and not
guaranteeing that all objects would be at the same resourceVersion.


Reply to this email directly or view it on GitHub
#2740 (comment)
.

@bgrant0607
Copy link
Member Author

Another consideration: If we expect the pattern of LIST followed by GETs of each object in the LIST to be common, we should ensure that this doesn't have quadratic behavior -- the GETs must execute in constant time. It's true today for lookups by name, but that requirement should be documented. In the case that they are constant time, one advantage of this approach is that the GETs may be issued by the client and served by the apiserver in parallel.

@liggitt
Copy link
Member

liggitt commented Jan 13, 2015

From a practical perspective, if the api requires LIST followed by a GET of every individual item, we probably wouldn't need ACL... the pain of using that api would be protection enough :)

It's fine to be able to list (or watch) without getting contents (and grant someone the ability to list/watch without getting contents), but there has to be a way to efficiently list/watch the items without spawning N additional requests.

@bgrant0607
Copy link
Member Author

If the consensus is that the full contents are almost always useful, I'm fine with that. Most requests will use our client, so we can do intelligent things there. What I think we'd need to do:

  1. Make all REST LISTs consistent in the form of data returned. For example, the versions list, resources/kinds list, operations list.
  2. Make all REST LISTs consistent in the URL parameters accepted (e.g., label selection, field selection, reverse label lookup, field filtering)
  3. Support pagination (get pods list API for large datasets, pagination in plans? #2349).
  4. Support field filtering (GET/LIST subset of object fields #1459), of metadata, spec, and status in particular.
  5. Ensure kubectl only fetches what it needs.
  6. Document the constant-time requirement.
  7. Figure out how to support LIST+WATCH in a correct and scalable way. I assume that this requires fetching the entire list at the same resourceVersion from etcd.
  8. Ensure that it's feasible to provide per-object authorization.

@smarterclayton
Copy link
Contributor

On Jan 13, 2015, at 4:24 PM, bgrant0607 notifications@github.com wrote:

If the consensus is that the full contents are almost always useful, I'm fine with that. Most requests will use our client, so we can do intelligent things there. What I think we'd need to do:

  1. Make all REST LISTs consistent in the form of data returned. For example, the versions list, resources/kinds list, operations list.
  2. Make all REST LISTs consistent in the URL parameters accepted (e.g., label selection, field selection, reverse label lookup, field filtering)
  3. Support pagination (get pods list API for large datasets, pagination in plans? #2349).

We've at least discussed this in the context of the etcd 3 store, so I'm hoping we can continue to push so that by later this year we have a story for that. In the meantime we could potentially page by namespace on our worst queries, but that does require us to be more sophisticated on our watches.
4. Support field filtering (#1459), of metadata, spec, and status in particular

  1. Ensure kubectl only fetches what it needs.

Also the controllers and api servers - this will be a significant win.

Our major disadvantage with etcd at this point in time is we can't have covering indices on the core object, so we pay the transfer and decompression time on the apiserver, with minimal gains to the client except bandwidth.

  1. Document the constant-time requirement.
  2. Figure out how to support LIST+WATCH in a correct and scalable way. I assume that this requires fetching the entire list at the same resourceVersion from etcd.
  3. Ensure that it's feasible to provide per-object authorization.


Reply to this email directly or view it on GitHub.

@bgrant0607
Copy link
Member Author

We should be caching all objects in memory always (for a given namespace/kind shard), in which case our etcd load would be dramatically reduced.

@bgrant0607 bgrant0607 added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Feb 5, 2015
@bgrant0607
Copy link
Member Author

Will leave this open as an umbrella issue.

@bgrant0607 bgrant0607 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Feb 5, 2015
@bgrant0607 bgrant0607 removed their assignment Feb 5, 2015
@bgrant0607
Copy link
Member Author

Closing in favor of more specific issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/usability priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests

5 participants