Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Our API code is hard to understand #2306

Closed
bgrant0607 opened this issue Nov 11, 2014 · 32 comments
Closed

Our API code is hard to understand #2306

bgrant0607 opened this issue Nov 11, 2014 · 32 comments
Labels
area/api Indicates an issue on api area. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience.

Comments

@bgrant0607
Copy link
Member

I've been digging through our API code in order to automatically document the API, and it's hard to understand.

The division between master and apiserver isn't clean, and the API machinery is spread out between (at least) api, runtime, conversion, version, and registry, with a seemingly arbitrary decomposition into files and a bewildering proliferation of interfaces (storage, codec, scheme, ...). Additionally, it's not obvious which code is used by the server, by clients, or both. Also, ideally, the generic machinery would be cleanly separated from Kubernetes's specific API object implementations.

After we get some API plugins working, I suggest we look hard at how to simplify and reorganize this.

/cc @lavalamp @smarterclayton

@bgrant0607 bgrant0607 added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/api Indicates an issue on api area. labels Nov 11, 2014
@goltermann goltermann added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 3, 2014
@bgrant0607
Copy link
Member Author

For example, why does api/conversion.go exist?

@yujuhong
Copy link
Contributor

Isn't api/conversion.go there to add/register the conversion functions?

In general, I agree the API code is hard to understand. There are too many interfaces, and I am not sure they are all that necessary. A little background and/or higher-level overview on the API code would definitely help a newbie like me.

@bgrant0607
Copy link
Member Author

@yujuhong Which conversion functions? vs. v1betaN/conversion.go files?

@smarterclayton
Copy link
Contributor

Originally it was to facilitate containermanifest -> podspec, since kubelet continued to support legacy paths.

On Jan 28, 2015, at 5:12 PM, Brian Grant notifications@github.com wrote:

@yujuhong Which conversion functions? vs. v1betaN/conversion.go files?


Reply to this email directly or view it on GitHub.

@yujuhong
Copy link
Contributor

It seems like a temporary place to hold conversion functions. All of them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.

@smarterclayton
Copy link
Contributor

There will always be internal objects you want to convert across, although it should be a few (due to reuse of common structs.

On Jan 28, 2015, at 6:35 PM, Yu-Ju Hong notifications@github.com wrote:

It seems like a temporary place to hold conversion functions. All of them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.


Reply to this email directly or view it on GitHub.

@thockin
Copy link
Member

thockin commented Jan 29, 2015

It also hosts "copy constructors" that have been externalized from objects
because we have resisted making methods on objects.

If we were not allergic to methods we could make the same-type conversion
functions be methods, as well as the semantic-equal functions. And then we
could talk about defaults and validation as methods.

On Wed, Jan 28, 2015 at 3:47 PM, Clayton Coleman notifications@github.com
wrote:

There will always be internal objects you want to convert across, although
it should be a few (due to reuse of common structs.

On Jan 28, 2015, at 6:35 PM, Yu-Ju Hong notifications@github.com
wrote:

It seems like a temporary place to hold conversion functions. All of
them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#2306 (comment)
.

@smarterclayton
Copy link
Contributor

aachoo

----- Original Message -----

It also hosts "copy constructors" that have been externalized from objects
because we have resisted making methods on objects.

If we were not allergic to methods we could make the same-type conversion
functions be methods, as well as the semantic-equal functions. And then we
could talk about defaults and validation as methods.

On Wed, Jan 28, 2015 at 3:47 PM, Clayton Coleman notifications@github.com
wrote:

There will always be internal objects you want to convert across, although
it should be a few (due to reuse of common structs.

On Jan 28, 2015, at 6:35 PM, Yu-Ju Hong notifications@github.com
wrote:

It seems like a temporary place to hold conversion functions. All of
them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#2306 (comment)
.


Reply to this email directly or view it on GitHub:
#2306 (comment)

@bgrant0607
Copy link
Member Author

@smarterclayton I'm increasingly concerned about the API code complexity. Could you please sketch what API layers/modules we have in the client and server, what they do, and why they exist? I very much want to preserve extensibility and testability, but would like to find a way to reduce the number of levels of indirection, or at least rationalize and streamline the organization.

@smarterclayton
Copy link
Contributor

Yes agree.

----- Original Message -----

@smarterclayton I'm increasingly concerned about the API code complexity.
Could you please sketch what API layers/modules we have in the client and
server, what they do, and why they exist? I very much want to preserve
extensibility and testability, but would like to find a way to reduce the
number of levels of indirection, or at least rationalize and streamline the
organization.


Reply to this email directly or view it on GitHub:
#2306 (comment)

@davidopp
Copy link
Member

davidopp commented Feb 3, 2015

Some new people joining the team have also mentioned the API code complexity. When I joined the team a couple of months ago I also found it baffling (and largely still do, even though I wrote a fairly large CL that modified it). At the very least we need to document what we have (both in the code and in a separate design doc). It would also be nice if we could simplify it.

@bgrant0607
Copy link
Member Author

What do I include in "API code"?

Really, there are 3 overlapping sets of code:

  1. API serving infrastructure: master, apiserver, registry, api
  2. API decoding/encoding/conversion: api, conversion, runtime
  3. Components implementing the API objects: controllers, schedulers, kubelet, etc.

@davidopp
Copy link
Member

davidopp commented Feb 3, 2015

FWIW I think the badness situation is basically 2 is the worst, then 1, then 3.

@smarterclayton
Copy link
Contributor

Can you define badness in terms of actionable cleanup for 2?

----- Original Message -----

FWIW I think the badness situation is basically 2 is the worst, then 1, then
3.


Reply to this email directly or view it on GitHub:
#2306 (comment)

@bgrant0607
Copy link
Member Author

The actionable cleanup for (1) and (3) are most clear to me.

As discussed in #1451 (1) and #2726 (3), the apiserver should be entirely generic. Status should be pushed into etcd via calls to apiserver by controllers and not computed on the fly by apiserver. Validation and defaulting code should live with the registry objects. The individual registry objects shouldn't have any custom etcd code.

Mainly what I wanted to be the topic of this bug was (2), and I don't yet understand that code well enough to prescribe a solution.

@smarterclayton
Copy link
Contributor

I think we have very clear paths for 1 that can be described today (based on iterative steps Dan and I had been taking). 3. Definitely can be simplified to fit more closely to helpers around gorestful, but that's been mostly small changes made at each point, and as you say it's in the best shape.

For 2, I can describe a fair amount of what is there now in the context of trying to tie external versions to go structs. The Scheme is a bit of a catch all object that could be more cleanly factored into a smaller object. Encoding and decoding are above the scheme, because they can use the go structs directly, except for some coupling to the object structure themselves which we can work around. Some of the complexity in the code is an attempt to be moderately efficient (although encode/decode/conversion are anything but today), which has a lot of low hanging fruit.

I'll try to draft up something on that specifically soon (and #1, because from where we are today to where we want to be should be moderately mechanical refactoring).

On Feb 3, 2015, at 7:25 PM, Brian Grant notifications@github.com wrote:

The actionable cleanup for (1) and (3) are most clear to me.

As discussed in #1451 (1) and #2726 (3), the apiserver should be entirely generic. Status should be pushed into etcd via calls to apiserver by controllers and not computed on the fly by apiserver. Validation and defaulting code should live with the registry objects. The individual registry objects shouldn't have any custom etcd code.

Mainly what I wanted to be the topic of this bug was (2), and I don't yet understand that code well enough to prescribe a solution.


Reply to this email directly or view it on GitHub.

@smarterclayton
Copy link
Contributor

Next steps for simplifying RESTStorage, registry, generic registry, and etcd behavior (1 above):

  • Delete, Get, List, Watch are all relatively simple
  • Create and Update are the two complex REST behaviors that need to follow strict rules based on our agreed schema
    • UID, creation timestamp need to be filled
    • Name is either provided or Generated
    • Validating a create is not the same as validating an update
    • Some fields are ignored on create (today)
    • Some fields should be ignored on update - silently merged (today)
    • The fields that are ignored or automatically merged are resource specific, and in the future we will allow automated systems to define those
  • The general storage pattern for our existing resources is similar, with a few resources like pods that attempt to maintain two objects without a transaction (pod and the underlying bound pod, for constraint checking)
    • In the future, there will be limited transactional operations in etcd, and most other reasonable KV stores we would consider already have multi-put (consul, zookeeper, all SQL databases, cockroachdb, a few other), therefore transactional store guarantees MUST remain in the registry layer, but the etcd registry currently can leave things lying around that either need to be cleaned up or read repaired
    • In the time period between now and when etcd has multi-put, we may decide we need an "etcd transactional cleanup controller" that periodically cleans up mismatched references
  • Some objects, during update, need to non-transactionally check the state of the store to validate other objects (to look for overlaps between RCs, for example)
    • This is also a responsibility of the registry, because transactional guarantees would make those checks more effective and the registry layer can hide those checks
  • Having one etcd registry for all our resources (pkg/registry/etcd/etcd.go) is not necessary and potentially encourages people to couple resources of different types together, so they should be split

The current path implementation:

  1. Implement a generic etcd registry object that takes generic API "resources", performs the necessary generic API transformations, has hooks for per-resource behavior, and in general actively discourages deviating from the "API pattern" (mostly done)
  2. Implement all per resource API logic like merging for updates in a way that is not tied to a particular backend (pkg/api/rest)
  3. Each RESTStorage implementation is a thin wrapper that combines 1 and 2 to expose the supported methods for a resource.
  4. The generic test suite on top of 2 can be reused for all resource types, with special cases per resource (see pkg/api/rest/resttest)
  5. Each resource RESTStorage impl gets its own package pkg/pods/etcd/etcd.go and the generic behavior should probably go in pkg/pods or pkg/pods/rest

Example:

// Implements RESTStorage for pods on top of etcd
type PodEtcd struct {
  registry GenericEtcd
}

func NewPodEtcd() RESTStorage {
  return &GenericEtcd{rest.Pods} // the "api logic" for pods is defined in a way that matches an interface
}

func (r *PodEtcd) Update(obj runtime.Object) (chan Result, error) {
  return registry.Update(obj)
}

It's assumed that writing stores to new backends is relatively uncommon, and that the test suite would be easily adapted to new storage types. So we preserve the behavior that PodEtcd implements RESTStorage, and that the apiserver can check whether RESTStorage implements a method via a cast to an interface.

This makes PodEtcd a very thin layer around generic etcd, and someone can write GenericPostgreSQL or GenericConsul in the future.

Remaining todos:

  • Remove async channel on interfaces
  • Figure out how metadata about the API can come from RESTStorage to go-restful to swagger
    • The more of this we do, the less apiserver is anything other than a set of utility functions for transforming rest storage into go-restful, which is what we want
  • Need to determine how we model updating status for both end users (who are likely to not want to clobber status when they PUT) and for system users - probably via a flag / header / query param / new rest resource that imposes different behavior.

@bgrant0607
Copy link
Member Author

Thanks for the writeup @smarterclayton. Will try to digest and reconcile with previous discussion in #1451.

I don't yet understand the division between pkg/conversion and pkg/runtime, which contain marshaling/unmarshaling, conversion, defaulting, some generic metadata accessors, some registry-like functionality (per-kind functionality, collections of kinds), support for extensions, and probably some other stuff.

@thockin
Copy link
Member

thockin commented Feb 8, 2015

I think I like 99% of what's written here. The part I want clarity on is how widespread knowledge of etcd needs to be? I was expecting to see something along the lines of #2594 wherein the kernel of the API server knows that etcd is how we implement the generic Storage interface. All other REST object code simply uses the generic Storage.

More succinctly - the word "etcd" should not appear in the codebase except in the implementation of the storage interface and the code that constructs the concrete instance.

@davidopp
Copy link
Member

davidopp commented Feb 8, 2015

FWIW I think this would be a great topic for the contributors meeting--describe the current partitioning of this functionality and the proposal.

@bgrant0607
Copy link
Member Author

Ok, I think I understand this now. However, I second @thockin's question about etcd.

There is currently business logic in pkg/registry/etcd/etcd.go that doesn't belong there, such as status field initialization upon create or update. It's horrible that the business logic is spread between individual registry objects (e.g., pkg/registry/pod/rest.go), etcd.go, pod_cache.go, and the responsible components (controller-manager, kubelet, etc.). And that doesn't even include the validation, conversion, defaulting, and fuzzing logic.

A primary goal has to be to consolidate object-centric mutation logic. Making the apiserver framework totally generic and the etcd layer totally generic are also important goals.

Objects shouldn't care about details like what keys are used to store them in etcd -- that should be generically derived from Kind.

We should solve field selection #1362 generically, also.

Preconditions should all be reducible to resourceVersion-based preconditions.

The BoundPods stuff will go away with BoundPods.

@smarterclayton
Copy link
Contributor

On Feb 9, 2015, at 3:12 AM, Brian Grant notifications@github.com wrote:

Ok, I think I understand this now. However, I second @thockin's question about etcd.

There is currently business logic in pkg/registry/etcd/etcd.go that doesn't belong there, such as status field initialization upon create or update. It's horrible that the business logic is spread between individual registry objects (e.g., pkg/registry/pod/rest.go), etcd.go, pod_cache.go, and the responsible components (controller-manager, kubelet, etc.). And that doesn't even include the validation, conversion, defaulting, and fuzzing logic.

A primary goal has to be to consolidate object-centric mutation logic. Making the apiserver framework totally generic and the etcd layer totally generic are also important goals.

Yes - binding etcd storage to object mutation logic will be the only place someone should see etcd (in master.go where we instantiate it, and in the actual types that do the binding). Those types should be small - a few lines per method at most.
Objects shouldn't care about details like what keys are used to store them in etcd -- that should be generically derived from Kind.

We should solve field selection #1362 generically, also.

Preconditions should all be reducible to resourceVersion-based preconditions.

The BoundPods stuff will go away with BoundPods.


Reply to this email directly or view it on GitHub.

@bgrant0607 bgrant0607 added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Feb 28, 2015
@bgrant0607
Copy link
Member Author

See also #1451

@smarterclayton
Copy link
Contributor

Started a proposal / summary here #7111

@lavalamp
Copy link
Member

It's still getting worse, not better, along the understandable metric.

@bgrant0607 bgrant0607 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Sep 6, 2016
@bgrant0607 bgrant0607 removed this from the next-candidate milestone Sep 6, 2016
@bgrant0607
Copy link
Member Author

cc @mml

@bgrant0607 bgrant0607 added kind/technical-debt sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. labels Sep 6, 2016
@smarterclayton
Copy link
Contributor

  • I think the storage and storage initialization code has metastasized a bit, as well as API initialization
  • We've got a proliferation of clients - I think we're up to 35 different clients total.
  • api/unversioned is a mess.

@smarterclayton
Copy link
Contributor

The majority of the refactors have been executed in 1.6, although more work in documentation is certainly helpful. We actually have clean separation of core packages, layering, and stronger control on what goes where.

@0xmichalis 0xmichalis added the kind/documentation Categorizes issue or PR as related to documentation. label Jun 2, 2017
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2017
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 25, 2018
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience.
Projects
None yet
Development

No branches or pull requests

10 participants