Our API code is hard to understand #2306

bgrant0607 · 2014-11-11T21:42:40Z

I've been digging through our API code in order to automatically document the API, and it's hard to understand.

The division between master and apiserver isn't clean, and the API machinery is spread out between (at least) api, runtime, conversion, version, and registry, with a seemingly arbitrary decomposition into files and a bewildering proliferation of interfaces (storage, codec, scheme, ...). Additionally, it's not obvious which code is used by the server, by clients, or both. Also, ideally, the generic machinery would be cleanly separated from Kubernetes's specific API object implementations.

After we get some API plugins working, I suggest we look hard at how to simplify and reorganize this.

/cc @lavalamp @smarterclayton

bgrant0607 · 2015-01-28T21:49:44Z

For example, why does api/conversion.go exist?

yujuhong · 2015-01-28T22:00:26Z

Isn't api/conversion.go there to add/register the conversion functions?

In general, I agree the API code is hard to understand. There are too many interfaces, and I am not sure they are all that necessary. A little background and/or higher-level overview on the API code would definitely help a newbie like me.

bgrant0607 · 2015-01-28T22:12:23Z

@yujuhong Which conversion functions? vs. v1betaN/conversion.go files?

smarterclayton · 2015-01-28T23:21:07Z

Originally it was to facilitate containermanifest -> podspec, since kubelet continued to support legacy paths.

On Jan 28, 2015, at 5:12 PM, Brian Grant notifications@github.com wrote:

@yujuhong Which conversion functions? vs. v1betaN/conversion.go files?

—
Reply to this email directly or view it on GitHub.

yujuhong · 2015-01-28T23:35:12Z

It seems like a temporary place to hold conversion functions. All of them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.

smarterclayton · 2015-01-28T23:47:05Z

There will always be internal objects you want to convert across, although it should be a few (due to reuse of common structs.

On Jan 28, 2015, at 6:35 PM, Yu-Ju Hong notifications@github.com wrote:

It seems like a temporary place to hold conversion functions. All of them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.

—
Reply to this email directly or view it on GitHub.

thockin · 2015-01-29T08:22:03Z

It also hosts "copy constructors" that have been externalized from objects
because we have resisted making methods on objects.

If we were not allergic to methods we could make the same-type conversion
functions be methods, as well as the semantic-equal functions. And then we
could talk about defaults and validation as methods.

On Wed, Jan 28, 2015 at 3:47 PM, Clayton Coleman notifications@github.com
wrote:

There will always be internal objects you want to convert across, although
it should be a few (due to reuse of common structs.

On Jan 28, 2015, at 6:35 PM, Yu-Ju Hong notifications@github.com
wrote:

It seems like a temporary place to hold conversion functions. All of
them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#2306 (comment)
.

smarterclayton · 2015-01-29T15:14:11Z

aachoo

----- Original Message -----

It also hosts "copy constructors" that have been externalized from objects
because we have resisted making methods on objects.

If we were not allergic to methods we could make the same-type conversion
functions be methods, as well as the semantic-equal functions. And then we
could talk about defaults and validation as methods.

On Wed, Jan 28, 2015 at 3:47 PM, Clayton Coleman notifications@github.com
wrote:

There will always be internal objects you want to convert across, although
it should be a few (due to reuse of common structs.

On Jan 28, 2015, at 6:35 PM, Yu-Ju Hong notifications@github.com
wrote:

It seems like a temporary place to hold conversion functions. All of
them can and should be moved to v1beta1/v1beta2/v1beta3, etc eventually.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#2306 (comment)
.

Reply to this email directly or view it on GitHub:
#2306 (comment)

bgrant0607 · 2015-02-03T17:48:30Z

@smarterclayton I'm increasingly concerned about the API code complexity. Could you please sketch what API layers/modules we have in the client and server, what they do, and why they exist? I very much want to preserve extensibility and testability, but would like to find a way to reduce the number of levels of indirection, or at least rationalize and streamline the organization.

smarterclayton · 2015-02-03T18:31:23Z

Yes agree.

----- Original Message -----

@smarterclayton I'm increasingly concerned about the API code complexity.
Could you please sketch what API layers/modules we have in the client and
server, what they do, and why they exist? I very much want to preserve
extensibility and testability, but would like to find a way to reduce the
number of levels of indirection, or at least rationalize and streamline the
organization.

Reply to this email directly or view it on GitHub:
#2306 (comment)

davidopp · 2015-02-03T21:29:42Z

Some new people joining the team have also mentioned the API code complexity. When I joined the team a couple of months ago I also found it baffling (and largely still do, even though I wrote a fairly large CL that modified it). At the very least we need to document what we have (both in the code and in a separate design doc). It would also be nice if we could simplify it.

bgrant0607 · 2015-02-03T21:49:34Z

What do I include in "API code"?

pkg/master
pkg/apiserver
pkg/api -- except */types.go (though apiserver: re-architect storage #1451, Break up types.go #1744, and Eliminate api/conversion.go #3914 are relevant)
pkg/conversion
pkg/registry
pkg/runtime
parts of pkg/util and pkg/types

Really, there are 3 overlapping sets of code:

API serving infrastructure: master, apiserver, registry, api
API decoding/encoding/conversion: api, conversion, runtime
Components implementing the API objects: controllers, schedulers, kubelet, etc.

davidopp · 2015-02-03T23:54:41Z

FWIW I think the badness situation is basically 2 is the worst, then 1, then 3.

smarterclayton · 2015-02-03T23:56:04Z

Can you define badness in terms of actionable cleanup for 2?

----- Original Message -----

FWIW I think the badness situation is basically 2 is the worst, then 1, then
3.

Reply to this email directly or view it on GitHub:
#2306 (comment)

bgrant0607 · 2015-02-04T00:25:10Z

The actionable cleanup for (1) and (3) are most clear to me.

As discussed in #1451 (1) and #2726 (3), the apiserver should be entirely generic. Status should be pushed into etcd via calls to apiserver by controllers and not computed on the fly by apiserver. Validation and defaulting code should live with the registry objects. The individual registry objects shouldn't have any custom etcd code.

Mainly what I wanted to be the topic of this bug was (2), and I don't yet understand that code well enough to prescribe a solution.

smarterclayton · 2015-02-04T01:08:45Z

I think we have very clear paths for 1 that can be described today (based on iterative steps Dan and I had been taking). 3. Definitely can be simplified to fit more closely to helpers around gorestful, but that's been mostly small changes made at each point, and as you say it's in the best shape.

For 2, I can describe a fair amount of what is there now in the context of trying to tie external versions to go structs. The Scheme is a bit of a catch all object that could be more cleanly factored into a smaller object. Encoding and decoding are above the scheme, because they can use the go structs directly, except for some coupling to the object structure themselves which we can work around. Some of the complexity in the code is an attempt to be moderately efficient (although encode/decode/conversion are anything but today), which has a lot of low hanging fruit.

I'll try to draft up something on that specifically soon (and #1, because from where we are today to where we want to be should be moderately mechanical refactoring).

On Feb 3, 2015, at 7:25 PM, Brian Grant notifications@github.com wrote:

The actionable cleanup for (1) and (3) are most clear to me.

As discussed in #1451 (1) and #2726 (3), the apiserver should be entirely generic. Status should be pushed into etcd via calls to apiserver by controllers and not computed on the fly by apiserver. Validation and defaulting code should live with the registry objects. The individual registry objects shouldn't have any custom etcd code.

Mainly what I wanted to be the topic of this bug was (2), and I don't yet understand that code well enough to prescribe a solution.

—
Reply to this email directly or view it on GitHub.

smarterclayton · 2015-02-04T16:18:41Z

Next steps for simplifying RESTStorage, registry, generic registry, and etcd behavior (1 above):

Delete, Get, List, Watch are all relatively simple
Create and Update are the two complex REST behaviors that need to follow strict rules based on our agreed schema
- UID, creation timestamp need to be filled
- Name is either provided or Generated
- Validating a create is not the same as validating an update
- Some fields are ignored on create (today)
- Some fields should be ignored on update - silently merged (today)
- The fields that are ignored or automatically merged are resource specific, and in the future we will allow automated systems to define those
The general storage pattern for our existing resources is similar, with a few resources like pods that attempt to maintain two objects without a transaction (pod and the underlying bound pod, for constraint checking)
- In the future, there will be limited transactional operations in etcd, and most other reasonable KV stores we would consider already have multi-put (consul, zookeeper, all SQL databases, cockroachdb, a few other), therefore transactional store guarantees MUST remain in the registry layer, but the etcd registry currently can leave things lying around that either need to be cleaned up or read repaired
- In the time period between now and when etcd has multi-put, we may decide we need an "etcd transactional cleanup controller" that periodically cleans up mismatched references
Some objects, during update, need to non-transactionally check the state of the store to validate other objects (to look for overlaps between RCs, for example)
- This is also a responsibility of the registry, because transactional guarantees would make those checks more effective and the registry layer can hide those checks
Having one etcd registry for all our resources (pkg/registry/etcd/etcd.go) is not necessary and potentially encourages people to couple resources of different types together, so they should be split

The current path implementation:

Implement a generic etcd registry object that takes generic API "resources", performs the necessary generic API transformations, has hooks for per-resource behavior, and in general actively discourages deviating from the "API pattern" (mostly done)
Implement all per resource API logic like merging for updates in a way that is not tied to a particular backend (pkg/api/rest)
Each RESTStorage implementation is a thin wrapper that combines 1 and 2 to expose the supported methods for a resource.
The generic test suite on top of 2 can be reused for all resource types, with special cases per resource (see pkg/api/rest/resttest)
Each resource RESTStorage impl gets its own package pkg/pods/etcd/etcd.go and the generic behavior should probably go in pkg/pods or pkg/pods/rest

Example:

// Implements RESTStorage for pods on top of etcd
type PodEtcd struct {
  registry GenericEtcd
}

func NewPodEtcd() RESTStorage {
  return &GenericEtcd{rest.Pods} // the "api logic" for pods is defined in a way that matches an interface
}

func (r *PodEtcd) Update(obj runtime.Object) (chan Result, error) {
  return registry.Update(obj)
}

It's assumed that writing stores to new backends is relatively uncommon, and that the test suite would be easily adapted to new storage types. So we preserve the behavior that PodEtcd implements RESTStorage, and that the apiserver can check whether RESTStorage implements a method via a cast to an interface.

This makes PodEtcd a very thin layer around generic etcd, and someone can write GenericPostgreSQL or GenericConsul in the future.

Remaining todos:

Remove async channel on interfaces
Figure out how metadata about the API can come from RESTStorage to go-restful to swagger
- The more of this we do, the less apiserver is anything other than a set of utility functions for transforming rest storage into go-restful, which is what we want
Need to determine how we model updating status for both end users (who are likely to not want to clobber status when they PUT) and for system users - probably via a flag / header / query param / new rest resource that imposes different behavior.

bgrant0607 · 2015-02-06T22:19:55Z

Thanks for the writeup @smarterclayton. Will try to digest and reconcile with previous discussion in #1451.

I don't yet understand the division between pkg/conversion and pkg/runtime, which contain marshaling/unmarshaling, conversion, defaulting, some generic metadata accessors, some registry-like functionality (per-kind functionality, collections of kinds), support for extensions, and probably some other stuff.

thockin · 2015-02-08T05:15:52Z

I think I like 99% of what's written here. The part I want clarity on is how widespread knowledge of etcd needs to be? I was expecting to see something along the lines of #2594 wherein the kernel of the API server knows that etcd is how we implement the generic Storage interface. All other REST object code simply uses the generic Storage.

More succinctly - the word "etcd" should not appear in the codebase except in the implementation of the storage interface and the code that constructs the concrete instance.

davidopp · 2015-02-08T05:35:59Z

FWIW I think this would be a great topic for the contributors meeting--describe the current partitioning of this functionality and the proposal.

bgrant0607 · 2015-02-09T08:12:14Z

Ok, I think I understand this now. However, I second @thockin's question about etcd.

There is currently business logic in pkg/registry/etcd/etcd.go that doesn't belong there, such as status field initialization upon create or update. It's horrible that the business logic is spread between individual registry objects (e.g., pkg/registry/pod/rest.go), etcd.go, pod_cache.go, and the responsible components (controller-manager, kubelet, etc.). And that doesn't even include the validation, conversion, defaulting, and fuzzing logic.

A primary goal has to be to consolidate object-centric mutation logic. Making the apiserver framework totally generic and the etcd layer totally generic are also important goals.

Objects shouldn't care about details like what keys are used to store them in etcd -- that should be generically derived from Kind.

We should solve field selection #1362 generically, also.

Preconditions should all be reducible to resourceVersion-based preconditions.

The BoundPods stuff will go away with BoundPods.

smarterclayton · 2015-02-09T13:56:02Z

On Feb 9, 2015, at 3:12 AM, Brian Grant notifications@github.com wrote:

Ok, I think I understand this now. However, I second @thockin's question about etcd.

There is currently business logic in pkg/registry/etcd/etcd.go that doesn't belong there, such as status field initialization upon create or update. It's horrible that the business logic is spread between individual registry objects (e.g., pkg/registry/pod/rest.go), etcd.go, pod_cache.go, and the responsible components (controller-manager, kubelet, etc.). And that doesn't even include the validation, conversion, defaulting, and fuzzing logic.

A primary goal has to be to consolidate object-centric mutation logic. Making the apiserver framework totally generic and the etcd layer totally generic are also important goals.

Yes - binding etcd storage to object mutation logic will be the only place someone should see etcd (in master.go where we instantiate it, and in the actual types that do the binding). Those types should be small - a few lines per method at most.
Objects shouldn't care about details like what keys are used to store them in etcd -- that should be generically derived from Kind.

We should solve field selection #1362 generically, also.

Preconditions should all be reducible to resourceVersion-based preconditions.

The BoundPods stuff will go away with BoundPods.

—
Reply to this email directly or view it on GitHub.

bgrant0607 · 2015-02-28T06:23:04Z

See also #1451

smarterclayton · 2015-04-21T15:25:16Z

Started a proposal / summary here #7111

lavalamp · 2016-02-10T21:05:31Z

It's still getting worse, not better, along the understandable metric.

bgrant0607 · 2016-09-06T21:11:15Z

cc @mml

smarterclayton · 2016-09-06T21:36:02Z

I think the storage and storage initialization code has metastasized a bit, as well as API initialization
We've got a proliferation of clients - I think we're up to 35 different clients total.
api/unversioned is a mess.

smarterclayton · 2017-06-01T16:48:01Z

The majority of the refactors have been executed in 1.6, although more work in documentation is certainly helpful. We actually have clean separation of core packages, layering, and stronger control on what goes where.

fejta-bot · 2017-12-26T02:03:37Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

fejta-bot · 2018-01-25T02:51:24Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 2018-02-24T02:57:49Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

bgrant0607 added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/api Indicates an issue on api area. labels Nov 11, 2014

goltermann added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 3, 2014

bgrant0607 mentioned this issue Jan 29, 2015

Eliminate api/conversion.go #3914

Closed

bgrant0607 mentioned this issue Feb 26, 2015

Add more structure to our directories, and document it #4851

Closed

bgrant0607 added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Feb 28, 2015

davidopp added the team/master label Feb 28, 2015

smarterclayton mentioned this issue Apr 21, 2015

Proposal: Long term evolution of runtime.Scheme / conversion.Scheme #7111

Open

bgrant0607 mentioned this issue May 4, 2015

Versioned API objects should be in a self-contained package #7584

Closed

bgrant0607 mentioned this issue May 20, 2015

Code to automatically generate functions for DeepCopy #8320

Merged

uluyol mentioned this issue Jun 18, 2015

Supporting multiple API prefixes #10009

Closed

bgrant0607 mentioned this issue Jun 19, 2015

"v2" API (API/client redesign umbrella issue) #8190

Closed

davidopp added team/api and removed team/master labels Aug 22, 2015

bgrant0607 added this to the v1.2-candidate milestone Sep 12, 2015

lavalamp modified the milestones: next-candidate, v1.2-candidate Feb 10, 2016

bgrant0607 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Sep 6, 2016

bgrant0607 removed this from the next-candidate milestone Sep 6, 2016

bgrant0607 added kind/technical-debt sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. labels Sep 6, 2016

bgrant0607 removed the team/api (deprecated - do not use) label Mar 1, 2017

bgrant0607 added the triaged label Mar 9, 2017

0xmichalis added the kind/documentation Categorizes issue or PR as related to documentation. label Jun 2, 2017

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2017

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 25, 2018

k8s-ci-robot closed this as completed Feb 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Our API code is hard to understand #2306

Our API code is hard to understand #2306

bgrant0607 commented Nov 11, 2014

bgrant0607 commented Jan 28, 2015

yujuhong commented Jan 28, 2015

bgrant0607 commented Jan 28, 2015

smarterclayton commented Jan 28, 2015

yujuhong commented Jan 28, 2015

smarterclayton commented Jan 28, 2015

thockin commented Jan 29, 2015

smarterclayton commented Jan 29, 2015

bgrant0607 commented Feb 3, 2015

smarterclayton commented Feb 3, 2015

davidopp commented Feb 3, 2015

bgrant0607 commented Feb 3, 2015

davidopp commented Feb 3, 2015

smarterclayton commented Feb 3, 2015

bgrant0607 commented Feb 4, 2015

smarterclayton commented Feb 4, 2015

smarterclayton commented Feb 4, 2015

bgrant0607 commented Feb 6, 2015

thockin commented Feb 8, 2015

davidopp commented Feb 8, 2015

bgrant0607 commented Feb 9, 2015

smarterclayton commented Feb 9, 2015

bgrant0607 commented Feb 28, 2015

smarterclayton commented Apr 21, 2015

lavalamp commented Feb 10, 2016

bgrant0607 commented Sep 6, 2016

smarterclayton commented Sep 6, 2016

smarterclayton commented Jun 1, 2017

fejta-bot commented Dec 26, 2017

fejta-bot commented Jan 25, 2018

fejta-bot commented Feb 24, 2018

Our API code is hard to understand #2306

Our API code is hard to understand #2306

Comments

bgrant0607 commented Nov 11, 2014

bgrant0607 commented Jan 28, 2015

yujuhong commented Jan 28, 2015

bgrant0607 commented Jan 28, 2015

smarterclayton commented Jan 28, 2015

yujuhong commented Jan 28, 2015

smarterclayton commented Jan 28, 2015

thockin commented Jan 29, 2015

smarterclayton commented Jan 29, 2015

bgrant0607 commented Feb 3, 2015

smarterclayton commented Feb 3, 2015

davidopp commented Feb 3, 2015

bgrant0607 commented Feb 3, 2015

davidopp commented Feb 3, 2015

smarterclayton commented Feb 3, 2015

bgrant0607 commented Feb 4, 2015

smarterclayton commented Feb 4, 2015

smarterclayton commented Feb 4, 2015

bgrant0607 commented Feb 6, 2015

thockin commented Feb 8, 2015

davidopp commented Feb 8, 2015

bgrant0607 commented Feb 9, 2015

smarterclayton commented Feb 9, 2015

bgrant0607 commented Feb 28, 2015

smarterclayton commented Apr 21, 2015

lavalamp commented Feb 10, 2016

bgrant0607 commented Sep 6, 2016

smarterclayton commented Sep 6, 2016

smarterclayton commented Jun 1, 2017

fejta-bot commented Dec 26, 2017

fejta-bot commented Jan 25, 2018

fejta-bot commented Feb 24, 2018