
Facilitate API orchestration #34363

Open

bgrant0607 opened this issue Oct 7, 2016 · 27 comments
Labels
area/api Indicates an issue on api area. area/app-lifecycle area/teardown lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@bgrant0607
Member

bgrant0607 commented Oct 7, 2016

Currently, building a general-purpose orchestrator for our API is more challenging than it should be.

Some problems (probably not exhaustive):

See also:
#29453
#1899
#5164
kubernetes/website#41954
#4630
#15203
#29891
#14961
#14181
#1503
#32157
#38216

cc @lavalamp @pwittrock @smarterclayton

@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. team/api sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog. area/app-lifecycle labels Oct 7, 2016
@0xmichalis
Contributor

cc: @mfojtik

@smarterclayton
Contributor

Better Ansible integration (and other config mgmt) is a part of this. Supporting apply, as well as things like create * safety and versioning, is important for solving hard problems. I have some proposals pending for this.

I think the hooks, perma-failed deployments, and custom strategies proposals made me start thinking we're missing a DeploymentJob object - a Job that runs to completion and can be declaratively tracked. Essentially a level-driven Job that can be updated directly. Custom strategies could also leverage a job template and manage the job themselves.

Ultimately there is probably a separation between declarative state and level driven config workflows - the goal I think should be to have the necessary flexibility to declare workflows that allow people to walk away from a cluster and have it keep ticking over with security updates and key rotation as well as periodic development workflows in tools like Jenkins.

@0xmichalis
Contributor

cc: @soltysh you were asking for this I think

@soltysh
Contributor

soltysh commented Nov 9, 2016

I like the idea of a Job driving the custom deployment.

@mfojtik
Contributor

mfojtik commented Nov 10, 2016

@soltysh with the custom deployment strategy controller you can facilitate jobs or any other resources.

@ghost

ghost commented Nov 10, 2016

From a Google SRE's perspective, what Kubernetes most clearly lacks is the concept of an operation. An operation must support the following functions:

  • When an update on a resource is triggered, a handle to an operation is returned.
  • The operation has well-defined states for "running", "succeeded" or "failed".
  • The operation has a status field that shows the operation's progress while it's running.
  • The operation has an error field that shows the reason for a failed operation.
  • Optional: The operation can be cancelled or rolled back.

Deployment is far from the only API object that needs operation support. For example, services and ingresses need operations that block until they have acquired IP addresses.

For us, the lack of API support for operations is the single most significant shortcoming of Kubernetes, because it makes any management of K8s unreliable.
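
For illustration only: Kubernetes has no Operation API object, so the following is a hypothetical Go sketch of the shape such a handle could take, derived from the requirements listed above. All names are invented.

// Hypothetical sketch only: Kubernetes has no Operation API object.
// This is one possible shape for such a handle, based on the requirements
// listed in the comment above.
package operations

import "time"

// OperationState enumerates the well-defined lifecycle states.
type OperationState string

const (
	OperationRunning   OperationState = "Running"
	OperationSucceeded OperationState = "Succeeded"
	OperationFailed    OperationState = "Failed"
	OperationCancelled OperationState = "Cancelled" // optional, per the list above
)

// Operation is the handle a client would receive when triggering an update.
type Operation struct {
	// Name distinguishes two updates issued in short succession.
	Name string
	// Target names the resource acted upon, e.g. "deployments/nginx".
	Target string
	// State is Running, Succeeded, Failed, or Cancelled.
	State OperationState
	// Progress describes how far along the operation is while Running.
	Progress string
	// Error explains why a Failed operation failed.
	Error string
	// StartTime and CompletionTime bound the operation in time.
	StartTime      time.Time
	CompletionTime *time.Time
}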

@mfojtik
Contributor

mfojtik commented Nov 10, 2016

@steve-wolter deployments currently have perma-failed and conditions that report the progress of the rollout operation.

@0xmichalis
Contributor

@steve-wolter deployments currently have perma-failed and conditions that report the progress of the rollout operation.

s/perma-failed/progressDeadlineSeconds/

@davidopp
Member

@steve-wolter

It seems that the spec/status model supports what you want when creating resources: the user creates the object (filling in the spec) and then polls or watches the object's status, which indicates the progress of the "operation." For example, create a ReplicaSet, and then watch the status to see how many replicas have been created so far.

Updates are a little less obvious since the modification is done to the spec in-place (as opposed to creating a separate "request to update object X" object), but Deployment does track this stuff under the covers, which is what allows for example this part of the flow (copied from the documentation):

$ kubectl rollout status deployment/nginx-deployment
Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
deployment nginx-deployment successfully rolled out
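
Under the covers, that flow amounts to comparing the Deployment's spec against its reported status. A minimal client-go sketch of the same pattern follows; the function name and the 2s poll interval are illustrative, and a configured clientset is assumed.

// A minimal sketch of the status-watching pattern behind "rollout status".
package rollout

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func waitForRollout(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	for {
		d, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		// The controller must have observed the latest spec...
		if d.Status.ObservedGeneration >= d.Generation &&
			// ...and all replicas must be updated and available.
			d.Status.UpdatedReplicas == *d.Spec.Replicas &&
			d.Status.AvailableReplicas == *d.Spec.Replicas {
			fmt.Printf("deployment %s successfully rolled out\n", name)
			return nil
		}
		fmt.Printf("waiting: %d out of %d new replicas have been updated\n",
			d.Status.UpdatedReplicas, *d.Spec.Replicas)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(2 * time.Second):
		}
	}
}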

@ghost

ghost commented Nov 10, 2016

@davidopp @mfojtik

There are two problems with the approach you're suggesting:

  1. Big problem: Each and every API object has a different status notion. There is no generic way to query status. This places an unacceptable coding burden on higher-level automation such as Google's Cloud Deployment Manager.
  2. Smaller problem, but good to kill two birds with one stone: There is no distinction of status between subsequent operations. Suppose two different users/tasks update a deployment in short succession. How are they going to keep their operations apart?

I know that kubectl has all kinds of code for validating inputs, tracking progress and checking status. However, the long-term goal for Google is to run Kubernetes as cattle infrastructure. In the cloud, we want to run thousands of services with a single SRE team, and if this is to succeed, we can't keep caring about individual clusters and shell scripts that launch kubectl and grep its output. Kubernetes should be offering a good API.

@lavalamp
Member

@steve-wolter Sorry for slow response.

I think "Kubernetes lacks the concept of an operation" is correct. The system does not expose operation-level constructs because the system does not really have operation level constructs. (trivia: we had "operations" in apiserver at one point, but removed them!)

Due to the declarative nature of the API, "operation" isn't really a thing that actually makes sense. I think there are several concepts that we could actually present, but unfortunately they wouldn't provide the semantics that you really want.

  1. Has the system registered the intent of the request? (i.e., a spec is valid, stored, replicated)
  2. Has the system made reality look as requested?

The problem with treating 1 as an operation is that it's not very useful: it doesn't tell you much about the state of the cluster (as opposed to the configuration of the cluster).

The problem with treating 2 as an operation is that it's a control loop. That is, reality and desired state will usually stay together but it is a dynamic process and just because they match at T=1 doesn't necessarily mean they will match at T=2 (think dynamically scaled replica counts).

The first thing that comes to my mind that could actually be helpful is to make a concrete list of things you want to know, and maybe we can see if some API pattern can be used to simply represent all of these questions. Like, maybe we could use conditions (IP_ASSIGNMENT: ASSIGNED) to let you see the bits that you need to make decisions based on. I don't think we can really make an "I'm all done!" flag, because e.g. before taking the next step, sometimes you just care that the service has an IP assigned, and sometimes you also want it to have functioning endpoints.

TL;DR: there is an impedance mismatch between operations (imperative API) and Kubernetes's declarative API. Some form of adaptor is going to be necessary. It's not possible to look at spec and status and say "ready" or "not ready" because "ready" means different things to different people.
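
As a sketch of what such an adaptor could look like: a helper that checks a caller-named condition on any resource that follows the status.conditions convention. The helper and its signature are hypothetical, not an existing API.

// Sketch of a generic condition check over unstructured objects.
package conditions

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// HasCondition reports whether the object carries a condition of the given
// type with the given status (e.g. type "Available", status "True").
func HasCondition(obj *unstructured.Unstructured, condType, condStatus string) bool {
	conds, found, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
	if err != nil || !found {
		return false
	}
	for _, c := range conds {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		if cond["type"] == condType && cond["status"] == condStatus {
			return true
		}
	}
	return false
}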

@ghost

ghost commented Nov 18, 2016

(Note: @lavalamp and I will take this discussion into a VC after Thanksgiving to gain mutual understanding.)

@davidopp
Member

It occurred to me that observedGeneration is vaguely related, if you think of each generation as a request/operation.
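
A small sketch of that pattern, assuming the standard apps/v1 Deployment type: metadata.generation increments on each spec change, and the controller records the generation it last processed in status.observedGeneration.

// Sketch: is the status we are reading at least as new as the spec we wrote?
// (As noted below, this does not guarantee the intent has been realized.)
package observe

import appsv1 "k8s.io/api/apps/v1"

func statusCurrent(d *appsv1.Deployment) bool {
	// metadata.generation bumps on spec changes; the deployment controller
	// copies the generation it last acted on into status.observedGeneration.
	return d.Status.ObservedGeneration >= d.Generation
}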

@smarterclayton
Contributor

This has already manifested in 20 or 30 places in Kubernetes, with ad hoc support in each. I don't think there is any disagreement that a general API concept should exist for "has this intent been realized", but there is a gap in terms of implementing it.

As Dan notes, the pattern of "intent" and "intent realized" is imperfectly manifest via spec and status. In places we have confused this with spec mutations via controllers (nodeName on pod spec). However, the level-driven aspect of this system has had practical benefits for its stability, because everyone has to deal with dynamic stability instead of edge-triggered changes. I would generally agree we should allow simpler clients to ask for conditions to manifest, so that state transitions can be used to build orchestration.

ObservedGeneration is good, but doesn't actually address the real problem (it can't guarantee that the observed intent has been implemented), which we've hit with replica sets, deployments, and stateful sets. We need something better.

The Scale resource has been discussed as one possible example of providing a generic interface for many resources to implement this pattern - the interface generalizes "scale" and "actualScale" for a client. Since many conditions may need additional details, we need many resources in order to represent a schema. But that shouldn't be an issue.

Cataloging the list of active wait conditions in use today would be a good start. The e2e tests contain an immense number of them (pod scheduled, pod started, pod terminated), as do the client conditions in client.go.

We've spent a lot of time building smart client logic to implement these sorts of waits. It's probably time to be moving them back to the server.
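
For illustration, the Scale subresource already exposes that generic desired/actual pair for several workload types. A minimal client-go sketch of waiting on it follows; the helper name is illustrative and a configured clientset is assumed.

// Sketch of waiting on the generic Scale shape.
package scalewait

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleSettled reports whether the actual replica count has caught up with
// the desired count, using only the Scale subresource - the same check
// works for any workload kind that exposes /scale.
func scaleSettled(ctx context.Context, cs kubernetes.Interface, ns, name string) (bool, error) {
	s, err := cs.AppsV1().Deployments(ns).GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return s.Status.Replicas == s.Spec.Replicas, nil
}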


@bgrant0607
Member Author

@davidopp @smarterclayton observedGeneration is intended as a way for an observer to determine how up to date the status reported by the primary controller for the resource is. No more, no less. That needs to be combined with the usual resourceVersion-based optimistic concurrency mechanism to ensure that controllers don't act upon stale data, and with leader-election sequence numbers in the case of HA (#22007).
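
A brief sketch of the resourceVersion-based mechanism referred to here, using client-go's conflict-retry helper; the function and parameter names are illustrative and a configured clientset is assumed.

// Sketch of resourceVersion-based optimistic concurrency: the update carries
// the resourceVersion the client last read, the apiserver rejects the write
// with a Conflict if the object has moved on, and the closure re-reads and
// retries.
package concurrency

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func bumpReplicas(ctx context.Context, cs kubernetes.Interface, ns, name string, replicas int32) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-read on every attempt so the write never acts on stale data.
		d, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		d.Spec.Replicas = &replicas
		_, err = cs.AppsV1().Deployments(ns).Update(ctx, d, metav1.UpdateOptions{})
		return err
	})
}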

As @lavalamp pointed out, the K8s API is optimized for continuous, single-purpose, choreography-style control loops rather than discrete, generic, workflow-style orchestrators, hence this issue.

@steve-wolter Any solution we develop needs to maintain several properties of the current API:

  • Needs to work with HA clusters, namespace isolation, federated apiservers, federated clusters, storage sharded by resource type and namespace, and non-etcd storage backends (e.g., for metrics and secrets)
  • Needs to work with control loops that repeatedly update desired and/or current state without monotonically growing storage requirements, without restricting the number of "operations in flight", and without "touching base" at intermediate state changes
  • Needs to work with third-party controllers that mutate desired state, annotations, initializers/finalizers, third-party resources, external resources, etc.

@bgrant0607
Member Author

Example of an orchestrator: https://github.com/Mirantis/k8s-AppController

@smarterclayton
Contributor

smarterclayton commented Apr 23, 2017 via email

@0xmichalis
Contributor

Example of an orchestrator: https://github.com/Mirantis/k8s-AppController

And another one: https://github.com/atlassian/smith

@erictune
Member

Note: saw a Stack Overflow user asking how to wait for a Job to be done. This has come up several times.

@0xmichalis
Contributor

Closing as per discussion in #25067 (comment)

@bgrant0607
Member Author

@Kargakis I actually want this one open. It's about a general API use case (clients that can't deal with eventual consistency) rather than for deployment specifically.

@bgrant0607 bgrant0607 reopened this Aug 15, 2017
@bgrant0607 bgrant0607 changed the title Facilitate API orchestration/workflow Facilitate API orchestration Aug 15, 2017

@bgrant0607 bgrant0607 added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog. labels Aug 15, 2017
@0xmichalis
Contributor

@bgrant0607 isn't this case covered already by #1899?

@bgrant0607
Member Author

I wrote something on determining success/failure (which is a subset of this issue) here:
https://docs.google.com/document/d/1cLPGweVEYrVqQvBLJg6sxV-TrE5Rm2MNOBA_cxZP2WU/edit#heading=h.edrnxxvhcni2

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2018
@bgrant0607
Member Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 9, 2018
@nikhita
Member

nikhita commented Mar 4, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2018
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Sep 29, 2023