
Facilitate API orchestration #34363

Open

bgrant0607 opened this issue Oct 7, 2016 · 27 comments
Labels
area/api Indicates an issue on api area. area/app-lifecycle area/teardown lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@bgrant0607
Member

bgrant0607 commented Oct 7, 2016

Currently, building a general-purpose orchestrator for our API is more challenging than it should be.

Some problems (probably not exhaustive):

See also:
#29453
#1899
#5164
kubernetes/website#41954
#4630
#15203
#29891
#14961
#14181
#1503
#32157
#38216

cc @lavalamp @pwittrock @smarterclayton

@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. team/api sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog. area/app-lifecycle labels Oct 7, 2016
@0xmichalis
Contributor

cc: @mfojtik

@smarterclayton
Contributor

Better Ansible integration (and other config mgmt) is a part of this. Supporting apply, as well as things like create * safety and versioning, is important for solving hard problems. I have some proposals pending for this.

I think the hooks, perma-failed deployments, and custom strategies proposals made me start thinking we're missing a DeploymentJob object - a Job that runs to completion and can be declaratively tracked. Essentially a level-driven Job that can be updated directly. Custom strategies could also leverage a job template and manage the job themselves.

Ultimately there is probably a separation between declarative state and level driven config workflows - the goal I think should be to have the necessary flexibility to declare workflows that allow people to walk away from a cluster and have it keep ticking over with security updates and key rotation as well as periodic development workflows in tools like Jenkins.

@0xmichalis
Contributor

cc: @soltysh you were asking for this I think

@soltysh
Contributor

soltysh commented Nov 9, 2016

I like the idea of a Job driving the custom deployment.

@mfojtik
Contributor

mfojtik commented Nov 10, 2016

@soltysh with the custom deployment strategy controller you can facilitate jobs or any other resources.

@ghost

ghost commented Nov 10, 2016

From a Google SRE's perspective, what Kubernetes most clearly lacks is the concept of an operation. An operation must support the following functions:

  • When an update on a resource is triggered, a handle to an operation is returned.
  • The operation has well-defined states for "running", "succeeded" or "failed".
  • The operation has a status field that shows the operation's progress while it's running.
  • The operation has an error field that shows the reason for a failed operation.
  • Optional: The operation can be cancelled or rolled back.

Deployment is far from the only API object that needs operation support. For example, services and ingresses need operations that block until they have acquired IP addresses.

For us, the lack of API support for operations is the single most significant shortcoming of Kubernetes, because it makes any management of K8s unreliable.
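
For illustration only: Kubernetes has no Operation API object, so the following is a hypothetical Go sketch of the shape such a handle could take, derived from the requirements listed above. All names are invented.

// Hypothetical sketch only: Kubernetes has no Operation API object.
// This is one possible shape for such a handle, based on the requirements
// listed in the comment above.
package operations

import "time"

// OperationState enumerates the well-defined lifecycle states.
type OperationState string

const (
	OperationRunning   OperationState = "Running"
	OperationSucceeded OperationState = "Succeeded"
	OperationFailed    OperationState = "Failed"
	OperationCancelled OperationState = "Cancelled" // optional, per the list above
)

// Operation is the handle a client would receive when triggering an update.
type Operation struct {
	// Name distinguishes two updates issued in short succession.
	Name string
	// Target names the resource acted upon, e.g. "deployments/nginx".
	Target string
	// State is Running, Succeeded, Failed, or Cancelled.
	State OperationState
	// Progress describes how far along the operation is while Running.
	Progress string
	// Error explains why a Failed operation failed.
	Error string
	// StartTime and CompletionTime bound the operation in time.
	StartTime      time.Time
	CompletionTime *time.Time
}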

@mfojtik
Contributor

mfojtik commented Nov 10, 2016

@steve-wolter deployments currently have perma-failed and conditions that report the progress of the rollout operation.

@0xmichalis
Contributor

@steve-wolter deployments currently have perma-failed and conditions that report the progress of the rollout operation.

s/perma-failed/progressDeadlineSeconds/

@davidopp
Member

@steve-wolter

It seems that the spec/status model supports what you want when creating resources: the user creates the object (filling in the spec) and then polls or watches the object's status, which indicates the progress of the "operation." For example, create a ReplicaSet, and then watch the status to see how many replicas have been created so far.

Updates are a little less obvious since the modification is done to the spec in-place (as opposed to creating a separate "request to update object X" object), but Deployment does track this stuff under the covers, which is what allows for example this part of the flow (copied from the documentation):

$ kubectl rollout status deployment/nginx-deployment
Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
deployment nginx-deployment successfully rolled out
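
Under the covers, that flow amounts to comparing the Deployment's spec against its reported status. A minimal client-go sketch of the same pattern follows; the function name and the 2s poll interval are illustrative, and a configured clientset is assumed.

// A minimal sketch of the status-watching pattern behind "rollout status".
package rollout

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func waitForRollout(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	for {
		d, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		// The controller must have observed the latest spec...
		if d.Status.ObservedGeneration >= d.Generation &&
			// ...and all replicas must be updated and available.
			d.Status.UpdatedReplicas == *d.Spec.Replicas &&
			d.Status.AvailableReplicas == *d.Spec.Replicas {
			fmt.Printf("deployment %s successfully rolled out\n", name)
			return nil
		}
		fmt.Printf("waiting: %d out of %d new replicas have been updated\n",
			d.Status.UpdatedReplicas, *d.Spec.Replicas)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(2 * time.Second):
		}
	}
}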

@ghost

ghost commented Nov 10, 2016

@davidopp @mfojtik

There are two problems with the approach you're suggesting:

  1. Big problem: Each and every API object has a different status notion. There is no generic way to query status. This places an unacceptable coding burden on higher-level automation such as Google's Cloud Deployment Manager.
  2. Smaller problem, but good to kill two birds with one stone: There is no distinction of status between subsequent operations. Suppose two different users/tasks update a deployment in short succession. How are they going to keep their operations apart?

I know that kubectl has all kinds of code for validating inputs, tracking progress and checking status. However, the long-term goal for Google is to run Kubernetes as cattle infrastructure. In the cloud, we want to run thousands of services with a single SRE team, and if this is to succeed, we can't keep caring about individual clusters and shell scripts that launch kubectl and grep its output. Kubernetes should be offering a good API.

@lavalamp
Member

@steve-wolter Sorry for slow response.

I think "Kubernetes lacks the concept of an operation" is correct. The system does not expose operation-level constructs because the system does not really have operation level constructs. (trivia: we had "operations" in apiserver at one point, but removed them!)

Due to the declarative nature of the API, "operation" isn't really a thing that actually makes sense. I think there are several concepts that we could actually present, but unfortunately they wouldn't provide the semantics that you really want.

  1. Has the system registered the intent of the request? (i.e., a spec is valid, stored, replicated)
  2. Has the system made reality look as requested?

The problem with treating 1 as an operation is that it's not very useful: it doesn't tell you much about the state of the cluster (as opposed to the configuration of the cluster).

The problem with treating 2 as an operation is that it's a control loop. That is, reality and desired state will usually stay together but it is a dynamic process and just because they match at T=1 doesn't necessarily mean they will match at T=2 (think dynamically scaled replica counts).

The first thing that comes to my mind that could actually be helpful is to make a concrete list of things you want to know, and maybe we can see if some API pattern can be used to simply represent all of these questions. Like, maybe we could use conditions (IP_ASSIGNMENT: ASSIGNED) to let you see the bits that you need to make decisions based on. I don't think we can really make an "I'm all done!" flag, because e.g. before taking the next step, sometimes you just care that the service has an IP assigned, and sometimes you also want it to have functioning endpoints.

TL;DR: there is an impedance mismatch between operations (imperative API) and Kubernetes's declarative API. Some form of adaptor is going to be necessary. It's not possible to look at spec and status and say "ready" or "not ready" because "ready" means different things to different people.
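
As a sketch of what such an adaptor could look like: a helper that checks a caller-named condition on any resource that follows the status.conditions convention. The helper and its signature are hypothetical, not an existing API.

// Sketch of a generic condition check over unstructured objects.
package conditions

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// HasCondition reports whether the object carries a condition of the given
// type with the given status (e.g. type "Available", status "True").
func HasCondition(obj *unstructured.Unstructured, condType, condStatus string) bool {
	conds, found, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
	if err != nil || !found {
		return false
	}
	for _, c := range conds {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		if cond["type"] == condType && cond["status"] == condStatus {
			return true
		}
	}
	return false
}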

@ghost

ghost commented Nov 18, 2016

(Note: @lavalamp and I will take this discussion into a VC after Thanksgiving to gain mutual understanding.)

@davidopp
Member

It occurred to me that observedGeneration is vaguely related, if you think of each generation as a request/operation.
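
A small sketch of that pattern, assuming the standard apps/v1 Deployment type: metadata.generation increments on each spec change, and the controller records the generation it last processed in status.observedGeneration.

// Sketch: is the status we are reading at least as new as the spec we wrote?
// (As noted below, this does not guarantee the intent has been realized.)
package observe

import appsv1 "k8s.io/api/apps/v1"

func statusCurrent(d *appsv1.Deployment) bool {
	// metadata.generation bumps on spec changes; the deployment controller
	// copies the generation it last acted on into status.observedGeneration.
	return d.Status.ObservedGeneration >= d.Generation
}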

@smarterclayton
Contributor

This has already manifested in 20 or 30 places in Kubernetes, with ad hoc support in each. I don't think there is any disagreement that a general API concept should exist for "has this intent been realized", but there is a gap in terms of implementing it.

As Dan notes, the pattern of "intent" and "intent realized" is imperfectly manifest via spec and status. In places we have confused this with spec mutations via controllers (nodeName on pod spec). However, the level-driven aspect of this system has had practical benefits for its stability, because everyone has to deal with dynamic stability instead of edge-triggered changes. I would generally agree we should allow simpler clients to ask for conditions to manifest, so that state transitions can be used to build orchestration.

ObservedGeneration is good, but doesn't actually address the real problem (it can't guarantee that the observed intent has been implemented), which we've hit with replica sets, deployments, and stateful sets. We need something better.

The Scale resource has been discussed as one possible example of providing a generic interface for many resources to implement this pattern - the interface generalizes "scale" and "actualScale" for a client. Since many conditions may need additional details, we need many resources in order to represent a schema. But that shouldn't be an issue.

Cataloging the list of active wait conditions in use today would be a good start. The e2e tests contain an immense number of them (pod scheduled, pod started, pod terminated), as do the client conditions in client.go.

We've spent a lot of time building smart client logic to implement these sorts of waits. It's probably time to be moving them back to the server.
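
For illustration, the Scale subresource already exposes that generic desired/actual pair for several workload types. A minimal client-go sketch of waiting on it follows; the helper name is illustrative and a configured clientset is assumed.

// Sketch of waiting on the generic Scale shape.
package scalewait

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleSettled reports whether the actual replica count has caught up with
// the desired count, using only the Scale subresource - the same check
// works for any workload kind that exposes /scale.
func scaleSettled(ctx context.Context, cs kubernetes.Interface, ns, name string) (bool, error) {
	s, err := cs.AppsV1().Deployments(ns).GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return s.Status.Replicas == s.Spec.Replicas, nil
}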


@bgrant0607
Member Author

@davidopp @smarterclayton observedGeneration is intended as a way for an observer to determine how up to date the status reported by the primary controller for the resource is. No more, no less. That needs to be combined with the usual resourceVersion-based optimistic concurrency mechanism to ensure that controllers don't act upon stale data, and with leader-election sequence numbers in the case of HA (#22007).
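
A brief sketch of the resourceVersion-based mechanism referred to here, using client-go's conflict-retry helper; the function and parameter names are illustrative and a configured clientset is assumed.

// Sketch of resourceVersion-based optimistic concurrency: the update carries
// the resourceVersion the client last read, the apiserver rejects the write
// with a Conflict if the object has moved on, and the closure re-reads and
// retries.
package concurrency

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func bumpReplicas(ctx context.Context, cs kubernetes.Interface, ns, name string, replicas int32) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-read on every attempt so the write never acts on stale data.
		d, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		d.Spec.Replicas = &replicas
		_, err = cs.AppsV1().Deployments(ns).Update(ctx, d, metav1.UpdateOptions{})
		return err
	})
}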

As @lavalamp pointed out, the K8s API is optimized for continuous, single-purpose, choreography-style control loops rather than discrete, generic, workflow-style orchestrators, hence this issue.

@steve-wolter Any solution we develop needs to maintain several properties of the current API:

  • Needs to work with HA clusters, namespace isolation, federated apiservers, federated clusters, storage sharded by resource type and namespace, and non-etcd storage backends (e.g., for metrics and secrets)
  • Needs to work with control loops that repeatedly update desired and/or current state without monotonically growing storage requirements, without restricting the number of "operations in flight", and without "touching base" at intermediate state changes
  • Needs to work with third-party controllers that mutate desired state, annotations, initializers/finalizers, third-party resources, external resources, etc.

@bgrant0607
Member Author

Example of an orchestrator: https://github.com/Mirantis/k8s-AppController

@smarterclayton
Contributor

smarterclayton commented Apr 23, 2017 via email

@0xmichalis
Contributor

Example of an orchestrator: https://github.com/Mirantis/k8s-AppController

And another one: https://github.com/atlassian/smith

@erictune
Member

Note: saw a Stack Overflow user asking how to wait for a Job to be done. This has come up several times.

@0xmichalis
Contributor

Closing as per discussion in #25067 (comment)

@bgrant0607
Member Author

@Kargakis I actually want this one open. It's about a general API use case (clients that can't deal with eventual consistency) rather than for deployment specifically.

@bgrant0607 bgrant0607 reopened this Aug 15, 2017
@bgrant0607 bgrant0607 changed the title Facilitate API orchestration/workflow Facilitate API orchestration Aug 15, 2017

@bgrant0607 bgrant0607 added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog. labels Aug 15, 2017
@0xmichalis
Contributor

@bgrant0607 isn't this case covered already by #1899?

@bgrant0607
Member Author

I wrote something on determining success/failure (which is a subset of this issue) here:
https://docs.google.com/document/d/1cLPGweVEYrVqQvBLJg6sxV-TrE5Rm2MNOBA_cxZP2WU/edit#heading=h.edrnxxvhcni2

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2018
@bgrant0607
Member Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 9, 2018
@nikhita
Member

nikhita commented Mar 4, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2018
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Sep 29, 2023