Philosophy: when and how should kubelet reject assigned pods? #5335

Closed
lavalamp opened this issue Mar 11, 2015 · 4 comments
Labels
kind/design, sig/api-machinery

Comments

@lavalamp
Member

Summarizing an IRL conversation.

Bug report: kube-scheduler doesn't see its own assignments for [latency amount], so it commonly schedules incompatible pods together. We're removing the atomic check (#5320), which exposes this. I (lavalamp) will try to make a PR to fix this.
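
As a concrete illustration, here's a minimal sketch (hypothetical names, not the actual scheduler code) of the kind of fix I have in mind: keep an in-memory cache of bindings the scheduler has just issued, and count those assumed pods when evaluating predicates, so the scheduler doesn't double-book a node while waiting for its watch to catch up.

```go
// Hypothetical sketch, not the actual kube-scheduler code: one way to close
// the window during which the scheduler can't see its own assignments is to
// record each binding in a local "assumed" cache and consult that cache
// alongside the watch-driven view of the cluster.
package scheduler

import "sync"

// assumedPods tracks bindings the scheduler has issued but has not yet
// observed back from the apiserver. Entries would be dropped once the
// watch confirms (or contradicts) them; expiry is omitted here.
type assumedPods struct {
	mu     sync.Mutex
	byNode map[string][]string // node name -> names of pods assumed there
}

func newAssumedPods() *assumedPods {
	return &assumedPods{byNode: map[string][]string{}}
}

// assume records a binding immediately after the bind request succeeds,
// before the watch reflects it.
func (a *assumedPods) assume(node, pod string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.byNode[node] = append(a.byNode[node], pod)
}

// podsOn merges the watch-derived pod list for a node with locally assumed
// pods, so scheduling predicates (host ports, resources) also account for
// the scheduler's own recent decisions.
func (a *assumedPods) podsOn(node string, observed []string) []string {
	a.mu.Lock()
	defer a.mu.Unlock()
	merged := append([]string{}, observed...)
	return append(merged, a.byNode[node]...)
}
```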

The philosophical question is, should this ever be allowed to happen? It's a bad user experience to have your pod mis-scheduled and dropped on the floor.

@brendanburns wanted these constraints to be atomically checked (as they are in boundPods). However, there are two main concerns with this: a) if an error creeps in, the system isn't self-healing, and b) write contention over boundPods slows the system drastically.

Instead, I think we've agreed on a tiered approach, where the goal is that scheduler gets it right 99%+ of the time, but kubelet is able to reject a pod that isn't compatible if necessary. And for the usability concern, we'll discuss letting kubelet unassign incompatible pods instead of setting them to failed.
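
To make the kubelet backstop concrete, here's a minimal sketch (illustrative types, not the Kubernetes API) of the admission check kubelet could run on each assigned pod:

```go
// Hypothetical sketch of the kubelet-side backstop: before accepting an
// assigned pod, check it against node capacity and already-admitted pods,
// and reject (or, per the discussion above, unassign) on conflict. All
// names here are illustrative; this is not the Kubernetes API.
package kubelet

import "fmt"

type pod struct {
	Name      string
	MilliCPU  int64
	HostPorts []int
}

type node struct {
	CapacityMilliCPU int64
	admitted         []pod
}

// canAdmit returns nil if the pod fits, or an error describing the conflict
// so the kubelet can surface a reason when it rejects the pod.
func (n *node) canAdmit(p pod) error {
	usedCPU := int64(0)
	usedPorts := map[int]bool{}
	for _, a := range n.admitted {
		usedCPU += a.MilliCPU
		for _, hp := range a.HostPorts {
			usedPorts[hp] = true
		}
	}
	if usedCPU+p.MilliCPU > n.CapacityMilliCPU {
		return fmt.Errorf("pod %s: insufficient cpu (%dm requested, %dm free)",
			p.Name, p.MilliCPU, n.CapacityMilliCPU-usedCPU)
	}
	for _, hp := range p.HostPorts {
		if usedPorts[hp] {
			return fmt.Errorf("pod %s: host port %d already in use", p.Name, hp)
		}
	}
	return nil
}
```

On a non-nil error, kubelet would either mark the pod failed or, per the discussion above, clear the assignment so the scheduler can place it elsewhere.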

@erictune @alex-mohr

@lavalamp added the kind/design, priority/design, and sig/api-machinery labels on Mar 11, 2015
@dchen1107
Member

cc/ @bgrant0607

@bgrant0607
Member

Dupe of #5334.

@bgrant0607
Member

But, yes, we should fix the scheduler if it screws up often.

@bgrant0607
Member

In case it wasn't clear from #5334:

Pods should never be treated as durable pets. In general, users shouldn't be creating pods directly. They should almost always use controllers, not unlike auto-scaling groups on AWS (AIUI) or our internal "job" abstraction.

Pod is exposed as a primitive for several reasons:

- to facilitate writing schedulers, controllers, etc.
- to facilitate bootstrapping
- for separation of concerns (Kubelet vs. cluster-level components)
- to facilitate decoupling of replication controllers and services
- so that replication controllers and other similar controllers don't need to proxy instance-level operations

The right solution for pets is something like nominal services #260. In the not-too-distant future, controllers will need to be able to replace instances in advance of their termination and certainly in advance of deletion (e.g., for planned evictions, image prefetching, unidling, or live pod migration #3949).

I'll update pods.md to clarify this.
