Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admission control proposal #2501

Merged
merged 2 commits into from
Dec 22, 2014

Conversation

derekwaynecarr
Copy link
Member

I am starting to look at how I can support some features that will require admission control, so wanted to get high-level agreement on the strategy so I can look to work on some of the more specific use cases we want to use admission control to achieve.

Feedback appreciated.


1. Quota enforcement of allocated desired usage
2. Pod black-lister to restrict running specific images on the cluster
3. Privileged container checker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd been considering the ability to run a privileged container to be part of Authorization.

@erictune
Copy link
Member

I'm concerned that if quota charges are not updated synchronously with resource creation/deletion, that users will have a bad experience, and/or it will be possible to game the system. For that to work, one of the following needs to happen, I think:

  1. admission controller knows that if it says 200, the resource will definitely be created, so it can increment its counters. And it needs to see deletes too.
  2. admission controller need to learn about pod creations and deletions in a very timely fashion.

For example, it can't really say yes or no to a request until it has seen whether the previous request completed.

For the first option, I think that the admission controller has to happen after validation. In your proposal, it appears to happen before validation.

For the second option, I think it will be much easier to make the checking synchronous if the admission controller is part of the apiserver process, as opposed to being a separate entity.

So, I think we keep our options open more if we initially put admission control into the apiserver process, and only make it a separate service when we are sure we need to.

@derekwaynecarr
Copy link
Member Author

My concern is we may all want to quota different things. I can submit a follow on soon for what we discussed potentially doing for quota enforcement of desired usage. In practice, I think you always need a control loop to reconcile a run once terminated pod as no longer counting against your usage. Let me augmented this PR with an additional doc on a potential quota system that would be built on top tomorrow. Interested in your perspective.

In general, I want to crisp up the difference between policy and admission control so your thoughts prior were really useful.

Sent from my iPhone

On Nov 20, 2014, at 5:42 PM, Eric Tune notifications@github.com wrote:

I'm concerned that if quota charges are not updated synchronously with resource creation/deletion, that users will have a bad experience, and/or it will be possible to game the system. For that to work, one of the following needs to happen, I think:

admission controller knows that if it says 200, the resource will definitely be created, so it can increment its counters. And it needs to see deletes too.
admission controller need to learn about pod creations and deletions in a very timely fashion.
For example, it can't really say yes or no to a request until it has seen whether the previous request completed.

For the first option, I think that the admission controller has to happen after validation. In your proposal, it appears to happen before validation.

For the second option, I think it will be much easier to make the checking synchronous if the admission controller is part of the apiserver process, as opposed to being a separate entity.

So, I think we keep our options open more if we initially put admission control into the apiserver process, and only make it a separate service when we are sure we need to.


Reply to this email directly or view it on GitHub.

@derekwaynecarr
Copy link
Member Author

I also agree that validation should occur prior to admission control. Let me update to reflect that.

A lot of your feedback is centered on their being only one admission control call-out, which I may agree to simplify if I can think on it some more.

As for deletes, I think it's important in a quota system, but I am not sure you can guarantee a delete occurred without some transactional rollback as there is no guarantee the storage actually did the delete after admission control viewed it. As a result, I looked at this as requiring some reconciliation loop to update the allocated desired usage after a delete. We have a history of trying to support rollbacks on pending operation queues and we are trying to avoid it.

Either way, will add more ideas on a quota system soon.

Sent from my iPhone

On Nov 20, 2014, at 5:42 PM, Eric Tune notifications@github.com wrote:

I'm concerned that if quota charges are not updated synchronously with resource creation/deletion, that users will have a bad experience, and/or it will be possible to game the system. For that to work, one of the following needs to happen, I think:

admission controller knows that if it says 200, the resource will definitely be created, so it can increment its counters. And it needs to see deletes too.
admission controller need to learn about pod creations and deletions in a very timely fashion.
For example, it can't really say yes or no to a request until it has seen whether the previous request completed.

For the first option, I think that the admission controller has to happen after validation. In your proposal, it appears to happen before validation.

For the second option, I think it will be much easier to make the checking synchronous if the admission controller is part of the apiserver process, as opposed to being a separate entity.

So, I think we keep our options open more if we initially put admission control into the apiserver process, and only make it a separate service when we are sure we need to.


Reply to this email directly or view it on GitHub.

@derekwaynecarr
Copy link
Member Author

@erictune - discussed your feedback with some members of our team.

RE: priv containers, and host dir
We agree that ability to run priv containers, and use hostDir volume mounts is a user level capability that is fine for core policy to satisfy.

RE: Compile in or call out via REST API
Things are definitely simplified if you do compile in, and they are definitely simpler if you have only 1 admission control endpoint rather than a fan-out. In addition, if fan-out was needed, it could be done by whatever we compile in or invoke via a single REST endpoint.

RE: Request Validation
It sounds like Policy decisions are starting to be made based on some measure of introspection on the incoming resource. If that is the case, would you want Validation to happen prior to Policy or Admission Control? Should we think about moving validation above RESTStorage?

RE: Quota charges of synchronous updates
Absent some transaction, it's not clear how you guarantee this. It's possible a resource passes admission control (and therefore desired usage is ticked up), but its never actually persisted in the REST Store (maybe etcd flaked out). In the delete case, the same situation can occur just in reverse, you tick desired usage down but persistence of delete never completes. To account for this, you need guaranteed rollback, and that just got ugly IMO for how we use etcd now. As a result, my bias was to increment desired usage if admission control said YES, but handle the error and deletion cases via some daemon that would update allocated desired usage in the background. You definitely need an atomic operation around incrementing/decrementing allocated desired usage, but it's non-obvious to me how you can tie that to persistence in etcd of the actual resource. As a result, you had a system where for moments in time what it thinks you have allocated for desired usage is greater than actual usage, but you never had the reverse situation of it thinking you had less allocated than used. This is because DELETEs would be applied in the reconciliation loop in the background.

Other topics:

Quality of Service: for me, admission control is a great vehicle if you want as a provider to offer some level of quality of service gatekeeper. So for example, policy says you have the right to run a priv container, but the admission controller sees that all of the dedicated nodes for your organization are running at capacity so it may choose to not accept the request because it can gleam it may never be able to schedule the pod in a timely manner. So as an operator I do not want to remove your right to run a priv container, but I want to not accept new ones either type of thing. It's this quality of service angle where I think its fair to say you would have some overlap between policy and admission control. Admittedly, that is an advanced use case. Have you guys thought about quality of service type issues wrt to Admission Control decision making?

@erictune
Copy link
Member

RE: Request Validation
I think we will eventually want to make policy depend on attributes that must be parsed from POST/PUT bodies. I have considered three options:

  1. do authorization in two phases: once very early, just after authentication. In this phase, a request which matches only policies which reference body fields would be "allowed to proceed but not yet authorized". And then authorization would be re-checked for those sorts of requests after decoding and validation, when the new information was available.
  2. do all decoding and validation prior to authorization. In this case, I wasn't sure if there could be cases where we still might want to return Forbidden rather a validation error, if both occurred.
  3. do decoding and validation twice for POST/PUTs; once before auth, then throw it away, and then again before create. Writes are so rare compared to reads that this inefficiency won't matter and might make the code structure better.

My strategy has been to wait until we really need body-dependent policies before implementing them.

RE: Quota charges of synchronous updates
We can do a two-phase commit:

  • A lock guards the quota data structure in etcd (or if needed, a separate lock for each "quota bucket", e.g. user or namespace)
  • for reads, which will be 100x to 1000x more common, no change to logic
  • for writes, first, lock quota, then decrement/increment, then create/remove object from registry, then release quota lock.
  • Because the number of masters (1 to 5) is much smaller than the read-to-write request ratio (100 to 1 or bigger), there is no way there will be significant lock contention. The probability of even 1 other apiserver handling a write at the same time is relatively low, and the probability of several is very low.
  • When an apiserver restarts, it will need to verify the quota charges in case a previous apiserver crashed during the two-phase commit. This can be parallelized if needed by sharding by hash(quota bucket name). Holding a lock on quota during recomputation ensures a consistent view of the world since it blocks mutations.

Other topics:
We have experience with use cases like you mention. The administrator is responsible for configuring both admission control and policy in a corresponding way, but the source code for each of those does not need to know about the other one. Note that priv container might imply different QoS, but it needn't. An administrator might have permission to run priv containers on any partition of machines, and might use the priv container to as a way to do administrative tasks on the node.

Re: other comments not specifically addressed above:
I read them and I think we are in agreement.

@kubernetes-bot
Copy link

Can one of the admins verify this patch?

@derekwaynecarr derekwaynecarr force-pushed the admission_control_proposal branch 6 times, most recently from 1634e1c to a26b0f2 Compare December 16, 2014 20:45
@derekwaynecarr
Copy link
Member Author

Prereqs #2977

@derekwaynecarr derekwaynecarr force-pushed the admission_control_proposal branch from a26b0f2 to 0cabf0b Compare December 18, 2014 18:56
@derekwaynecarr
Copy link
Member Author

@erictune - I have vastly simplified the proposal, and limited it to what I view as the bare minimum needed to support admission control for our downstream requirements. I have an implementation PR that I will submit shortly that has this working. Please provide feedback, and we could look to evolve this further over time.

erictune added a commit that referenced this pull request Dec 22, 2014
@erictune erictune merged commit 1f068fa into kubernetes:master Dec 22, 2014
@derekwaynecarr derekwaynecarr deleted the admission_control_proposal branch April 17, 2015 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants