CPU manager phase 1 #49186
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ConnorDoyle
Assign the PR to them by writing
No associated issue. Update pull-request body to add a reference to an issue, or get approval with
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing
// ID of the container to update.
string container_id = 1;
// Resource configuration specific to Linux containers.
LinuxContainerResources linux = 15;
Note: it would make more sense for this to be 2.
@@ -248,6 +251,9 @@ func (m *kubeGenericRuntimeManager) generateLinuxContainerConfig(container *v1.C
cpuShares = milliCPUToShares(cpuRequest.MilliValue())
}
lc.Resources.CpuShares = cpuShares
// SETH set initial cpuset here maybe?
TODO: kill comment?
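For what it's worth, a minimal sketch of what seeding the initial cpuset at config-generation time could look like. The `cpuSetProvider` interface and its `GetCPUSet` accessor are assumptions standing in for the CPU manager, and the import paths reflect the tree at the time; only `CpusetCpus` mirrors a real CRI field.

```go
package kuberuntime

import (
	runtimeapi "k8s.io/kubernetes/pkg/kubelet/apis/cri/v1alpha1/runtime"
	"k8s.io/kubernetes/pkg/kubelet/cm/cpuset"
)

// cpuSetProvider is a hypothetical, narrow view of the CPU manager.
type cpuSetProvider interface {
	GetCPUSet(containerName string) cpuset.CPUSet
}

// applyInitialCPUSet seeds the container's cpuset while the Linux container
// config is being generated, if the CPU manager already has an assignment,
// instead of patching the cgroup after the container has started.
func applyInitialCPUSet(cm cpuSetProvider, containerName string, lc *runtimeapi.LinuxContainerConfig) {
	if cm == nil || lc == nil || lc.Resources == nil {
		return
	}
	if cpus := cm.GetCPUSet(containerName); !cpus.IsEmpty() {
		lc.Resources.CpusetCpus = cpus.String()
	}
}
```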
return cpuset.NewCPUSet(), fmt.Errorf("not enough cpus available to satisfy request")
}

// TODO(CD): Acquire CPUs topologically instead of sequentially...
TODO: delete these lines
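For context on the TODO above, a rough sketch of the sequential acquisition it refers to, built on the cpuset helpers; the helper name `takeSequential` and the exact error text are assumptions, and a topology-aware version would prefer whole cores or sockets before falling back to individual cpus.

```go
package cpumanager

import (
	"fmt"

	"k8s.io/kubernetes/pkg/kubelet/cm/cpuset"
)

// takeSequential grabs the first numCPUs cpus from the available set in
// numeric order; a topology-aware variant would group by core and socket.
func takeSequential(available cpuset.CPUSet, numCPUs int) (cpuset.CPUSet, error) {
	if available.Size() < numCPUs {
		return cpuset.NewCPUSet(), fmt.Errorf("not enough cpus available to satisfy request")
	}
	b := cpuset.NewBuilder()
	for _, cpu := range available.ToSlice() {
		if numCPUs == 0 {
			break
		}
		b.Add(cpu)
		numCPUs--
	}
	return b.Result(), nil
}
```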
pkg/kubelet/kubelet.go
Outdated
@@ -1241,6 +1257,9 @@ func (kl *Kubelet) initializeModules() error {
// Start resource analyzer
kl.resourceAnalyzer.Start()

// Step 9: Start the CPU manager
Nit: remove Step 9 prefix
pkg/kubemark/hollow_kubelet.go
Outdated
@@ -145,6 +145,7 @@ func GetHollowKubeletConfig(
c.VolumeStatsAggPeriod.Duration = time.Minute
c.CgroupRoot = ""
c.ContainerRuntime = kubetypes.DockerContainerRuntime
c.CPUManagerPolicy = "noop"
TODO: use const from cpumanager package instead of this raw string
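A minimal sketch of the suggested change, assuming the cpumanager package exports a string-typed `PolicyNoop` constant (its exact name and type are not visible in this diff):

```go
import "k8s.io/kubernetes/pkg/kubelet/cm/cpumanager"

// In GetHollowKubeletConfig: reference the exported constant instead of the
// raw string "noop", so the hollow kubelet stays in sync with policy renames.
c.CPUManagerPolicy = string(cpumanager.PolicyNoop)
```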
glog.Infof("[cpumanager] detected CPU topology: %v", topo)
newPolicy = NewStaticPolicy(topo)
default:
glog.Warningf("[cpumanager] Invalid policy, fallback to default policy - 'noop'")
Nit: interpolate the PolicyNoop const in this log output.
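Something like this, assuming `PolicyNoop` is the exported constant whose value is "noop":

```go
// Interpolate the constant rather than hard-coding 'noop' in the message.
glog.Warningf("[cpumanager] invalid policy, falling back to default policy %q", PolicyNoop)
```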
/assign @derekwaynecarr @vishh
@teferi and @nqn could you please help turn the CLA gate green? There's a wiki entry on how it works here: https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ
/assign
@ConnorDoyle thanks for opening this PR 👍 We do need to start moving this forward and getting review if we are going to make 1.8. I did want to clean up the commit history on our dev branch before opening a PR, though I would like to maintain credit for each author. Maybe we can squash by author and rework the commit messages so that we don't have "wip" commits and whatnot.
Reviewer note: the first commit in this PR is the commit for #46105. I think we might want to review the CRI change separately in that PR, which needs rebasing if we want to keep it open. Thoughts?
I can squash consecutive commits by author and coalesce the messages. Wanted to do the big rebase first to stay sane; now we're caught up with master.
Rebased.
@@ -116,8 +115,12 @@ func (m *kubeGenericRuntimeManager) startContainer(podSandboxID string, podSandb
m.recordContainerEvent(pod, container, containerID, v1.EventTypeWarning, events.FailedToCreateContainer, "Error: %v", grpc.ErrorDesc(err))
return grpc.ErrorDesc(err), ErrCreateContainer
}
err = m.cpuManager.RegisterContainer(pod, container, containerID)
if err != nil {
m.recorder.Eventf(ref, v1.EventTypeWarning, events.FailedToStartContainer, "Failed to register container with CPU manager: %v", err)
Should we introduce a new events constant here, say events.FailedToRegisterContainer or something, since this is not a failure to start the container but a registration failure?
Not sure; I haven't examined the event classifications closely enough to have an opinion about this. However, we are bailing out of startContainer on account of the registration failure, so it doesn't seem too far off.
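If a dedicated event type were preferred, a sketch of what it might look like; `FailedToRegisterContainer` is a hypothetical constant here, and the surrounding call mirrors the diff above.

```go
// In pkg/kubelet/events (hypothetical new constant):
const FailedToRegisterContainer = "FailedRegisterContainer"

// At the call site in startContainer, still bailing out on failure:
if err := m.cpuManager.RegisterContainer(pod, container, containerID); err != nil {
	m.recorder.Eventf(ref, v1.EventTypeWarning, events.FailedToRegisterContainer,
		"Failed to register container with CPU manager: %v", err)
	return err.Error(), err
}
```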
@@ -810,6 +813,7 @@ func (m *kubeGenericRuntimeManager) removeContainer(containerID string) error {
if err := m.removeContainerLog(containerID); err != nil {
return err
}
m.cpuManager.UnregisterContainer(containerID)
What do we do if we get an error while unregistering the container?
Right now the error is discarded, but that probably is not the right thing to do. If unregistration fails we probably want to fail removeContainer so that it gets called again; otherwise we could leak/strand CPUs.
CC @sjenning
This has been updated so that errors are surfaced to the caller of killContainer and removeContainer.
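For reference, a minimal sketch of how the error can be surfaced from removeContainer, matching the diff above (the rest of the function is elided):

```go
// Propagate the unregistration error so removeContainer can be retried and
// assigned cpus are not stranded.
if err := m.cpuManager.UnregisterContainer(containerID); err != nil {
	glog.Errorf("Unable to unregister container %q from the CPU manager: %v", containerID, err)
	return err
}
return nil
```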
- Typos.
- Type naming.
- File rename.
- Added alpha warning to --cpu-manager-policy config description.
Addressed some of the review comments. The following TODOs are not done yet but are hidden by the GitHub UI:
- Require kubereserved.cpu + systemreserved.cpu > 0.
- Updated shared pool and exclusive allocation accordingly.
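A rough sketch of the reservation arithmetic implied by that change; the helper name and the round-up policy are assumptions, and only the requirement that kube-reserved.cpu + system-reserved.cpu be greater than zero comes from the commit above.

```go
package cpumanager

import (
	"fmt"

	"k8s.io/kubernetes/pkg/kubelet/cm/cpuset"
)

// reservedAndShared splits the machine's cpus into a reserved set (sized by
// rounding the cpu reservation up to whole cpus) and the initial shared pool.
func reservedAndShared(all cpuset.CPUSet, kubeReservedMilli, systemReservedMilli int64) (cpuset.CPUSet, cpuset.CPUSet, error) {
	reservedMilli := kubeReservedMilli + systemReservedMilli
	if reservedMilli <= 0 {
		return cpuset.NewCPUSet(), cpuset.NewCPUSet(),
			fmt.Errorf("the static policy requires kube-reserved cpu + system-reserved cpu > 0")
	}
	numReserved := int((reservedMilli + 999) / 1000) // round up to whole cpus
	cpus := all.ToSlice()
	if numReserved >= len(cpus) {
		return cpuset.NewCPUSet(), cpuset.NewCPUSet(),
			fmt.Errorf("reserving %d cpus leaves no shared pool on a %d-cpu machine", numReserved, len(cpus))
	}
	reserved := cpuset.NewCPUSet(cpus[:numReserved]...)
	shared := all.Difference(reserved)
	return reserved, shared, nil
}
```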
Awesome. Thanks @ConnorDoyle!
…On Fri, Aug 25, 2017 at 11:01 AM, Connor Doyle wrote:

In pkg/api/types.go:

@@ -3109,6 +3109,8 @@ const (
NodeMemoryPressure NodeConditionType = "MemoryPressure"
// NodeDiskPressure means the kubelet is under pressure due to insufficient available disk.
NodeDiskPressure NodeConditionType = "DiskPressure"
+ // NodeCPUPressure means the kubelet is under pressure due to insufficient available cpu.

@vishh -- Updated to remove the CPU pressure node condition and evictions, require cpu reservation, and manage the shared pool accordingly in the static policy.
Automatic merge from submit-queue (batch tested with PRs 50033, 49988, 51132, 49674, 51207)

Add cpuset helper library.

Blocker for CPU manager #49186 (1 of 6)

@sjenning @derekwaynecarr

```release-note
NONE
```
@ConnorDoyle PR needs rebase
Automatic merge from submit-queue (batch tested with PRs 51439, 51361, 51140, 51539, 51585) CPU manager interfaces. Please review / merge #51132 first. Blocker for CPU manager #49186 (3 of 6) @sjenning @derekwaynecarr
Addressed the vast majority of review comments and broke this down into smaller PRs. Those are all linked from the top PR description comment here. Continuing reviews together with @derekwaynecarr there. Thanks everyone for reviews on this monster PR -- closing.
…epolicy
Automatic merge from submit-queue (batch tested with PRs 49971, 51357, 51616, 51649, 51372)

CPU manager wiring and `none` policy

Blocker for CPU manager #49186 (4 of 6)

* Previous PR in this series: #51140
* Next PR in this series: #51180

cc @balajismaniam @derekwaynecarr @sjenning

**Release note**:
```release-note
NONE
```

TODO:
- [X] In-memory CPU manager state
- [x] Kubelet config value
- [x] Feature gate
- [X] None policy
- [X] Unit tests
- [X] CPU manager instantiation
- [x] Calls into CPU manager from Kubelet container runtime
Automatic merge from submit-queue (batch tested with PRs 51180, 51893)

CPU manager static policy

Blocker for CPU manager #49186 (5 of 6)

* Previous PR in this series: #51357
* Next PR in this series: #51041

cc @derekwaynecarr @sjenning @flyingcougar @balajismaniam

Attempting to be fairly accurate with main authorship at least at a file level -- please let me know if anyone has a better idea on how to improve this.

For posterity, here are the Kubelet flags to run the static policy (assuming `/kube-reserved` is a cgroup that exists for all required controllers):

`--feature-gates=CPUManager=true --cpu-manager-policy=static --cpu-manager-reconcile-period=5s --enforce-node-allocatable=pods,kube-reserved --kube-reserved-cgroup=/kube-reserved --kube-reserved=cpu=500m`

**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue

Node e2e tests for the CPU Manager.

**What this PR does / why we need it**:
- Adds node e2e tests for the CPU Manager implementation in #49186.

**Special notes for your reviewer**:
- Previous PR in this series: #51180
- Only `test/e2e_node/cpu_manager_test.go` must be reviewed as a part of this PR (i.e., the last commit). Rest of the comments belong in #51357 and #51180.
- The tests have been run on `n1-standard-n4` and `n1-standard-n2` instances on GCE. To run this node e2e test, use the following command:

```sh
make test-e2e-node TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS="CPU Manager" SKIP="" PARALLELISM=1
```

CC @ConnorDoyle @sjenning
What this PR does / why we need it:

E2e tests are in progress and will be added in a subsequent PR.
User / operator docs for kubernetes/kubernetes.github.io are merged.

Release note:

Plan to break this into smaller PRs...
- `none` policy #51357

Starting the following three pods in order on an 8-core single socket system:
- cpuset-visualizer-exclusive-1
- cpuset-visualizer-exclusive-3
- cpuset-visualizer-exclusive-4

gave the following allocations. Note that cpu 0 was not used for pods in this case, per kube-reserved and system-reserved as described in the proposal.

cc @sjenning @flyingcougar @nqn @teferi @vishh
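For readers trying to reproduce this, a sketch of the kind of pod used above: a Guaranteed pod with an integral CPU request and limit, which is what the static policy grants exclusive cpus. The image name and import paths are assumptions (paths reflect the tree at the time).

```go
package main

import (
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/pkg/api/v1"
)

// exclusivePod builds a Guaranteed pod: requests == limits and the CPU value
// is a whole number, so the static policy assigns the container exclusive cpus.
func exclusivePod(name, cpus string) *v1.Pod {
	res := v1.ResourceList{
		v1.ResourceCPU:    resource.MustParse(cpus), // e.g. "1", "3", "4"
		v1.ResourceMemory: resource.MustParse("64Mi"),
	}
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:      "visualizer",
				Image:     "cpuset-visualizer", // hypothetical image
				Resources: v1.ResourceRequirements{Requests: res, Limits: res},
			}},
		},
	}
}
```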