Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PHASE 1] Opaque integer resource accounting. #31652

Merged

Conversation

ConnorDoyle
Copy link
Contributor

@ConnorDoyle ConnorDoyle commented Aug 29, 2016

[PHASE 1] Opaque integer resource accounting.

This change provides a simple way to advertise some amount of arbitrary countable resource for a node in a Kubernetes cluster. Users can consume these resources by including them in pod specs, and the scheduler takes them into account when placing pods on nodes. See the example at the bottom of the PR description for more info.

Summary of changes:

  • Defines opaque integer resources as any resource with prefix pod.alpha.kubernetes.io/opaque-int-resource-.
  • Prevent kubelet from overwriting capacity.
  • Handle opaque resources in scheduler.
  • Validate integer-ness of opaque int quantities in API server.
  • Tests for above.

Feature issue: kubernetes/enhancements#76

Design: http://goo.gl/IoKYP1

Issues:

#28312
#19082

Related:

#19080

CC @davidopp @timothysc @balajismaniam

Release note:

Added support for accounting opaque integer resources.

Allows cluster operators to advertise new node-level resources that would be
otherwise unknown to Kubernetes. Users can consume these resources in pod
specs just like CPU and memory. The scheduler takes care of the resource
accounting so that no more than the available amount is simultaneously
allocated to pods.

Usage example

$ echo '[{"op": "add", "path": "/status/capacity/pod.alpha.kubernetes.io~1opaque-int-resource-bananas", "value": "555"}]' | \
> http PATCH http://localhost:8080/api/v1/nodes/localhost.localdomain/status \
> Content-Type:application/json-patch+json
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 11 Aug 2016 16:44:55 GMT
Transfer-Encoding: chunked

{
    "apiVersion": "v1",
    "kind": "Node",
    "metadata": {
        "annotations": {
            "volumes.kubernetes.io/controller-managed-attach-detach": "true"
        },
        "creationTimestamp": "2016-07-12T04:07:43Z",
        "labels": {
            "beta.kubernetes.io/arch": "amd64",
            "beta.kubernetes.io/os": "linux",
            "kubernetes.io/hostname": "localhost.localdomain"
        },
        "name": "localhost.localdomain",
        "resourceVersion": "12837",
        "selfLink": "/api/v1/nodes/localhost.localdomain/status",
        "uid": "2ee9ea1c-47e6-11e6-9fb4-525400659b2e"
    },
    "spec": {
        "externalID": "localhost.localdomain"
    },
    "status": {
        "addresses": [
            {
                "address": "10.0.2.15",
                "type": "LegacyHostIP"
            },
            {
                "address": "10.0.2.15",
                "type": "InternalIP"
            }
        ],
        "allocatable": {
            "alpha.kubernetes.io/nvidia-gpu": "0",
            "cpu": "2",
            "memory": "8175808Ki",
            "pods": "110"
        },
        "capacity": {
            "alpha.kubernetes.io/nvidia-gpu": "0",
            "pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
            "cpu": "2",
            "memory": "8175808Ki",
            "pods": "110"
        },
        "conditions": [
            {
                "lastHeartbeatTime": "2016-08-11T16:44:47Z",
                "lastTransitionTime": "2016-07-12T04:07:43Z",
                "message": "kubelet has sufficient disk space available",
                "reason": "KubeletHasSufficientDisk",
                "status": "False",
                "type": "OutOfDisk"
            },
            {
                "lastHeartbeatTime": "2016-08-11T16:44:47Z",
                "lastTransitionTime": "2016-07-12T04:07:43Z",
                "message": "kubelet has sufficient memory available",
                "reason": "KubeletHasSufficientMemory",
                "status": "False",
                "type": "MemoryPressure"
            },
            {
                "lastHeartbeatTime": "2016-08-11T16:44:47Z",
                "lastTransitionTime": "2016-08-10T06:27:11Z",
                "message": "kubelet is posting ready status",
                "reason": "KubeletReady",
                "status": "True",
                "type": "Ready"
            },
            {
                "lastHeartbeatTime": "2016-08-11T16:44:47Z",
                "lastTransitionTime": "2016-08-10T06:27:01Z",
                "message": "kubelet has no disk pressure",
                "reason": "KubeletHasNoDiskPressure",
                "status": "False",
                "type": "DiskPressure"
            }
        ],
        "daemonEndpoints": {
            "kubeletEndpoint": {
                "Port": 10250
            }
        },
        "images": [],
        "nodeInfo": {
            "architecture": "amd64",
            "bootID": "1f7e95ca-a4c2-490e-8ca2-6621ae1eb5f0",
            "containerRuntimeVersion": "docker://1.10.3",
            "kernelVersion": "4.5.7-202.fc23.x86_64",
            "kubeProxyVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
            "kubeletVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
            "machineID": "cac4063395254bc89d06af5d05322453",
            "operatingSystem": "linux",
            "osImage": "Fedora 23 (Cloud Edition)",
            "systemUUID": "D6EE0782-5DEB-4465-B35D-E54190C5EE96"
        }
    }
}

After patching, the kubelet's next sync fills in allocatable:

$ kubectl get node localhost.localdomain -o json | jq .status.allocatable
{
  "alpha.kubernetes.io/nvidia-gpu": "0",
  "pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
  "cpu": "2",
  "memory": "8175808Ki",
  "pods": "110"
}

Create two pods, one that needs a single banana and another that needs a truck load:

$ kubectl create -f chimp.yaml
$ kubectl create -f superchimp.yaml

Inspect the scheduler result and pod status:

$ kubectl describe pods chimp
Name:           chimp
Namespace:      default
Node:           localhost.localdomain/10.0.2.15
Start Time:     Thu, 11 Aug 2016 19:58:46 +0000
Labels:         <none>
Status:         Running
IP:             172.17.0.2
Controllers:    <none>
Containers:
  nginx:
    Container ID:       docker://46ff268f2f9217c59cc49f97cc4f0f085d5ac0e251f508cc08938601117c0cec
    Image:              nginx:1.10
    Image ID:           docker://sha256:82e97a2b0390a20107ab1310dea17f539ff6034438099384998fd91fc540b128
    Port:               80/TCP
    Limits:
      cpu:                                      500m
      memory:                                   64Mi
      pod.alpha.kubernetes.io/opaque-int-resource-bananas:   3
    Requests:
      cpu:                                      250m
      memory:                                   32Mi
      pod.alpha.kubernetes.io/opaque-int-resource-bananas:   1
    State:                                      Running
      Started:                                  Thu, 11 Aug 2016 19:58:51 +0000
    Ready:                                      True
    Restart Count:                              0
    Volume Mounts:                              <none>
    Environment Variables:                      <none>
Conditions:
  Type          Status
  Initialized   True 
  Ready         True 
  PodScheduled  True 
No volumes.
QoS Class:      Burstable
Events:
  FirstSeen     LastSeen        Count   From                            SubobjectPath           Type            Reason                  Message
  ---------     --------        -----   ----                            -------------           --------        ------                  -------
  9m            9m              1       {default-scheduler }                                    Normal          Scheduled               Successfully assigned chimp to localhost.localdomain
  9m            9m              2       {kubelet localhost.localdomain}                         Warning         MissingClusterDNS       kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
  9m            9m              1       {kubelet localhost.localdomain} spec.containers{nginx}  Normal          Pulled                  Container image "nginx:1.10" already present on machine
  9m            9m              1       {kubelet localhost.localdomain} spec.containers{nginx}  Normal          Created                 Created container with docker id 46ff268f2f92
  9m            9m              1       {kubelet localhost.localdomain} spec.containers{nginx}  Normal          Started                 Started container with docker id 46ff268f2f92
$ kubectl describe pods superchimp
Name:           superchimp
Namespace:      default
Node:           /
Labels:         <none>
Status:         Pending
IP:
Controllers:    <none>
Containers:
  nginx:
    Image:      nginx:1.10
    Port:       80/TCP
    Requests:
      cpu:                                      250m
      memory:                                   32Mi
      pod.alpha.kubernetes.io/opaque-int-resource-bananas:   10Ki
    Volume Mounts:                              <none>
    Environment Variables:                      <none>
Conditions:
  Type          Status
  PodScheduled  False 
No volumes.
QoS Class:      Burstable
Events:
  FirstSeen     LastSeen        Count   From                    SubobjectPath   Type            Reason                  Message
  ---------     --------        -----   ----                    -------------   --------        ------                  -------
  3m            1s              15      {default-scheduler }                    Warning         FailedScheduling        pod (superchimp) failed to fit in any node
fit failure on node (localhost.localdomain): Insufficient pod.alpha.kubernetes.io/opaque-int-resource-bananas

This change is Reviewable

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

6 similar comments
@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-label-needed labels Aug 29, 2016
@k8s-bot
Copy link

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@ConnorDoyle ConnorDoyle force-pushed the poc-opaque-int-resources branch from 1490d45 to b5e1933 Compare August 29, 2016 23:34
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2016
@ConnorDoyle
Copy link
Contributor Author

@balajismaniam and I are both covered by the Intel corporate CLA (org emails are public in our GitHub profiles)


## Terms
### Opaque node-level resource
A resource associated with a node that has a consumable capacity, which is handled uniformly by the scheduler and Kubelet. (If there are specializations that hang off of the resource name, then it is not opaque!)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/(If there are specializations that hang off of the resource name, then it is not opaque!)/

makes more sense to define what you mean by opaque. In general, the name opaque lends to more confusion. We used to call these "machine limits" in condor.

right now the definition of an opaque resource is opaque.

@k8s-github-robot k8s-github-robot removed the kind/design Categorizes issue or PR as related to design. label Sep 1, 2016
@davidopp davidopp assigned davidopp and unassigned lavalamp Sep 1, 2016
@ConnorDoyle
Copy link
Contributor Author

@timothysc squashed, just awaiting CI results then everything should be good to go. Thanks all for reviews.

@ConnorDoyle
Copy link
Contributor Author

@timothysc checks are green, could you please add your LGTM?

@timothysc timothysc added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2016
@timothysc
Copy link
Member

@ConnorDoyle could you add your release note above too please.

@vishh
Copy link
Contributor

vishh commented Oct 28, 2016

LGTM

@ConnorDoyle
Copy link
Contributor Author

@timothysc added summary line and extended release note in the PR description.

@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit 3f5ad42ac98463d2a4dd5dbfe312b38f53557c3e. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE Node e2e failed for commit 3f5ad42ac98463d2a4dd5dbfe312b38f53557c3e. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
  - Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
  - Ensures supplied opaque int quantities in node capacity,
    node allocatable, pod request and pod limit are integers.
  - Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
@foxish
Copy link
Contributor

foxish commented Oct 28, 2016

@ConnorDoyle You're failing Jenkins verification. I think you need to run ./hack/update_owners.py to fix that.

@ConnorDoyle ConnorDoyle force-pushed the poc-opaque-int-resources branch from 3f5ad42 to b0421c1 Compare October 28, 2016 17:29
@ConnorDoyle
Copy link
Contributor Author

@foxish thanks, just pushed a commit for that. The verification script is passing locally for me now.

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 28, 2016
@timothysc timothysc added lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Oct 28, 2016
@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 58457da into kubernetes:master Oct 29, 2016
@ConnorDoyle ConnorDoyle deleted the poc-opaque-int-resources branch October 29, 2016 17:43
deads2k pushed a commit to deads2k/kubernetes that referenced this pull request Jan 20, 2017
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)

Minor hygiene in scheduler.

**What this PR does / why we need it**:

Minor cleanups in scheduler, related to PR kubernetes#31652.

- Unified lazy opaque resource caching.
- Deleted a commented-out line of code.

**Release note**:
```release-note
N/A
```
@RenaudWasTaken
Copy link
Contributor

@ConnorDoyle in your example, the superchimp pod gets a "FailedScheduling" status.

However, the behavior for CPU and memory seem to be to put the pod in a "Pending" state which means that when the resource becomes available the pod can get rescheduled.

It seems to me that the current behavior for opaque resource means that pods which request more resources than available fail to be rescheduled once the resources becomes available once again is that correct ?

Is it the expected behavior and/or is there a way to get the same behavior as CPU/memory 😄?

@ConnorDoyle
Copy link
Contributor Author

ConnorDoyle commented Feb 18, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.