-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloud provider AWS library should query instance by ID when possible #78140
Cloud provider AWS library should query instance by ID when possible #78140
Conversation
Hi @zhan849. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
f506114
to
20c2c4f
Compare
/ok-to-test |
@zhan849 could you please add some release notes as well... something along the lines of (feel free to re-word this):
|
/priority important-soon |
20c2c4f
to
08b4390
Compare
631e65b
to
37b59d2
Compare
37b59d2
to
1c94382
Compare
/retest |
} | ||
|
||
node, err := c.nodeInformer.Lister().Get(string(nodeName)) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this require a RBAC policy change so as AWS cloudprovider can list nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! I checked that after v1.15, we have deprecated auto-provisioned cloud provider role, and have user to create them #66635
I will modify the release note about the face that we need "list" permission now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, @justinsb we might want to modify kops in this case as well? kubernetes/kops#6899
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not remember if we need a RBAC rule if we are using a shared informer. If we do not - we are probably fine. We need to confirm that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually @justinsb I did some tests and verified that we do not need additional RBAC rules for system:aws-cloud-provider
to use shared informer. Our clusters are provisioned using kops and all roles contain default values
1c94382
to
6ae7620
Compare
Two more questions:
Why is this line in PR description?
I understand fetching by instance-id is more efficient but how does this PR reduces number of |
thanks @gnufied |
Is there any AWS documentation for this? |
@gnufied I cannot find any - higher failure rate for filter is just our observation and is likely to be account specific |
Okay, I think the change is reasonable. I would like us to confirm RBAC rule change, whether we need it or not. |
@gnufied I did some test and verified that we do not need additional RBAC for |
/lgtm |
I think we had enough opportunity for other approves to take a look at it. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gnufied, zhan849 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
1 similar comment
/retest |
What type of PR is this?
/kind feature
/sig aws
/sig cloud-provider
/sig scalability
What this PR does / why we need it:
This PR changed the implementation of
getInstanceByNodeName()
in aws library to query instance from provider using instance id when possible, since it is much more efficient to get instance by ID in aws than run a filter. This PR checks if it is able to get provider id fromv1.Node
, if possible, it will get instance by node id, or it will fall back to running the queryWe tested internally wit batch PVC provisioning benchmark and here are the improvements shown that makes k8s more scalable in aws environment:
Which issue(s) this PR fixes:
Fixes #78136
This is also the first step towards fixing #78409 , by introduce informer-fed node cache to aws library.
Special notes for your reviewer:
Regarding the tradeoff of offloading the workload to kubernetes:
Node querying has limited frequency, mostly during node controller reconciliation, and the only bursts are during batch infrastructure provisioning (i.e. create a batch of 100 PVC with ebs, aggressive autoscaling by adding 100 nodes to cluster, etc).
Get()
calls are also relatively cheap. Therefore, load added to api server is limited, and can be optimized further by introducing informer interface to aws cloud providerUsually in production, each aws account would have multiple Kubernetes clusters and other non-k8s clusters, but aws related api call throttles are account-wide, so that is to say, if one kubernetes cluster is making inefficient call to cloud provider, other k8s/non-k8s clusters will be affected as well. so it worth the tradeoff and affects of burst in k8s calls can be limited to k8s cluster level, with proper retry mechanism (which we already have) control plane can make progress eventually
Does this PR introduce a user-facing change?:
Release note:
/assign @zhan849 @mcrute @micahhausler
/cc @justinsb @jsafrane