Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: cache instances during service reload to avoid rate limiting on restart #26900

Merged
merged 1 commit into from
Jun 12, 2016

Conversation

therc
Copy link
Member

@therc therc commented Jun 6, 2016

Fixes #25610 by reducing redundant calls to DescribeInstances()

* The AWS cloudprovider will cache results from DescribeInstances() if the set of nodes hasn't changed

Also move int/stringSlicesEqual from servicecontroller.go to pkg/util/slice

@therc
Copy link
Member Author

therc commented Jun 6, 2016

cc @justinsb @guoshimin

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Jun 6, 2016
@therc
Copy link
Member Author

therc commented Jun 7, 2016

@k8s-bot test this issue: #26431

@therc
Copy link
Member Author

therc commented Jun 7, 2016

@k8s-bot test this issue: #26431

@therc therc force-pushed the aws-instance-cache branch from 176ef14 to ddd83a6 Compare June 7, 2016 21:22
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 7, 2016
@therc therc force-pushed the aws-instance-cache branch from 3a92c7d to 329128b Compare June 7, 2016 22:38
@@ -2671,9 +2677,18 @@ func (a *AWSCloud) getInstancesByIDs(instanceIDs []*string) (map[string]*ec2.Ins
return instancesByID, nil
}

// Fetches instances by node names; returns an error if any cannot be found.
// Fetches instances by node names; returns an error if any cannot be found. Input must be sorted.
// This is implemented with a multi value filter on the node names, fetching the desired instances with a single query.
func (a *AWSCloud) getInstancesByNodeNames(nodeNames []string) ([]*ec2.Instance, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if we rename this getInstancesByNodeNamesCached, and add a TODO to rationalize all the caching in 1.4?

I think having an instance cache is a good idea, but I'm not crazy about the idea of having this cached but the single-fetch not cached (inconsistency) and I'm worried about the safety of exposing the whole cached ec2.Instance object (I think it's safe now, but we're setting ourselves up for weird problems later).

I also think that you should just drop the requirement that nodeNames be sorted; I think we just create a sets.String instead (for reasons that will become even clearer in my next few notes!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, added TODO and used a sets.String.

@therc therc force-pushed the aws-instance-cache branch 2 times, most recently from 71078a7 to 83a4010 Compare June 9, 2016 23:51
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 9, 2016
@justinsb
Copy link
Member

LGTM (presuming tests pass)

@justinsb justinsb added this to the v1.3 milestone Jun 10, 2016
@justinsb justinsb added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2016
@justinsb justinsb changed the title AWS: cache values from getInstancesByNodeName() AWS: cache instances during service reload to avoid rate limiting on restart Jun 10, 2016
@justinsb justinsb added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Jun 10, 2016
@therc
Copy link
Member Author

therc commented Jun 10, 2016

Sample edited logs with --v=2. I had eight nodes. I removed one (I6) after starting the controller manager and the first call after that cached the new set of instances.

00:21:13 aws.go:2721] Caching instances for map[I1:{} I2:{} I8:{} I5:{} I6:{} I7:{} I4:{} I3:{}]
00:21:13 aws.go:2689] Returning cached instances for map[I2:{} I8:{} I5:{} I6:{} I7:{} I4:{} I3:{} I1:{}]
00:21:13 aws.go:2689] Returning cached instances for map[I7:{} I4:{} I5:{} I6:{} I3:{} I1:{} I2:{} I8:{}]
00:21:13 aws.go:2689] Returning cached instances for map[I3:{} I1:{} I2:{} I8:{} I5:{} I7:{} I4:{} I6:{}]
00:21:13 aws.go:2689] Returning cached instances for map[I5:{} I6:{} I3:{} I1:{} I2:{} I8:{} I7:{} I4:{}]
00:21:13 aws.go:2689] Returning cached instances for map[I7:{} I4:{} I5:{} I6:{} I3:{} I1:{} I2:{} I8:{}]
00:21:13 aws.go:2689] Returning cached instances for map[I7:{} I2:{} I8:{} I5:{} I6:{} I3:{} I1:{} I4:{}]
00:21:13 aws.go:2689] Returning cached instances for map[I5:{} I7:{} I4:{} I6:{} I3:{} I1:{} I2:{} I8:{}]
00:21:23 aws.go:2689] Returning cached instances for map[I7:{} I4:{} I5:{} I6:{} I3:{} I1:{} I2:{} I8:{}]
00:21:23 aws.go:2689] Returning cached instances for map[I6:{} I3:{} I1:{} I2:{} I8:{} I7:{} I4:{} I5:{}]
00:21:23 aws.go:2689] Returning cached instances for map[I8:{} I7:{} I4:{} I5:{} I6:{} I3:{} I1:{} I2:{}]
00:21:23 aws.go:2689] Returning cached instances for map[I1:{} I2:{} I8:{} I7:{} I4:{} I5:{} I6:{} I3:{}]
00:21:24 aws.go:2689] Returning cached instances for map[I3:{} I1:{} I2:{} I8:{} I7:{} I4:{} I5:{} I6:{}]
00:21:24 aws.go:2689] Returning cached instances for map[I2:{} I8:{} I7:{} I4:{} I5:{} I6:{} I3:{} I1:{}]
00:21:24 aws.go:2689] Returning cached instances for map[I2:{} I8:{} I7:{} I4:{} I5:{} I6:{} I3:{} I1:{}]
00:21:24 aws.go:2689] Returning cached instances for map[I3:{} I1:{} I2:{} I8:{} I7:{} I4:{} I5:{} I6:{}]
00:24:13 aws.go:2721] Caching instances for map[I3:{} I1:{} I4:{} I7:{} I2:{} I8:{} I5:{}]
00:24:13 aws.go:2689] Returning cached instances for map[I4:{} I7:{} I2:{} I8:{} I5:{} I3:{} I1:{}]
00:24:13 aws.go:2689] Returning cached instances for map[I7:{} I2:{} I8:{} I5:{} I3:{} I1:{} I4:{}]
00:24:13 aws.go:2689] Returning cached instances for map[I3:{} I1:{} I4:{} I7:{} I2:{} I8:{} I5:{}]
00:24:14 aws.go:2689] Returning cached instances for map[I5:{} I3:{} I1:{} I4:{} I7:{} I2:{} I8:{}]
00:24:14 aws.go:2689] Returning cached instances for map[I3:{} I1:{} I4:{} I7:{} I2:{} I8:{} I5:{}]
00:24:14 aws.go:2689] Returning cached instances for map[I1:{} I4:{} I7:{} I2:{} I8:{} I5:{} I3:{}]
00:24:14 aws.go:2689] Returning cached instances for map[I3:{} I1:{} I4:{} I7:{} I2:{} I8:{} I5:{}]

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2016
@therc therc force-pushed the aws-instance-cache branch from a93014d to 07e71f1 Compare June 10, 2016 16:30
@justinsb justinsb added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2016
@therc
Copy link
Member Author

therc commented Jun 10, 2016

@k8s-bot test this issue: #IGNORE

I can't get to the logs of the alleged failure.

@therc
Copy link
Member Author

therc commented Jun 10, 2016

@k8s-bot e2e test this issue: #26799

1 similar comment
@therc
Copy link
Member Author

therc commented Jun 10, 2016

@k8s-bot e2e test this issue: #26799

@therc
Copy link
Member Author

therc commented Jun 10, 2016

@k8s-oncall what to do about the gcloud crashed: errno 11 problem? Is it a known issue?

@therc
Copy link
Member Author

therc commented Jun 10, 2016

@k8s-bot e2e test this issue: #24476

@therc
Copy link
Member Author

therc commented Jun 10, 2016

@k8s-bot test this issue: #IGNORE

Franz Kafka has come back to life and taken over Jenkins/e2e.

@therc therc force-pushed the aws-instance-cache branch from 07e71f1 to e29709d Compare June 11, 2016 17:56
@therc
Copy link
Member Author

therc commented Jun 11, 2016

See #27205 for the previous failure (docs were versioned to 1.4.0 and the comments in the generated protos did not match).

I updated the commit description to drop the reference to the util/slice changes. I hope it won't drop the LGTM, even if I left the code itself alone...

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2016
@justinsb justinsb added lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Jun 11, 2016
@therc
Copy link
Member Author

therc commented Jun 11, 2016

@k8s-bot test this issue: #26210

@therc
Copy link
Member Author

therc commented Jun 11, 2016

@k8s-bot test this issue: #IGNORE

@therc
Copy link
Member Author

therc commented Jun 11, 2016

@k8s-bot test this issue: #26062

Looks like maybe etcd is empty for some mysterious reason.

@therc
Copy link
Member Author

therc commented Jun 12, 2016

@k8s-bot test this issue: #27179

@therc
Copy link
Member Author

therc commented Jun 12, 2016

@k8s-bot test this issue: #27179

The torture never stops

@therc
Copy link
Member Author

therc commented Jun 12, 2016

@k8s-bot test this issue: #27179

We are the champions, my friends. And we'll keep on fighting, till the end.

@therc
Copy link
Member Author

therc commented Jun 12, 2016

@k8s-bot test this issue: #26062

"We'll just keep on trying... till the end of time, till the end of time"

@therc
Copy link
Member Author

therc commented Jun 12, 2016

stderr: fatal: reference is not a tree: d9f43d34f8973045549a2cf48be0a3e3bdd913c9

WAT

@therc
Copy link
Member Author

therc commented Jun 12, 2016

@k8s-bot test this issue: #IGNORE

"I'm going slightly mad"

@therc
Copy link
Member Author

therc commented Jun 12, 2016

@k8s-bot test this, issue: #IGNORE

"Just very slightly mad"

@k8s-bot
Copy link

k8s-bot commented Jun 12, 2016

GCE e2e build/test passed for commit e29709d.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Jun 12, 2016

GCE e2e build/test passed for commit e29709d.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 6d32eba into kubernetes:master Jun 12, 2016
@therc therc deleted the aws-instance-cache branch July 18, 2016 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants