[Part 1] Implementation of equivalence pod #31605
Conversation
Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test". This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.
    return &equivalencePod
}

type EquivalencePod struct {
As I wrote in the previous PR - this is not enough for the pod to actually be equivalent to the other one.
We should probably open an issue for this particular problem and discuss it there.
A quick note - equivalence also depends on at least:
- namespace
- labels
- some annotations (PodAffinity, and others?)
How do we enforce that when someone adds a new predicate/priority, this will be mirrored here...
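A purely illustrative sketch of the concern above (none of these names are in the PR): a fuller equivalence key would have to cover namespace, labels, and the scheduling-relevant annotations, and every new predicate/priority that reads another pod field would have to extend it.

package main

import (
    "fmt"
    "reflect"
)

// equivalenceKey is a hypothetical struct covering the fields listed above
// that two pods would have to share to be treated as equivalent.
type equivalenceKey struct {
    Namespace   string
    Labels      map[string]string
    Annotations map[string]string // only the scheduling-relevant ones, e.g. pod affinity
}

// equivalent compares two keys field by field.
func equivalent(a, b equivalenceKey) bool {
    return a.Namespace == b.Namespace &&
        reflect.DeepEqual(a.Labels, b.Labels) &&
        reflect.DeepEqual(a.Annotations, b.Annotations)
}

func main() {
    a := equivalenceKey{Namespace: "default", Labels: map[string]string{"app": "web"}}
    b := equivalenceKey{Namespace: "default", Labels: map[string]string{"app": "web"}}
    fmt.Println(equivalent(a, b)) // true
}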
GCE e2e build/test passed for commit 3f49fd1.
Jenkins GCE e2e failed for commit 15404d320121707143fe6ea6efb130c214dd7021. The magic incantation to run this job again is
Commits 98bac83 to f4b8f00
@resouer - I'm OOO this week - will take a look early next week.
I took a brief look - I will do a more careful review tomorrow. But I added some high-level comments.
    if !allExpired {
        ec.expireLock.Lock()
        defer ec.expireLock.Unlock()
        ec.invalidAlgorithmCacheList.Insert(nodeName)
For now, I don't see a reason for making it asynchronous. Why can't we just clear the cache for this node (instead of just marking it as invalid)?
@resouer - any thoughts?
Yes, you are right, this can make the cache part much simpler; refactoring.
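A minimal sketch of the synchronous approach being agreed on here, using the field names visible in the diff hunks (algorithmCache, expireLock); the method name and the AlgorithmCache stand-in are illustrative, not from the PR.

package equivalence

import "sync"

// AlgorithmCache is a stand-in for the per-node cache defined in equivalence_cache.go.
type AlgorithmCache map[string]bool

type EquivalenceCache struct {
    expireLock     *sync.RWMutex
    algorithmCache map[string]AlgorithmCache
}

// invalidateCachedItemForNode drops the node's entry directly, replacing the
// "mark invalid now, clean up later" flow.
func (ec *EquivalenceCache) invalidateCachedItemForNode(nodeName string) {
    ec.expireLock.Lock()
    defer ec.expireLock.Unlock()
    delete(ec.algorithmCache, nodeName)
}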
    if !allExpired {
        ec.expireLock.Lock()
        ec.allCacheExpired = true
Same here.
@resouer - thanks; let me know when it's ready for another look.
    // to be equivalent
    if len(pod.OwnerReferences) != 0 {
        for _, ref := range pod.OwnerReferences {
            if *ref.Controller && ref.Kind != "PetSet" {
Can you please make this "PetSet" a constant?
Or even better, this should probably be a slice of values (and for now it will only contain PetSet).
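A hedged sketch of that suggestion: keep the unsupported kinds in a named constant, or in a slice so more kinds can be added later. PetSetKind shows up later in the thread; excludedControllerKinds is an illustrative name.

package equivalence

const PetSetKind = "PetSet"

// excludedControllerKinds lists controller kinds whose pods we do not treat
// as equivalent; for now it only contains PetSet.
var excludedControllerKinds = []string{PetSetKind}

// isValidControllerKind reports whether pods owned by this controller kind
// can use the equivalence-pod optimization.
func isValidControllerKind(kind string) bool {
    for _, excluded := range excludedControllerKinds {
        if kind == excluded {
            return false
        }
    }
    return true
}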
"sync" | ||
) | ||
|
||
const maxCacheEntries = 4096 |
4096 by default might be too many (e.g. in small clusters). Can you please add a TODO to figure out what this value should be.
// SendInvalidAlgorithmCacheReq marks AlgorithmCache item as invalid
func (ec *EquivalenceCache) SendInvalidAlgorithmCacheReq(nodeName string) {
    ec.expireLock.RLock()
    allExpired := ec.allCacheExpired
Hmm - I'm wondering what's the purpose of "allCacheExpired".
Why can't we just do:
    ec.expireLock.Lock()
    defer ec.expireLock.Unlock()
    delete(ec.algorithmCache, nodeName)
What is the use case you had in mind for having this "allCacheExpired"?
Yes, in my mind, in most cases we should invalidate the cache for all Nodes, e.g. when a Controller or Service changed, while in some cases, e.g. when a Pod is deleted, we only need to invalidate the node it was bound to. Discussion of this logic belongs to Part 2, so I just kept it here.
And I just fixed the other nits.
    getEquivalencePod algorithm.GetEquivalencePodFunc
    algorithmCache    map[string]AlgorithmCache
    allCacheExpired   bool
    expireLock        *sync.RWMutex
nit: I would suggest changing to:
type EquivalenceCache struct {
    sync.RWMutex
    getEquivalencePod algorithm.GetEquivalencePodFunc
    algorithmCache    map[string]AlgorithmCache
    allCacheExpired   bool
}
Then you can simply do "ec.Lock()".
// SendInvalidAlgorithmCacheReq marks AlgorithmCache item as invalid
func (ec *EquivalenceCache) SendInvalidAlgorithmCacheReq(nodeName string) {
    ec.RLock()
    allExpired := ec.allCacheExpired
The previous comment disappeared, so let me comment here.
OK - I buy your argument that there are situations in which we want to invalidate the whole cache.
However, I think that this particular variable is not needed. Basically, when we need to invalidate the whole cache, I would simply delete all entries from it. Having this variable doesn't make much sense to me (if the whole cache is invalidated, we can also call delete(ec.algorithmCache, nodeName) and that will still be correct, and not visibly slower).
Hmm, makes sense - the flag doesn't actually save us time; removed it.
@k8s-bot kubemark e2e test this
Jenkins Kubemark GCE e2e failed for commit 497b691. Full PR test history. The magic incantation to run this job again is
// isValidControllerKind checks if a given controller's kind can be applied to equivalence pod algorithm.
func isValidControllerKind(kind string) bool {
    switch kind {
    case
nit: just about formatting, please change to:
switch kind {
// list of kinds that we cannot handle
case PetSetKind:
    return false
default:
    return true
}
    for nodeName, _ := range ec.algorithmCache {
        delete(ec.algorithmCache, nodeName)
    }
    ec.Unlock()
please change to "defer ec.Unlock()" and move it to the top
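What the loop above might look like with that change applied - a sketch under assumed names (the embedded-mutex layout suggested earlier, an AlgorithmCache stand-in, and an illustrative method name).

package equivalence

import "sync"

// AlgorithmCache is again a stand-in for the real per-node cache.
type AlgorithmCache map[string]bool

// EquivalenceCache here follows the embedded-mutex layout suggested above.
type EquivalenceCache struct {
    sync.RWMutex
    algorithmCache map[string]AlgorithmCache
}

// invalidateAllCachedItems takes the lock at the top, defers the Unlock,
// then drops every node entry.
func (ec *EquivalenceCache) invalidateAllCachedItems() {
    ec.Lock()
    defer ec.Unlock()
    for nodeName := range ec.algorithmCache {
        delete(ec.algorithmCache, nodeName)
    }
}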
@@ -244,6 +248,11 @@ func RegisterCustomPriorityFunction(policy schedulerapi.PriorityPolicy) string {
    return RegisterPriorityConfigFactory(policy.Name, *pcf)
}

func RegisterGetEquivalencePodFunction(equivalenceFunc algorithm.GetEquivalencePodFunc) {
    glog.Info("Register getEquivalencePodFunc.")
nit: I would suggest removing this comment
@resouer - kubemark failure is not your fault. We had some internal issues, which hopefully should be fixed soon. I added a few more, but very minor, comments - once applied, this LGTM.
Jenkins verification failed for commit 0db7497f0e2442f1ccd7ed3612407578af8e11ee. Full PR test history. The magic incantation to run this job again is
@resouer - can you please fix gofmt:
Also - please squash commits, and I will lgtm it then.
Update equivalent class & remove priority
Use controller ref
Directly clear the cache
Jenkins GCI GCE e2e failed for commit 204dbe7. Full PR test history. The magic incantation to run this job again is
LGTM - thanks a lot for this PR!
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
Part 1 of #30844

This PR:
- Refactors GetResourceRequest into predicate.go, so that GetResourceRequest can be used in other places, such as GetEquivalencePod.
- Adds equivalence_cache.go to deal with all the information we need to calculate an equivalent pod.
- Adds RegisterGetEquivalencePodFunction.

Work in next PR: equivalence_cache.go

I think we can begin from the equivalence_cache.go? Thanks. cc @wojtek-t @davidopp

If I missed any other necessary part, please point it out.
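To tie the fragments in this thread together, here is a hedged, self-contained sketch of the GetEquivalencePod idea as it appears in the diff hunks above; the Pod and OwnerReference types are simplified stand-ins for the real API types, and the function bodies are reconstructed for illustration, not copied from the PR.

package main

import "fmt"

type OwnerReference struct {
    Kind       string
    Controller *bool
}

type Pod struct {
    Name            string
    OwnerReferences []OwnerReference
}

// EquivalencePod groups pods by their controlling owner reference.
type EquivalencePod struct {
    ControllerRef OwnerReference
}

const PetSetKind = "PetSet"

// isValidControllerKind mirrors the switch suggested in the review above.
func isValidControllerKind(kind string) bool {
    switch kind {
    // list of kinds that we cannot handle
    case PetSetKind:
        return false
    default:
        return true
    }
}

// getEquivalencePod returns an equivalence key for pods managed by a
// supported controller, or nil when no such key applies.
func getEquivalencePod(pod *Pod) *EquivalencePod {
    for _, ref := range pod.OwnerReferences {
        if ref.Controller != nil && *ref.Controller && isValidControllerKind(ref.Kind) {
            return &EquivalencePod{ControllerRef: ref}
        }
    }
    return nil
}

func main() {
    isController := true
    pod := &Pod{
        Name:            "web-1",
        OwnerReferences: []OwnerReference{{Kind: "ReplicaSet", Controller: &isController}},
    }
    fmt.Printf("%+v\n", getEquivalencePod(pod))
}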