Configuring scheduler via json configuration file #4674
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA) at https://cla.developers.google.com/. If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check the information on your CLA or see this help article on setting the email on your git commits. Once you've done that, please reply here to let us know. If you signed the CLA as a corporation, please let us know the company's name.
@bgrant0607 @davidopp @mikedanese Please review
Assigning to @bgrant0607, feel free to re-assign.
@smarterclayton PTAL - this incorporates the feedback that you provided.
@abhgupta can you rebase? It looks like this doesn't merge cleanly anymore.
Force-pushed from 21f4c71 to 2ddb069.
I have some configuration examples written up that I'll submit once this PR goes through some feedback/review. Sample JSON configuration:
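The original sample was not captured in this extract; below is a minimal sketch of what a policy file in this format might look like, assuming the JSON schema introduced by this PR and predicate/priority names taken from the default provider of that era (treat both the schema and the names as assumptions rather than the author's actual example):

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsPorts"},
    {"name": "PodFitsResources"},
    {"name": "NoDiskConflict"},
    {"name": "MatchNodeSelector"},
    {"name": "HostName"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "ServiceSpreadingPriority", "weight": 1}
  ]
}
```

The intent is that the listed predicates must all pass for a node to be considered, while the priorities are weighted scoring functions applied to the nodes that remain.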
```diff
@@ -37,9 +37,9 @@ func affinityPredicates() util.StringSet {
 		"PodFitsResources",
 		"NoDiskConflict",
 		// Ensures that all pods within the same service are hosted on minions within the same region as defined by the "region" label
-		factory.RegisterFitPredicate("ServiceAffinity", algorithm.NewServiceAffinityPredicate(factory.PodLister, factory.ServiceLister, factory.MinionLister, []string{"region"})),
+		factory.RegisterFitPredicate("RegionAffinity", algorithm.NewServiceAffinityPredicate(factory.PodLister, factory.ServiceLister, factory.MinionLister, []string{"region"})),
```
What do "Region" and "Zone" mean in this context? We do not advocate running a single Kubernetes cluster across multiple availability zones.
The current solution is flexible and uses labels to define topological levels. The levels do not themselves have any meaning within the system, and the user is free to ascribe any meaning to them. The labels could be regions/zones/racks or building/server-room/rack.
Given that cross-region deployments are not advocated, I am leaning towards simply removing the Affinity provider completely and just providing configuration examples in the documentation to educate users on the possible use cases.
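For illustration only, a documentation example along the following lines could express label-driven affinity. The "serviceAffinity" argument shape and the "rack" label key are hypothetical placeholders for this sketch, not something defined by this PR:

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsResources"},
    {"name": "NoDiskConflict"},
    {
      "name": "RackAffinity",
      "argument": {"serviceAffinity": {"labels": ["rack"]}}
    }
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1}
  ]
}
```

The same pattern would apply to any label keys the admin chooses, such as building/server-room/rack, since the scheduler attaches no intrinsic meaning to them.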
I'm supportive of spreading by arbitrary labels. A common use case on bare metal is spreading across racks (power and network domain) and bus bars (power domain), for example.
I would just prefer not to use region and zone as examples, nor as defaults.
Another possibility is that we could support a few generic topology levels (level1, level2, level3, ..., level6) by default.
Since generic nested topological levels have been implemented, I would be wary of taking a step back and restricting the number of nested levels.
And I agree: I will remove the affinity provider for now and have affinity/anti-affinity at the different levels defined generically in the documentation.
I didn't mean to imply that we should restrict the number of levels.
@abhgupta That sounds reasonable, please ping me when you've made the change you described.
@danmcp: Do you need anything here?
@bgrant0607 Since you mentioned @danmcp, I would just like to get your quick input on something. If it turns out to be a more involved question, I'll create a separate issue.
Dan had asked me why we used services instead of replication controllers for identifying peer pods for spreading. The argument is that services might not be needed and could be avoided in some cases, whereas a replication controller will almost always be present. Technically, you could have a set of individual pods placed behind a service, or pods from multiple replication controllers placed behind a service, but what is the most convenient/practical approach?
Thoughts?
@bgrant0607 I would just like to clarify the statements around not running Kubernetes across zones. I believe the statement is that you should run Kubernetes across a small enough area that you can afford to lose (at least the master tier, for some time period) and that has low enough network latency for master <-> node and node <-> node communication for your applications. For some deployments that might mean the cluster can't cross zones. For others, crossing zones would be desirable for higher availability of applications without having to maintain multiple Kubernetes clusters. And the more highly available Kubernetes masters become, the more popular cross-zone clusters are likely to be for smaller deployments.
@bgrant0607 @davidopp The configuration file is something that the admin will place and make accessible to the scheduler. Typically, I would assume this would be handled by configuration management tools like puppet/chef/ansible/salt/etc. Either way, the configuration file is optional and the DefaultProvider comes into play, ensuring that this does not become a gating factor for simple use cases.
Sorry for the delay, taking a look now.
```go
	if err != nil {
		return nil, fmt.Errorf("Unable to read policy config: %v", err)
	}
	//err = json.Unmarshal(configData, &policy)
```
Please remove before submitting.
At a high level I think this PR is great, my main concern is that config files are almost certainly not the way we want to go long-term for component configs. We want something more like #1627, where we use config objects in the store. Unfortunately that effort has fallen victim to a bit of analysis paralysis and I don't see it happening before 1.0 (though I will ping that issue). It would not be hard to adapt what you've done here to use a config source other than file, but I'm worried about people who might start using the file-based config if/when we switch to a different config mechanism.
Force-pushed from a524824 to 2766657.
@bgrant0607 @davidopp I have made the other changes based on the review feedback. Ideally I would like to get this PR merged soon, since the current enhancements to predicates/priorities are not consumable without it. Getting this PR in would allow some much needed testing of the non-default scheduler predicates/priorities. Are we blocking this PR in its current form, or do we agree that it can be merged now and converted to secrets later?
I think we can convert to the new config mechanism later, yes. If we implement the secret-like approach, the data will be populated in a volume, so this mechanism might not even require changes.
@abhgupta @danmcp: On why to use services instead of replication controllers for identifying peer pods for spreading: this is discussed in #1965 and #2312. Replication controllers are associated with deployments. We expect there to be multiple per service: canaries, rolling updates, multiple release tracks, etc. What people want is spreading of the whole service. Custom spreading is discussed in #4301 and #367.
@danmcp: Yes, I agree with your description.
LGTM
Force-pushed from eba9270 to add2868.
@davidopp PTAL - I have added test cases that accept a JSON config for the scheduler. This is now ready to merge.
LGTM |
@davidopp I modified the test case to remove the dependency on the algorithm provider. Importing the "defaults" algorithm provider was creating an import loop and all I needed for the test case was to test the combination of configurable and pre-defined predicates/priorities. |
Configuring scheduler via json configuration file
This PR introduces the capability to configure the scheduler via a JSON file. The API is versioned. This is the implementation for issue #4303.