Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend cluster autoscaler to check AUTOSCALER_ENV_VARS in kube-env #710

Merged
merged 1 commit into from
Mar 19, 2018

Conversation

mtaufen
Copy link
Contributor

@mtaufen mtaufen commented Mar 13, 2018

This is the second part of the fix in kubernetes/kubernetes#61119

This provides a temporary way for the cluster autoscaler to get at
values that were removed from kube-env in kubernetes/kubernetes#60020.
Ideally this information will eventually be available via e.g. the Cluster API,
because kube-env is an internal interface that carries no stability
guarantees.

/cc @roberthbailey @bskiba @mwielgus

@k8s-ci-robot k8s-ci-robot requested a review from bskiba March 13, 2018 19:41
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2018
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 13, 2018
@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 13, 2018

/assign @mwielgus @roberthbailey

@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 13, 2018

/kind bug
/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Mar 13, 2018
@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 13, 2018

/assign @bskiba

@bskiba
Copy link
Member

bskiba commented Mar 14, 2018

Looks like wee need to handle possible empty values in parseKeyValueListToMap, sth along the lines of: bskiba@a15c29d

Otherwise this looks good to me and I have successfully tested it on GCE together with kubernetes/kubernetes#61119

@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 14, 2018

Looks like wee need to handle possible empty values in parseKeyValueListToMap, sth along the lines of: bskiba/autoscaler@a15c29d

Do you have a test case that will trigger this? (It sounds like something that would come from a leading or trailing comma, which we might want to just disallow)

Otherwise this looks good to me and I have successfully tested it on GCE together with kubernetes/kubernetes#61119

Great!

@bskiba
Copy link
Member

bskiba commented Mar 14, 2018

I tested this on GCE with the new variables and when node-taints where empty (as is in the default case), this errored on me from here: https://github.com/mtaufen/autoscaler/blob/037467fad2c566d901f04a3f78024aff0a55aa54/cluster-autoscaler/cloudprovider/gce/templates.go#L452

I can write a unit-test for this case tomorrow.

@bskiba
Copy link
Member

bskiba commented Mar 15, 2018

Here's a unit test to make it break: bskiba@faa2151 (tests for empty node taints). With bskiba/autoscaler@a15c29d this passes.

Could you incorporate the fix?

@mtaufen mtaufen force-pushed the autoscaler-env-vars branch from 037467f to 9020b95 Compare March 15, 2018 16:46
@mtaufen mtaufen changed the title (WIP) Extend cluster autoscaler to check AUTOSCALER_ENV_VARS in kube-env Extend cluster autoscaler to check AUTOSCALER_ENV_VARS in kube-env Mar 15, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 15, 2018
@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 15, 2018

Updated, PTAL :)

if err != nil {
return nil, err
}
return parseKeyValueListToMap(labels)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could remove this line and just exit the if block and let the line below return the map to reduce code duplication. up to you whether you think that's more confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if err != nil {
return nil, err
}
taintMap, err := parseKeyValueListToMap(taints)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, but in this case there is more duplicated code. Once you get the taints extracted, you can leave the if block and parse them into a map in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
resourcesRegexp := regexp.MustCompile(`--kube-reserved=([^ ]+)`)
return kubeReserved[0], nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a check here to verify that kubeReserved will have a zero-th entry and this won't cause a panic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractAutoscalerVarFromKubeEnv already does this:

if len(result) == 0 {
	return nil, fmt.Errorf("var %s not found in %s: %v", name, autoscalerVars, autoscalerVals)
}

so once we check the error, we know we at least have a 0th element.

if err != nil {
return nil, err
}
result := []string{}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i personally prefer to write this as var result []string and then let the append calls do the allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return nil, err
}
result := []string{}
for _, val := range autoscalerVals {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we always expect a single val right? so this outer loop is expected to either execute once or not at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One would hope. But since this file seems to expect the possibility of duplicates in kube-env, I thought it best to handle that here too.

}
result := []string{}
for _, val := range autoscalerVals {
for _, v := range strings.Split(val, ";") {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can labels or taints contain a semi-colon? If so, we can't use it as the separator char.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so:
labels: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set

Labels are key/value pairs. Valid label keys have two segments: an optional prefix and name, separated by a slash (/). The name segment is required and must be 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (), dots (.), and alphanumerics between. The prefix is optional. If specified, the prefix must be a DNS subdomain: a series of DNS labels separated by dots (.), not longer than 253 characters in total, followed by a slash (/). If the prefix is omitted, the label Key is presumed to be private to the user. Automated system components (e.g. kube-scheduler, kube-controller-manager, kube-apiserver, kubectl, or other third-party automation) which add labels to end-user objects must specify a prefix. The kubernetes.io/ prefix is reserved for Kubernetes core components.
Valid label values must be 63 characters or less and must be empty or begin and end with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (
), dots (.), and alphanumerics between.

taints: https://github.com/kubernetes/kubernetes/blob/ca02c11887c65431af85f809606527be7f08062d/pkg/apis/core/validation/validation.go#L3931

Taints use the same validation as labels for keys and values.

The effect part of the taint args to Kubelet (after the :) is split out from the value part here when Kubelet parses the flag: https://github.com/kubernetes/kubernetes/blob/ca02c11887c65431af85f809606527be7f08062d/pkg/util/taints/taints.go#L46

@@ -297,26 +297,41 @@ func TestBuildAllocatableFromCapacity(t *testing.T) {
}

func TestExtractLabelsFromKubeEnv(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there test cases to check for expected errors during parsing (e.g. malformed vars, missing vars, etc)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some additional cases for expected errors.

// In v1.10+, kube-reserved is only exposed for the autoscaler via AUTOSCALER_ENV_VARS
// see kubernetes/kubernetes#61119. We try AUTOSCALER_ENV_VARS first, then
// fall back to the old way.
kubeReserved, err := extractAutoscalerVarFromKubeEnv(kubeEnv, "kube_reserved")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In kubernetes/kubernetes#61119 you have a comment that kube-reserved will need to be pulled from KUBELET_ARGS but this code only checks AUTOSCALER_ARGS with a fallback to KUBELET_TEST_ARGS so it won't find the parameter for 1.10 nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For GKE we will have kube-reserved in AUTOSCALER_ARGS.

On plain GCE kube-reserved is not added by default to KUBE_ENV so unless the user explicitly adds it to MIG template, we don't have it anyway (this is the current state, not a regression from this PR). So this might just be a question of removing the comment about kube-reserved from kubernetes/kubernetes#61119

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can just edit their MIG templates to include it in AUTOSCALER_ENV_VARS on GCE if they need it. We can probably just remove the comment. I'd recommend we have a release note in the next autoscaler version to that effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I can take care of adding to release notes.

@bskiba
Copy link
Member

bskiba commented Mar 16, 2018

Tested this once again after fixes from @mtaufen on GCE and GKE and still works good. Please fix comments from @roberthbailey and I'll be happy to merge.

@mtaufen mtaufen force-pushed the autoscaler-env-vars branch from 9020b95 to 59270dd Compare March 16, 2018 20:29
@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 16, 2018

/retest

@bskiba
Copy link
Member

bskiba commented Mar 16, 2018

Looks like we don't have github.com/stretchr/testify/require vendored

@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 16, 2018

/sigh
require would result in less garbage when a test fails due to an unexpected error, but I don't want to deal with Godep, so I'll just switch back to assert.

This is the second part of the fix in kubernetes/kubernetes#61119

This provides a temporary way for the cluster autoscaler to get at
values that were removed from kube-env in kubernetes/kubernetes#60020.
Ideally this information will eventually be available via e.g. the Cluster API,
because kube-env is an internal interface that carries no stability
guarantees.
@mtaufen mtaufen force-pushed the autoscaler-env-vars branch from 59270dd to ee890f5 Compare March 16, 2018 21:06
@roberthbailey
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2018
@mtaufen
Copy link
Contributor Author

mtaufen commented Mar 17, 2018

kubernetes/kubernetes#61119 merged, so this is all that's left

@bskiba bskiba merged commit c782016 into kubernetes:master Mar 19, 2018
yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this pull request Feb 22, 2024
…ge_jobs

[preemption] Add LowerOrNewerEqualPriority preemption
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants