
Convert HPA controller to support HPA v2 mechanics #41272

Merged

Conversation

@DirectXMan12 (Contributor):

This PR converts the HPA controller to support the mechanics from HPA v2.
The HPA controller continues to make use of the HPA v1 client, but utilizes
the conversion logic to work with autoscaling/v2alpha1 objects internally.

It is the follow-up PR to #36033 and part of kubernetes/enhancements#117.

Release note:

NONE

@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) label Feb 10, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is NOT APPROVED

The following people have approved this PR: DirectXMan12

Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @derekwaynecarr
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD), size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files), and release-note-none (Denotes a PR that doesn't merit a release note) labels Feb 10, 2017
@k8s-reviewable:

This change is Reviewable

@DirectXMan12 (Contributor, Author):

cc @jszczepkowski

@DirectXMan12 force-pushed the feature/hpa-v2-controller branch from 82d086a to 2057c2b on February 10, 2017 23:58
@k8s-github-robot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD) label Feb 10, 2017
@DirectXMan12 (Contributor, Author):

cc @caesarxuchao @deads2k @liggitt @lavalamp re the Slack discussion the other day about different object versions in the controller -- hopefully this solution doesn't make you recoil in horror :-P

@k8s-github-robot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD) label Feb 14, 2017
@DirectXMan12 force-pushed the feature/hpa-v2-controller branch from 2057c2b to 728d7d0 on February 14, 2017 16:10
@k8s-github-robot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD) label Feb 14, 2017
@@ -23,9 +23,10 @@ import (
// GetResourceUtilizationRatio takes in a set of metrics, a set of matching requests,
// and a target utilization percentage, and calcuates the the ratio of
// desired to actual utilization (returning that and the actual utilization)
-func GetResourceUtilizationRatio(metrics PodResourceInfo, requests map[string]int64, targetUtilization int32) (float64, int32, error) {
+func GetResourceUtilizationRatio(metrics PodMetricsInfo, requests map[string]int64, targetUtilization int32) (float64, int32, int64, error) {
Contributor:

Can you please name the returned values? As it stands, it is unclear which one is which.

Contributor:

Also the comment should be updated.
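The suggestion above can be sketched with Go's named result parameters. The names and the simplified math below are illustrative only, not taken from the PR:

```go
package main

import "fmt"

// PodMetricsInfo maps pod names to metric values (a simplified
// stand-in for the type in the PR).
type PodMetricsInfo map[string]int64

// getResourceUtilizationRatio mirrors the shape of the signature
// under review, with named results so callers can tell the three
// returned values apart.
func getResourceUtilizationRatio(metrics PodMetricsInfo, requests map[string]int64, targetUtilization int32) (utilizationRatio float64, currentUtilization int32, rawAverageValue int64, err error) {
	var metricsTotal, requestsTotal int64
	for podName, metricValue := range metrics {
		request, hasRequest := requests[podName]
		if !hasRequest {
			err = fmt.Errorf("missing request for %s", podName)
			return
		}
		metricsTotal += metricValue
		requestsTotal += request
	}
	if requestsTotal == 0 {
		err = fmt.Errorf("no resource requests set")
		return
	}
	currentUtilization = int32(metricsTotal * 100 / requestsTotal)
	utilizationRatio = float64(currentUtilization) / float64(targetUtilization)
	rawAverageValue = metricsTotal / int64(len(metrics))
	return
}

func main() {
	metrics := PodMetricsInfo{"pod-1": 500, "pod-2": 700}
	requests := map[string]int64{"pod-1": 1000, "pod-2": 1000}
	ratio, utilization, raw, err := getResourceUtilizationRatio(metrics, requests, 50)
	fmt.Println(ratio, utilization, raw, err)
}
```

With named results, the doc comment and the signature document themselves, which addresses the review point about the values being indistinguishable.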

}

// GetMetricUtilizationRatio takes in a set of metrics and a target utilization value,
// and calculates the ratio of desired to actual utilization
// (returning that and the actual utilization)
-func GetMetricUtilizationRatio(metrics PodMetricsInfo, targetUtilization float64) (float64, float64) {
-	metricsTotal := float64(0)
+func GetMetricUtilizationRatio(metrics PodMetricsInfo, targetUtilization int64) (float64, int64) {
Contributor:

Can you name the returned values?

// GetMetricReplicas calculates the desired replica count based on a target metric utilization
// (as a milli-value) for pods matching the given selector in the given namespace, and the
// current replica count
func (c *ReplicaCalculator) GetMetricReplicas(currentReplicas int32, targetUtilization int64, metricName string, namespace string, selector labels.Selector) (replicaCount int32, utilization int64, timestamp time.Time, err error) {
Contributor:

It seems that the methods GetMetricReplicas and GetRawResourceReplicas are similar. Did you consider having their implementations share a common method?

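For illustration, a hypothetical shared helper along the lines the reviewer suggests: both calculations ultimately reduce to scaling the current replica count by a usage ratio once the metric values have been fetched. The function name, the tolerance handling, and the ceiling rule below are assumptions for the sketch, not the PR's actual code:

```go
package main

import (
	"fmt"
	"math"
)

// calcPlainMetricReplicas is a hypothetical common helper: given a
// usage ratio (desired / actual), it returns the desired replica
// count, keeping the current count when the ratio is within a
// tolerance band around 1.0.
func calcPlainMetricReplicas(currentReplicas int32, usageRatio float64, tolerance float64) int32 {
	if math.Abs(1.0-usageRatio) <= tolerance {
		// Within tolerance: avoid churn, keep the current count.
		return currentReplicas
	}
	// Round up so we never under-provision.
	return int32(math.Ceil(usageRatio * float64(currentReplicas)))
}

func main() {
	fmt.Println(calcPlainMetricReplicas(4, 1.05, 0.1)) // within tolerance → 4
	fmt.Println(calcPlainMetricReplicas(4, 1.5, 0.1))  // scale up → ceil(6.0) = 6
}
```

Both GetMetricReplicas and GetRawResourceReplicas could then call such a helper after computing their respective usage ratios, leaving only the metric-fetching code distinct.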

// GetObjectMetricReplicas calculates the desired replica count based on a target metric utilization (as a milli-value)
// for the given object in the given namespace, and the current replica count.
func (c *ReplicaCalculator) GetObjectMetricReplicas(currentReplicas int32, targetUtilization int64, metricName string, namespace string, objectRef *autoscaling.CrossVersionObjectReference) (replicaCount int32, utilization int64, timestamp time.Time, err error) {
Contributor:

Do you have a unittest for this method?

Contributor (Author):

there isn't one because the existing replica calc test is structured around the actual metrics client, so it needs a working implementation of GetObjectMetrics, which we don't have (it just returns a "not-implemented" error). I can either a) restructure the entire test, or b) add in the test in part 3 of this PR, which adds support for custom metrics using the custom metrics API types.

@@ -704,20 +683,6 @@ func TestScaleUpCMUnreadyNoScaleWouldScaleDown(t *testing.T) {
tc.runTest(t)
}

func TestDefaultScaleDown(t *testing.T) {
Contributor:

Why was this test removed?

Contributor (Author):

We don't have implicit defaulting any more -- it's done at the API level (defaulting for v2alpha1, and conversion for v1, for reasons of method signatures and JSON unmarshalling).

@@ -504,58 +526,6 @@ func (tc *testCase) runTest(t *testing.T) {
tc.verifyResults(t)
}

func TestDefaultScaleUpRC(t *testing.T) {
Contributor:

Why were these tests removed?

@@ -139,141 +150,140 @@ func (a *HorizontalController) Run(stopCh <-chan struct{}) {
glog.Infof("Shutting down HPA Controller")
}

// getLastScaleTime returns the hpa's last scale time or the hpa's creation time if the last scale time is nil.
func getLastScaleTime(hpa *autoscaling.HorizontalPodAutoscaler) time.Time {
Contributor:

Don't we need it anymore? Where is lastScaleTime set to CreationTimestamp?

Contributor (Author):

don't think this was intentional -- probably got lost amongst the rebases

Contributor (Author):

Ah, I know why this got removed -- it's largely irrelevant with the improved handling for missing pods. Under the new scheme, we'll only error out if no metrics at all are present, which will only occur a) right after the deployment is initially created, or b) if we have an actual error.
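The scheme described above can be sketched in isolation (the function name and shape here are illustrative, not the PR's code): the calculation fails only when the metrics set is empty, rather than when individual pods are missing.

```go
package main

import "fmt"

// averageOrError errors only when no metrics at all are present;
// pods with missing metrics are simply absent from the map and do
// not fail the whole calculation.
func averageOrError(metrics map[string]int64) (int64, error) {
	if len(metrics) == 0 {
		return 0, fmt.Errorf("no metrics returned")
	}
	var total int64
	for _, v := range metrics {
		total += v
	}
	return total / int64(len(metrics)), nil
}

func main() {
	// pod-2's metric is missing, but we still get a usable average.
	avg, err := averageOrError(map[string]int64{"pod-1": 100, "pod-3": 300})
	fmt.Println(avg, err)

	// Only a completely empty result is treated as an error.
	_, err = averageOrError(map[string]int64{})
	fmt.Println(err != nil)
}
```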


-	for _, customMetricTarget := range targetList.Items {
+	for i, metricSpec := range metricSpecs {
 		if scale.Status.Selector == nil {
Contributor:

I don't understand this check: scale.Status.Selector is an array; we should use len to get its length. I guess this check will always be false.

Contributor (Author):

selector is a map, and technically maps can be nil. But I think this was from when we were using the internal version, and never got properly updated. I'll fix it.
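The distinction being discussed can be demonstrated in isolation: in Go, a nil check and a length check on a map are not equivalent, because a nil map is distinct from a non-nil empty one.

```go
package main

import "fmt"

func main() {
	// A map-typed field can be nil (never assigned), and a nil map
	// is distinct from an empty one, so `selector == nil` and
	// `len(selector) == 0` test different conditions.
	var nilSelector map[string]string    // nil: zero value, never assigned
	emptySelector := map[string]string{} // non-nil, but empty

	fmt.Println(nilSelector == nil, len(nilSelector))     // true 0
	fmt.Println(emptySelector == nil, len(emptySelector)) // false 0
}
```

So the nil check in the diff is not *always* false, but it only catches the never-assigned case, which is why the thread settles on whether that was the intended condition.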

@DirectXMan12 force-pushed the feature/hpa-v2-controller branch from 728d7d0 to 7846827 on February 16, 2017 20:03
This commit converts the HPA controller over to using the new version of
the HorizontalPodAutoscaler object found in autoscaling/v2alpha1.  Note
that while the autoscaler will accept requests for object metrics, the
scale client will return an error on attempts to get object metrics
(since that requires the new custom metrics API, which is not yet
implemented).

This also enables the HPA object in v2alpha1 as a retrievable API
version by default.
@DirectXMan12 (Contributor, Author):

@jszczepkowski I believe I've addressed all of your comments (either as code changes or as replies). PTAL.

@jszczepkowski (Contributor):

/lgtm /approve

Please fix the e2e tests.

@DirectXMan12 (Contributor, Author):

@k8s-bot gce etcd3 e2e test this

Unsure if this was a flake, but looks like it might be. Let's try again...

@DirectXMan12 (Contributor, Author):

@k8s-bot cvm gke e2e test this

@k8s-bot kubemark e2e test this

looks like the first one was a flake, let's try the rest again

There was a bug in the HPA v1 conversion logic that would occur when
a custom metric and a metric that was encoded in v1 as
targetCPUUtilizationPercentage were used at the same time.  In this
case, the custom metric could overwrite the CPU metric, or vice versa.

This fixes that bug, and ensures that the fuzzer tests round-tripping
with multiple metrics.
@jszczepkowski added the lgtm ("Looks good to me", indicates that a PR is ready to be merged) and approved (Indicates a PR has been approved by an approver from all required OWNERS files) labels Feb 20, 2017
@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot merged commit 2f0e5ba into kubernetes:master on Feb 20, 2017
@DirectXMan12 deleted the feature/hpa-v2-controller branch on February 20, 2017 16:54
return true, obj, nil

// and... convert to autoscaling v1 to return the right type
objv1, err := UnsafeConvertToVersionVia(obj, autoscalingv1.SchemeGroupVersion)
Contributor:

do a deep copy first.
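The reason for the request above can be sketched generically: an in-place ("unsafe") conversion mutates its input, which is dangerous for objects shared through an informer cache, so a deep copy should be taken first. The types and helper below are illustrative stand-ins, not the PR's code or the generated Kubernetes DeepCopy machinery:

```go
package main

import "fmt"

// Pod is a toy stand-in for a cached API object.
type Pod struct {
	Labels map[string]string
}

// deepCopyPod is a hand-rolled deep copy standing in for Kubernetes'
// generated DeepCopy() methods. Because an unsafe in-place conversion
// mutates its input, copying first protects objects that other code
// may be reading from a shared cache.
func deepCopyPod(in *Pod) *Pod {
	out := &Pod{Labels: map[string]string{}}
	for k, v := range in.Labels {
		out.Labels[k] = v
	}
	return out
}

func main() {
	cached := &Pod{Labels: map[string]string{"app": "web"}}

	// Mutating the copy leaves the cached object untouched.
	converted := deepCopyPod(cached)
	converted.Labels["converted"] = "true"

	fmt.Println(len(cached.Labels), len(converted.Labels)) // 1 2
}
```

Without the copy, the "conversion" would silently rewrite the object that every other consumer of the cache sees.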
