Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add authentication overall latency metrics #82409

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import (
"errors"
"net/http"
"strings"
"time"

apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
Expand Down Expand Up @@ -58,8 +59,19 @@ var (

authenticatedAttemptsCounter = metrics.NewCounterVec(
&metrics.CounterOpts{
Name: "authentication_attempts",
Help: "Counter of authenticated attempts.",
Name: "authentication_attempts",
Help: "Counter of authenticated attempts.",
StabilityLevel: metrics.ALPHA,
RainbowMango marked this conversation as resolved.
Show resolved Hide resolved
},
[]string{"result"},
)

authenticationLatency = metrics.NewHistogramVec(
&metrics.HistogramOpts{
Name: "authentication_duration_seconds",
Help: "Authentication duration in seconds broken out by result.",
Buckets: metrics.ExponentialBuckets(0.001, 2, 15),
Copy link
Member

@mikedanese mikedanese Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bucketing might be a little abrasive depending on the actual latency. We may need to adjust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give me some specific instructions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this right, the buckets are:

n | 0.001 n^2
1 | 0.001
2 | 0.004
3 | 0.009
4 | 0.016
5 | 0.025
6 | 0.036
7 | 0.049
8 | 0.064
9 | 0.081
10 | 0.1
11 | 0.121
12 | 0.144
13 | 0.169
14 | 0.196
15 | 0.225

What if authentication takes a second or ten seconds? It'll get put in the 15th bucket along with the ones the requests that took 0.2 seconds. In my experience, I've never been thrilled with exponential bucketing. Unless you actually have a good idea of performance before setting the parameters, it tends to cause weird data issues when you actually want to know how long stuff takes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buckets seem not what you think. Wait a moment, I try to dump from my local cluster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from my local cluster.

# HELP authentication_duration_seconds [ALPHA] Authentication duration in seconds broken out by result.
# TYPE authentication_duration_seconds histogram
authentication_duration_seconds_bucket{result="success",le="0.001"} 1143
authentication_duration_seconds_bucket{result="success",le="0.002"} 1143
authentication_duration_seconds_bucket{result="success",le="0.004"} 1144
authentication_duration_seconds_bucket{result="success",le="0.008"} 1144
authentication_duration_seconds_bucket{result="success",le="0.016"} 1144
authentication_duration_seconds_bucket{result="success",le="0.032"} 1144
authentication_duration_seconds_bucket{result="success",le="0.064"} 1144
authentication_duration_seconds_bucket{result="success",le="0.128"} 1144
authentication_duration_seconds_bucket{result="success",le="0.256"} 1144
authentication_duration_seconds_bucket{result="success",le="0.512"} 1144
authentication_duration_seconds_bucket{result="success",le="1.024"} 1144
authentication_duration_seconds_bucket{result="success",le="2.048"} 1144
authentication_duration_seconds_bucket{result="success",le="4.096"} 1144
authentication_duration_seconds_bucket{result="success",le="8.192"} 1144
authentication_duration_seconds_bucket{result="success",le="16.384"} 1144
authentication_duration_seconds_bucket{result="success",le="+Inf"} 1144
authentication_duration_seconds_sum{result="success"} 0.14704390199999995
authentication_duration_seconds_count{result="success"} 1144

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enj has pointed the similar issue. So, I enlarged the bucketed number from 10 to 15.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I did n^2 instead of 2^n. Those look fine for now although we may want higher density above 1 second in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. By the way, most latency metric using the same bucket.

Now let's call @liggitt help review and approve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @brancz @logicalhan on bucket count/size

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bucket count/sizes look okay to me.

StabilityLevel: metrics.ALPHA,
},
[]string{"result"},
)
Expand All @@ -68,6 +80,7 @@ var (
func init() {
legacyregistry.MustRegister(authenticatedUserCounter)
legacyregistry.MustRegister(authenticatedAttemptsCounter)
legacyregistry.MustRegister(authenticationLatency)
}

// WithAuthentication creates an http handler that tries to authenticate the given request as a user, and then
Expand All @@ -80,6 +93,8 @@ func WithAuthentication(handler http.Handler, auth authenticator.Request, failed
return handler
}
return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
authenticationStart := time.Now()

if len(apiAuds) > 0 {
req = req.WithContext(authenticator.WithAudiences(req.Context(), apiAuds))
}
Expand All @@ -88,8 +103,10 @@ func WithAuthentication(handler http.Handler, auth authenticator.Request, failed
if err != nil {
klog.Errorf("Unable to authenticate the request due to an error: %v", err)
authenticatedAttemptsCounter.WithLabelValues(errorLabel).Inc()
authenticationLatency.WithLabelValues(errorLabel).Observe(time.Since(authenticationStart).Seconds())
} else if !ok {
authenticatedAttemptsCounter.WithLabelValues(failureLabel).Inc()
authenticationLatency.WithLabelValues(failureLabel).Observe(time.Since(authenticationStart).Seconds())
}

failed.ServeHTTP(w, req)
Expand All @@ -109,6 +126,7 @@ func WithAuthentication(handler http.Handler, auth authenticator.Request, failed

authenticatedUserCounter.WithLabelValues(compressUsername(resp.User.GetName())).Inc()
authenticatedAttemptsCounter.WithLabelValues(successLabel).Inc()
authenticationLatency.WithLabelValues(successLabel).Observe(time.Since(authenticationStart).Seconds())
RainbowMango marked this conversation as resolved.
Show resolved Hide resolved

handler.ServeHTTP(w, req)
})
Expand Down