-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds metrics to PortForward Websockets #126187
Adds metrics to PortForward Websockets #126187
Conversation
/sig api-machinery |
Help: "Total number of requests that were handled by the StreamTunnelProxy, which processes streaming PortForward/V2", | ||
StabilityLevel: metrics.ALPHA, | ||
}, | ||
[]string{statuscode}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only considers the backend status code, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily.
-
Within the
TunnelingHandler.ServeHTTP
there is oneBadRequest
response if the request does not contain subprotocols. In this case we return the 400-level response (and increment the metric) but do not consider either the backend/upstream SPDY upgrade status code or the WebSocket upgrade. -
Within the
InitializeWrite
, we consider the backend/upstream SPDY upgrade response. Only after we determine the SPDY upgrade succeeds, do we attempt the WebSocket upgrade. So the successful status code represents both the successful SPDY upgrade and the successful WebSocket upgrade. It is also possible in the situation where the SPDY upgrade was successful, but the WebSocket upgrade was not (see your next comment), where the metric is incremented with a 500-level error response.
Please let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking more about an end user or admin or SRE and how it will interpret this metrics, in case this metrics goes up ... where should they start looking , apiserver, kubelet, containerd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that if the non-successful metrics rise, investigators would initially look at the error statements in the apiserver logs. I think they would have to look at these logs no matter what. Please let me know if we should separate the SPDY from WebSocket status codes. The one place where the SPDY vs. WebSocket status code might be indeterminate is the TunnelingHandler.ServeHTTP
returning a BadRequest
status code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, too complex to be able to express with metrics
/Lgtm
Thanks for the explanation
@@ -352,11 +357,13 @@ func (u *tunnelingWebsocketUpgraderConn) InitializeWrite(backendResponse *http.R | |||
} | |||
|
|||
klog.V(4).Infof("websocket connection created: %s", conn.Subprotocol()) | |||
metrics.IncStreamTunnelRequest(context.Background(), strconv.Itoa(http.StatusOK)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an exit on line 356, should this metric incremented there too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes--you are correct. There should be a 500 internal server error metric incremented in this case. I have added it.
This case represents the situation where the SPDY upgrade was successful, but the subsequent WebSocket upgrade failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be recording StatusOK or StatusSwitchingProtocols?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed from StatusOK
(200) to StatusSwitchingProtocols
(101).
staging/src/k8s.io/apiserver/pkg/util/proxy/streamtunnel_test.go
Outdated
Show resolved
Hide resolved
overall lgtm, a question if we need to get metrics on all. possible return status and a nit on te tests |
763ad26
to
98ea294
Compare
/triage accepted |
98ea294
to
293a8db
Compare
LGTM label has been added. Git tree hash: f66a66695b6a6a78e7739a09adf1e67ff8c8f533
|
293a8db
to
90d70ed
Compare
/lgtm |
LGTM label has been added. Git tree hash: 20aba58079378cc7664283b61d747e0bdb3b5fe6
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liggitt, seans3 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
PortForward
tunneling through WebSockets:stream_tunnel_requests_total
.proxy
package:84.0%
ok k8s.io/apiserver/pkg/util/proxy 2.921s coverage: 84.0% of statements
/kind cleanup