
Handle innocuous error messages during control plane startup #2168

Closed
grampelberg opened this issue Jan 29, 2019 · 7 comments
@grampelberg (Contributor) commented Jan 29, 2019

What problem are you trying to solve?

When the controller containers start up, they log many connection refused lines. This is a timing issue between the proxy and the controller container: because the proxy is not yet ready to receive traffic, the connections are retried.

Unfortunately, this scares users and can mislead debugging. In the past, it was not possible to opt out of these lines because of glog.

How should the problem be solved?

The underlying problem here is that a Sync is kicked off before the proxy is ready to forward traffic. This feels like a general linkerd problem, and solving it could help our users more broadly.

I would like to see a way for sidecars to know when they can start sending traffic out. It feels like this could be a readiness probe for the proxy. Then, the controller code could wait for the probe to pass and continue starting.

Alternatives considered

  • Mute klog - there are valid messages that will just make debugging more difficult if klog is completely muted.
  • Toggle klog around sync - feels pretty magical and fragile.
@grampelberg (Contributor, Author)

@klingerf should the klog setting be -v3 or just ignore that for now?

@siggy (Member) commented Jan 29, 2019

Note that the connection refused lines are at the error level. Error-level logging from client-go may be valuable after successful startup. If possible we should consider dynamically changing the log-level once the server is fully initialized.

For reference, here's an example error we see at startup:

E0129 01:20:44.847042       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.DaemonSet: Get https://10.96.0.1:443/apis/apps/v1/daemonsets?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
...
time="2019-01-29T01:20:46Z" level=info msg="caches synced"

@klingerf (Contributor)

@grampelberg If --controller-log-level is set to debug, we might as well set the klog log level to be as verbose as possible.

This was referenced Jan 29, 2019
@grampelberg grampelberg changed the title Update client-go logging to hide innocuous error messages Handle innocuous error messages during control plane startup Jan 29, 2019
@grampelberg (Contributor, Author)

Updated to reflect the problem statement and a more sustainable solution.

@grampelberg grampelberg added the priority/P1 Planned for Release label Jan 29, 2019
@dadjeibaah (Contributor)

For this, what about just hitting the pod's /metrics endpoint exposed by the proxy, and once that gives us a success response, proceeding with the Sync? That endpoint is what the kubelet uses for its readinessProbes. We could use it in each of the control plane components to effectively do what the kubelet readiness checks are doing.

@klingerf (Contributor)

@dadjeibaah That makes sense to me, especially since that's what we're already using for the proxy's readiness checks. I wanted to weigh in with a bit of additional info though.

I think this issue is more complicated to fix than it seems on the surface. The easiest fix is to suppress these log messages entirely, unless --controller-log-level is set to "debug". That's my preferred fix.

Alternatively, if we're worried about debuggability when the logs are suppressed, then we can introduce some sort of delay in the controller components that waits for their sidecar proxies to initialize. Frankly that feels a bit anti-linkerd to me. The whole idea behind proxy injection is that your applications should be unaware that they're running with a sidecar, and in my mind, that includes the control plane.

Fwiw, other projects have hit this issue too, and there's been some push for container startup/shutdown ordering supported natively by Kubernetes. The original issue for that was tracked in kubernetes/kubernetes#65502, and it has now been converted to a KEP, here:

https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/sidecarcontainers.md

I'm not sure what happens now, but presumably there will be some support for this in future versions of Kubernetes, and my recommendation would be to wait for that instead of applying application-level fixes at this time.

@siggy siggy added priority/P0 Release Blocker and removed priority/P1 Planned for Release labels Feb 4, 2019
dadjeibaah pushed a commit that referenced this issue Feb 7, 2019
…2221)

During startup, each control plane component (i.e. tap, public-api, and proxy-api) logs innocuous messages while its proxy isn't yet ready to route requests. Setting the control plane log level to `INFO` would not hide these messages; they would still show up on control plane startup.

This PR modifies `klog`'s initial flag set to route innocuous logs to `/dev/null` when the controller log level is set to `INFO`. If set to `debug`, all log lines are output to stderr.

Fixes #2171 #2168
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
@siggy (Member) commented Feb 7, 2019

Closed by #2221.

@siggy siggy closed this as completed Feb 7, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021