
There should be a single way that pods and other resources are identified in k8s component logs wherever possible #23338

Closed
pmorie opened this issue Mar 22, 2016 · 17 comments
Labels
area/logging area/test-infra lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience.

Comments

@pmorie
Member

pmorie commented Mar 22, 2016

Currently there is no single way that pods are identified in log messages, which can make it difficult to find all the messages relevant to a single pod in a log file. I think, wherever possible, that we should identify pods with the following information, formatted consistently:

  1. Pod name
  2. Pod namespace
  3. Pod UID

It's possible that not all call sites will have all of this information; in that case, we should stick to the format, and just log empty fields. I started thinking about this in the context of the kubelet but it applies to anything that logs messages identifying pods.

@kubernetes/sig-node @kubernetes/rh-cluster-infra

@yujuhong
Contributor

I've been trying to push this with a helper function that prints podName_podNamespace(podUID) for any api.Pod object, but not all call sites have been converted. In some cases there may not be pod-level information (e.g., only container names/IDs), so I'm not sure whether printing empty fields would help.
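
For illustration, a minimal sketch of what such a helper might look like (the package and function names here are hypothetical, not necessarily what is in the tree):

// Sketch only: hypothetical helper that renders a pod as "name_namespace(uid)" for log messages.
package format

import (
    "fmt"

    "k8s.io/kubernetes/pkg/api"
)

// Pod returns a consistent identifier for the given pod, suitable for log messages.
func Pod(pod *api.Pod) string {
    if pod == nil {
        return "<nil pod>"
    }
    return fmt.Sprintf("%s_%s(%s)", pod.Name, pod.Namespace, pod.UID)
}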

@maisem maisem added team/cluster sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Mar 22, 2016
@timothysc
Member

A dream of mine in every distributed system would be complete object logging for total life-cycle traceability (admission-scheduling-execution). Maybe it's just a dream.

@derekwaynecarr
Member

OMG yes, this would be extremely useful. This goes for kubecontainer.Pod as well.

@yujuhong
Contributor

I agree in general, and that's also why I added the helper function to begin with. However, supporting this everywhere is a bit problematic because not every function has access to the entire api.Pod object, or even to the pod UID. E.g., I think it's acceptable that the dockertools package logs a single line about "deleting a container" without writing down the UID; of course, there should be a corresponding higher-level message about deleting containers in a pod logged somewhere else. I'd like to think that we have a consistent way to log objects whenever possible, but not every log message can easily be associated back with a high-level object. For information relevant to tracking the API object, we should (perhaps) rely on Events.

@timstclair

Can we generalize this to standardize logging across the whole project? What if we abstracted away from glog to create our own higher-level logging package? I am envisioning an API that would enable something like:

log.Msg("could not read pod info"). // log message
    ForObject(pod).                 // associate with API object (log identity, best effort)
    WithError(err).                 // include an error
    Error()                         // write the log (error level)

or

log.Msg("syncing pod").  // log message
    FullObject(pod).     // pretty-print full object spec (deep print)
    From(podWorker).     // identify call owner
    Label("first time", isFirst)  // attach extra information
    Debug()              // log at debug level

The goal would be to identify the common pieces of information that are logged and provide them in a consistent, human & machine readable format.
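
As a rough sketch, the surface of such a package might look something like this (all identifiers below are hypothetical, not a concrete proposal):

// Sketch only: hypothetical API surface for a higher-level logging package.
package log

// Entry accumulates structured context and is written by one of the level methods.
type Entry interface {
    ForObject(obj interface{}) Entry           // attach object identity (name/namespace/UID), best effort
    FullObject(obj interface{}) Entry          // attach a deep-printed copy of the object
    WithError(err error) Entry                 // attach an error
    From(owner interface{}) Entry              // identify the calling component
    Label(key string, value interface{}) Entry // attach arbitrary key/value metadata

    Error() // write the entry at error level
    Debug() // write the entry at debug level
}

// Msg would start a new Entry carrying the given message; implementation elided here.
func Msg(msg string) Entry { panic("sketch only") }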

WDYT?

@timothysc
Member

I like wrapping logging, but I'm not sold on the format above vs.

klog.Infof
klog.Debugf
klog.Warningf
klog.Errorf

If you wanted some object formatter interface above, that may make sense too.
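
For example, that formatter layer could be as small as the following (hypothetical sketch, not an existing API):

// Sketch only: a shared formatter that Infof-style call sites could use for object identity.
type ObjectFormatter interface {
    // Format returns a consistent identifier for obj, e.g. "name_namespace(uid)" for pods.
    Format(obj interface{}) string
}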

/cc @jayunit100

@timstclair

Something to add: a problem with solving this only in a logging library is that we often want to attach this data to error objects, and don't necessarily log it directly. We could introduce a custom error type with fields for all the same metadata (object reference, full object, another error, labels, etc.), and then make the logger dissect the object and format it consistently.
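
A minimal sketch of what such an error type could look like (the field names are hypothetical):

// Sketch only: a hypothetical error type carrying the same metadata a structured logger would format.
type ObjectError struct {
    Msg    string                 // human-readable message
    Object interface{}            // reference to the related API object (identity only)
    Full   interface{}            // optional deep copy of the object for pretty-printing
    Cause  error                  // wrapped underlying error
    Labels map[string]interface{} // extra key/value context
}

// Error satisfies the error interface; the logger, not the call site, decides how to render the metadata.
func (e *ObjectError) Error() string {
    return e.Msg
}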

@timothysc - I think an advantage of my proposal over Infof-style logging is that the output could be more structured, and hence more machine readable, so we could build custom tools for searching & filtering the logs. E.g.: here are the logs from a kubelet, the api-server, and the master; combine them and show me all the entries relating to pod "heapster". Or better yet, just connect to my cluster and show me all the logs for XYZ.

kubectl debug pod heapster-v1.0.0-vuanf

@vishh
Contributor

vishh commented Apr 25, 2016

FYI: https://github.com/Sirupsen/logrus is an attempt to generate structured logs for Go.
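
For reference, structured pod fields with logrus look roughly like this (the field names and values here are only illustrative):

// Sketch only: structured logging of pod identity with logrus; values are placeholders.
package main

import log "github.com/Sirupsen/logrus"

func main() {
    log.WithFields(log.Fields{
        "pod":       "heapster-v1.0.0-vuanf",
        "namespace": "kube-system",
        "uid":       "11111111-2222-3333-4444-555555555555",
    }).Info("syncing pod")
}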

@timothysc
Member

timothysc commented Apr 25, 2016

@timstclair I don't think there is anything that precludes us from having a set of higher-level capabilities that are backed by the lower-level utilities.

Here is where I miss my macro-overloading and template pasting magic of C++.

@ncdc
Member

ncdc commented Apr 26, 2016

See also #6461 #17162 #17449

@timothysc
Member

After some time I kind of like https://github.com/Sirupsen/logrus ...

@bgrant0607 bgrant0607 changed the title There should be a single way that pods are identified in logs wherever possible There should be a single way that pods and other resources are identified in k8s component logs wherever possible May 9, 2016
@bgrant0607 bgrant0607 added area/test-infra sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. labels May 9, 2016
@bgrant0607
Member

I am very much in favor of making our system component logs more useful/actionable, consistent, structured, etc.

I'd be in favor of treating errors as failures in tests.

I am also in favor of leveraging third-party libraries and tools.

I am not in favor of adding functionality to kubectl for digging into component logs. Our CLI and API surface areas are big enough already, and this really sounds like a job for Elasticsearch/Kibana.

If there are conditions users care about, they should be surfaced via the API, such as using events or conditions.

If there are occurrences that cluster admins may care about, they could be surfaced via events, node conditions, or exported monitoring metrics.

See also #3500 and #20672 re. just dumping API objects for debugging.

@timstclair

I agree with all these points, and thanks for the links re: dumping API objects. I agree that we should leverage existing libraries and tools, but I think we should also build Kubernetes-specific abstractions around them. Our Elasticsearch / Kibana / GCM solutions work for running clusters, but it would also be good to have a way of ingesting logs from a debug dump or a past Jenkins e2e run (e.g. import the logs from this storage bucket). Good points about events & conditions vs. logs; we should largely view logs as internal debugging tools.

@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2017
@yujuhong yujuhong removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2018
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 14, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
