Merge pull request #27981 from a-robinson/journal-cvm
Automatic merge from submit-queue

Support journal logs in fluentd-gcp on GCI

This maintains a single common image rather than forking separate images, relying on different commands in the yaml manifests to select the behavior. This overlaps with @adityakali's #27906, but I wasn't able to get in touch with him this afternoon until very recently. He's making sure that the new yaml manifests are used when running on GCI.

```release-note
```
k8s-merge-robot authored Jun 24, 2016
2 parents 8ed6c8e + 19bf9d0 commit 6aa016b
Showing 7 changed files with 329 additions and 7 deletions.
17 changes: 12 additions & 5 deletions cluster/addons/fluentd-gcp/fluentd-gcp-image/Dockerfile
@@ -18,9 +18,9 @@
# Logging API. This configuration assumes that the host performing
# the collection is a VM that has been created with a logging.write
# scope and that the Logging API has been enabled for the project
# in the Google Developer Console.
# in the Google Developer Console.

FROM ubuntu:14.04
FROM ubuntu:16.04
MAINTAINER Alex Robinson "arob@google.com"

# Disable prompts from apt.
@@ -30,17 +30,24 @@ ENV DO_NOT_INSTALL_CATCH_ALL_CONFIG true

RUN apt-get -q update && \
apt-get install -y curl && \
apt-get install -y gcc && \
apt-get install -y make && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
curl -s https://dl.google.com/cloudagents/install-logging-agent.sh | bash

# Install the record reformer plugin.
RUN /usr/sbin/google-fluentd-gem install fluent-plugin-record-reformer
# Install the record reformer and systemd plugins.
RUN /usr/sbin/google-fluentd-gem install fluent-plugin-record-reformer -v 0.8.1
RUN /usr/sbin/google-fluentd-gem install fluent-plugin-systemd -v 0.0.3

# Remove the misleading log file that gets generated when the agent is installed
RUN rm -rf /var/log/google-fluentd

# Copy the Fluentd configuration file for logging Docker container logs.
# Copy the Fluentd configuration files for logging Docker container logs.
# Either configuration file can be used by specifying `-c <file>` as a command
# line argument.
COPY google-fluentd.conf /etc/google-fluentd/google-fluentd.conf
COPY google-fluentd-journal.conf /etc/google-fluentd/google-fluentd-journal.conf

# Start Fluentd to pick up our config that watches Docker container logs.
CMD /usr/sbin/google-fluentd "$FLUENTD_ARGS"
2 changes: 1 addition & 1 deletion cluster/addons/fluentd-gcp/fluentd-gcp-image/Makefile
@@ -28,7 +28,7 @@

.PHONY: kbuild kpush

TAG = 1.20
TAG = 1.21

# Rules for building the test image for deployment to Dockerhub with user kubernetes.

@@ -0,0 +1,249 @@
# This configuration file for Fluentd / td-agent is used
# to watch changes to Docker log files that live in the
# directory /var/lib/docker/containers/ and are symbolically
# linked to from the /var/log directory using names that capture the
# pod name and container name. These logs are then submitted to
# Google Cloud Logging which assumes the installation of the cloud-logging plug-in.
#
# This configuration is almost identical to google-fluentd.conf, with the one
# difference being that this collects systemd journal logs.
#
# Example
# =======
# A line in the Docker log file might look like this JSON:
#
# {"log":"2014/09/25 21:15:03 Got request with path wombat\n",
# "stream":"stderr",
# "time":"2014-09-25T21:15:03.499185026Z"}
#
# The record reformer is used to write the tag to focus on the pod name
# and the Kubernetes container name. For example a Docker container's logs
# might be in the directory:
# /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
# and in the file:
# 997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
# where 997599971ee6... is the Docker ID of the running container.
# The Kubernetes kubelet makes a symbolic link to this file on the host machine
# in the /var/log/containers directory which includes the pod name and the Kubernetes
# container name:
# synthetic-logger-0.25lps-pod_default-synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
# ->
# /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
# The /var/log directory on the host is mapped to the /var/log directory in the container
# running this instance of Fluentd and we end up collecting the file:
# /var/log/containers/synthetic-logger-0.25lps-pod_default-synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
# This results in the tag:
# var.log.containers.synthetic-logger-0.25lps-pod_default-synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
# The record reformer is used to discard the var.log.containers prefix and
# the Docker container ID suffix and "kubernetes." is pre-pended giving the
# final tag which is ingested into Google Cloud Logging:
# kubernetes.synthetic-logger-0.25lps-pod_default-synth-lgr
# This makes it easier for users to search for logs by pod name or by
# the name of the Kubernetes container regardless of how many times the
# Kubernetes pod has been restarted (resulting in several Docker container IDs).

# Do not directly collect fluentd's own logs to avoid infinite loops.
<match fluent.**>
type null
</match>

# Example:
# {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
<source>
type tail
format json
time_key time
path /var/log/containers/*.log
pos_file /var/log/gcp-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag reform.*
read_from_head true
</source>

<match reform.**>
type record_reformer
enable_ruby true
tag kubernetes.${tag_suffix[4].split('-')[0..-2].join('-')}
</match>

# Example:
# 2015-12-21 23:17:22,066 [salt.state ][INFO ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081
<source>
type tail
format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
path /var/log/salt/minion
pos_file /var/log/gcp-salt.pos
tag salt
</source>

# Example:
# Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
<source>
type tail
format syslog
path /var/log/startupscript.log
pos_file /var/log/gcp-startupscript.log.pos
tag startupscript
</source>

# Examples:
# time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json"
# time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404
<source>
type tail
format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=(?<status_code>\d+))?/
time_format %Y-%m-%dT%H:%M:%S.%NZ
path /var/log/docker.log
pos_file /var/log/gcp-docker.log.pos
tag docker
</source>

# Example:
# 2016/02/04 06:52:38 filePurge: successfully removed file /var/etcd/data/member/wal/00000000000006d0-00000000010a23d1.wal
<source>
type tail
# Not parsing this, because it doesn't have anything particularly useful to
# parse out of it (like severities).
format none
path /var/log/etcd.log
pos_file /var/log/gcp-etcd.log.pos
tag etcd
</source>

# Multi-line parsing is required for all the kube logs because very large log
# statements, such as those that include entire object bodies, get split into
# multiple lines by glog.

# Example:
# I0204 07:32:30.020537 3368 server.go:1048] POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537]
<source>
type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kubelet.log
pos_file /var/log/gcp-kubelet.log.pos
tag kubelet
</source>

# Example:
# I0204 07:00:19.604280 5 handlers.go:131] GET /api/v1/nodes: (1.624207ms) 200 [[kube-controller-manager/v1.1.3 (linux/amd64) kubernetes/6a81b50] 127.0.0.1:38266]
<source>
type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kube-apiserver.log
pos_file /var/log/gcp-kube-apiserver.log.pos
tag kube-apiserver
</source>

# Example:
# I0204 06:55:31.872680 5 servicecontroller.go:277] LB already exists and doesn't need update for service kube-system/kube-ui
<source>
type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kube-controller-manager.log
pos_file /var/log/gcp-kube-controller-manager.log.pos
tag kube-controller-manager
</source>

# Example:
# W0204 06:49:18.239674 7 reflector.go:245] pkg/scheduler/factory/factory.go:193: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [2578313/2577886]) [2579312]
<source>
type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kube-scheduler.log
pos_file /var/log/gcp-kube-scheduler.log.pos
tag kube-scheduler
</source>

# Example:
# I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf
<source>
type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/glbc.log
pos_file /var/log/gcp-glbc.log.pos
tag glbc
</source>

# Example:
# I0603 15:31:05.793605 6 cluster_manager.go:230] Reading config from path /etc/gce.conf
<source>
type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/cluster-autoscaler.log
pos_file /var/log/gcp-cluster-autoscaler.log.pos
tag cluster-autoscaler
</source>

# Logs from systemd-journal for interesting services.
<source>
type systemd
filters [{ "_SYSTEMD_UNIT": "docker.service" }]
pos_file /var/log/gcp-journald-docker.pos
read_from_head true
tag docker
</source>

<source>
type systemd
filters [{ "_SYSTEMD_UNIT": "kubelet.service" }]
pos_file /var/log/gcp-journald-kubelet.pos
read_from_head true
tag kubelet
</source>

# We use 2 output stanzas - one to handle the container logs and one to handle
# the node daemon logs, the latter of which explicitly sends its logs to the
# compute.googleapis.com service rather than container.googleapis.com to keep
# them separate since most users don't care about the node logs.
<match kubernetes.**>
type google_cloud
# Set the chunk limit conservatively to avoid exceeding the GCL limit
# of 10MiB per write request.
buffer_chunk_limit 2M
# Cap the combined memory usage of this buffer and the one below to
# 2MiB/chunk * (24 + 8) chunks = 64 MiB
buffer_queue_limit 24
# Never wait more than 5 seconds before flushing logs in the non-error case.
flush_interval 5s
# Never wait longer than 30 seconds between retries.
max_retry_wait 30
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
</match>

# Keep a smaller buffer here since these logs are less important than the user's
# container logs.
<match **>
type google_cloud
detect_subservice false
buffer_chunk_limit 2M
buffer_queue_limit 8
flush_interval 5s
max_retry_wait 30
disable_retry_limit
</match>
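
Two transformations in the configuration above are easy to misread: the `record_reformer` tag expression and the shared glog `format1` pattern. The following is a minimal Python sketch of both, not the plugins' actual implementation. It assumes that `tag_suffix[4]` means the tag from its fifth dot-separated part onward (as the record reformer README describes) and that the tail source turns the file path into a dotted tag. The function names `kubernetes_tag` and `parse_glog` are hypothetical.

```python
import re

# Python rendering of the `format1` pattern shared by the glog-style sources
# (kubelet, kube-apiserver, ...). Ruby's (?<name>...) groups are written as
# (?P<name>...) in Python; the pattern is otherwise unchanged.
GLOG_LINE = re.compile(
    r"^(?P<severity>\w)(?P<time>\d{4} [^\s]*)\s+(?P<pid>\d+)\s+"
    r"(?P<source>[^ \]]+)\] (?P<message>.*)"
)


def kubernetes_tag(tail_tag: str) -> str:
    """Mimic: kubernetes.${tag_suffix[4].split('-')[0..-2].join('-')}"""
    # Assumption: tag_suffix[4] is everything from the fifth tag part onward,
    # i.e. what follows "reform.var.log.containers.".
    suffix = ".".join(tail_tag.split(".")[4:])
    # Drop the last '-'-separated piece: the Docker container ID plus ".log".
    return "kubernetes." + "-".join(suffix.split("-")[:-1])


def parse_glog(line: str) -> dict:
    """Split the first line of a glog entry into its named fields."""
    match = GLOG_LINE.match(line)
    return match.groupdict() if match else {}
```

With the example from the header comment, `kubernetes_tag` applied to the `reform.var.log.containers.…` tag of the `synthetic-logger` pod yields `kubernetes.synthetic-logger-0.25lps-pod_default-synth-lgr`. Because only the last hyphen-separated piece is dropped, dots inside the pod name survive the rewrite.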
@@ -5,6 +5,10 @@
# pod name and container name. These logs are then submitted to
# Google Cloud Logging which assumes the installation of the cloud-logging plug-in.
#
# This configuration is almost identical to google-fluentd-journal.conf, with
# the one difference being that this doesn't try to collect systemd journal
# logs.
#
# Example
# =======
# A line in the Docker log file might look like this JSON:
7 changes: 7 additions & 0 deletions cluster/addons/gci/README.md
@@ -0,0 +1,7 @@
Some addons need to be configured slightly differently when running on the
Google ContainerVM Image (GCI). This directory serves as a place to store yaml
manifests that need to differ slightly from the ones under
`cluster/saltbase/salt`.


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/cluster/addons/gci/README.md?pixel)]()
53 changes: 53 additions & 0 deletions cluster/addons/gci/fluentd-gcp.yaml
@@ -0,0 +1,53 @@
# This config should be kept as similar as possible to the one at
# cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fluentd-cloud-logging
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  dnsPolicy: Default
  containers:
  - name: fluentd-cloud-logging
    image: gcr.io/google_containers/fluentd-gcp:1.21
    command:
      - '/bin/sh'
      - '-c'
      # This is pretty hacky, but ruby relies on libsystemd's native code, and
      # the ubuntu:16.04 libsystemd doesn't play nice with the journal on GCI
      # hosts. Work around the problem by copying in the host's libsystemd.
      - 'rm /lib/x86_64-linux-gnu/libsystemd* && cp /host/lib/libsystemd* /lib/x86_64-linux-gnu/ && /usr/sbin/google-fluentd -q -c /etc/google-fluentd/google-fluentd-journal.conf'
    resources:
      limits:
        memory: 200Mi
      requests:
        # Any change here should be accompanied by a proportional change in CPU
        # requests of other per-node add-ons (e.g. kube-proxy).
        cpu: 80m
        memory: 200Mi
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: varlibdockercontainers
      mountPath: /var/lib/docker/containers
      readOnly: true
    - name: journaldir
      mountPath: /var/log/journal
    - name: libsystemddir
      mountPath: /host/lib
  terminationGracePeriodSeconds: 30
  volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
  - name: journaldir
    hostPath:
      path: /var/log/journal
  - name: libsystemddir
    hostPath:
      path: /usr/lib64
4 changes: 3 additions & 1 deletion cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml
@@ -1,3 +1,5 @@
# This config should be kept as similar as possible to the one at
# cluster/addons/gci/fluentd-gcp.yaml
apiVersion: v1
kind: Pod
metadata:
@@ -9,7 +11,7 @@ spec:
  dnsPolicy: Default
  containers:
  - name: fluentd-cloud-logging
    image: gcr.io/google_containers/fluentd-gcp:1.20
    image: gcr.io/google_containers/fluentd-gcp:1.21
    resources:
      limits:
        memory: 200Mi