Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored the fluentd-es addon #50082

Merged

Conversation

crassirostris
Copy link

@crassirostris crassirostris commented Aug 3, 2017

Refactor fluentd-elasticsearch addon:

  • Decrease the number of files by moving RBAC-related objects in the same files where they're used
  • Move the fluentd configuration out of the image
  • Don't use PVC to avoid leaking resources in e2e tests
  • Fluentd now ingest docker and kubelet logs that are written to journald
  • Disable X-Pack, because it's not free

Fixes #41462
Fixes #49816
Fixes #48973
Fixes #49450

@aknuds1 @coffeepac Could you please take a look?

Fluentd DaemonSet in the fluentd-elasticsearch addon is configured via ConfigMap and includes journald plugin
Elasticsearch StatefulSet in the fluentd-elasticsearch addon uses local storage instead of PVC by default

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 3, 2017
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Aug 3, 2017
@crassirostris crassirostris force-pushed the cleanup-fluentd-es branch 2 times, most recently from cd11fb7 to 1f32693 Compare August 3, 2017 12:50
@crassirostris crassirostris changed the title WIP: Refactored the fluentd-es addon Refactored the fluentd-es addon Aug 3, 2017
@crassirostris crassirostris added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Aug 3, 2017
@crassirostris crassirostris requested a review from coffeepac August 3, 2017 12:55
@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Aug 3, 2017
@aknuds1
Copy link
Contributor

aknuds1 commented Aug 3, 2017

@crassirostris Thanks, will have a look!

@crassirostris crassirostris force-pushed the cleanup-fluentd-es branch 2 times, most recently from 2568ce9 to b1f39fa Compare August 3, 2017 13:36

The Elasticsearch StatefulSet will use the [EmptyDir][emptyDir] volume to
store data. This volume will be erased with the pod evictions, it is used
just for the testing purposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say 'just for testing purposes'.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

that by default will be 100 Gi per replica. Please adjust this to your needs (including
possibly choosing a more suitable StorageClass).

The Elasticsearch StatefulSet will use the [EmptyDir][emptyDir] volume to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called emptyDir rather than EmptyDir?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's referred as EmptyDir in the API docs (e.g. in https://kubernetes.io/docs/api-reference/v1.7/#emptydirvolumesource-v1-core)

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/cluster/addons/fluentd-elasticsearch/README.md?pixel)]()
[fluentd]: http://www.fluentd.org/
[elasticsearch]: http://www.elasticsearch.org/
[kibana]: https://www.elastic.co/products/elasticsearch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's funny, I actually noticed that, but it kinda fell in the crack :D

Ack


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/cluster/addons/fluentd-elasticsearch/README.md?pixel)]()
[fluentd]: http://www.fluentd.org/
[elasticsearch]: http://www.elasticsearch.org/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@@ -12,3 +12,6 @@ path.data: /data
network.host: 0.0.0.0

discovery.zen.minimum_master_nodes: ${MINIMUM_MASTER_NODES}

xpack.security.enabled: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

resources:
limits:
memory: 200Mi
memory: 500Mi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you determine the new value?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empirically. Note that it's limit, I haven't changed the request. that's just to protect the fluentd from ruby having memory spikes

@@ -64,3 +102,10 @@ spec:
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
# It is needed to copy systemd library to decompress journals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

gem 'fluent-plugin-kubernetes_metadata_filter', '~>0.27.0'
gem 'fluent-plugin-elasticsearch', '~>1.9.5'
gem 'fluent-plugin-systemd', '~>0.0.8'
gem 'fluent-plugin-prometheus', '~>0.3.0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


[fluentd]: http://www.fluentd.org/
[elasticsearch]: http://www.elasticsearch.org/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

value: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
- name: XPACK_MONITORING_ENABLED
value: false
- name: XPACK_SECURITY_ENABLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@crassirostris
Copy link
Author

@aknuds1 Thanks! I'll address it shortly

@crassirostris crassirostris force-pushed the cleanup-fluentd-es branch 3 times, most recently from d327e29 to 68da746 Compare August 3, 2017 15:18
@crassirostris
Copy link
Author

/retest

2 similar comments
@crassirostris
Copy link
Author

/retest

@crassirostris
Copy link
Author

/retest

spec:
template:
metadata:
labels:
k8s-app: fluentd-es
kubernetes.io/cluster-service: "true"
version: v1.24
version: v0.2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this version going backwards?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to move to the semver, but I agree that moving 1.24 -> 0.1.24 and then updating to 0.2.0 is less intuitive than 1.24 -> 1.24.0 -> 2.0.0

I'll change the version, thanks for noticing!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@coffeepac
Copy link
Contributor

this fixes: #41462

@crassirostris
Copy link
Author

Thanks for reminding!

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2017
@crassirostris crassirostris force-pushed the cleanup-fluentd-es branch 2 times, most recently from 5854152 to 8dd80fa Compare August 4, 2017 09:34
[X-Pack plugin][xPack]. See configuration parameter `xpack.security.enabled`
in Elasticsearch and Kibana configurations. It can also be set via
`XPACK_SECURITY_ENABLED` env variable. You can then use [ConfigMaps][configMap]
and [Secrets][secret] to set up password in Kibana and fluentd.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To set up credentials (rather than 'password')?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

possibly choosing a more suitable StorageClass).

The Elasticsearch StatefulSet will use the [EmptyDir][emptyDir] volume to
store data. This volume will be erased with the pod evictions, it is used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"when pods terminate" maybe instead of "with the pod evictions"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

store data. This volume will be erased with the pod evictions, it is used
just for testing purposes.

**Note:** please change the storage to the persistent volume claim if you
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe say "change the storage to persistent volume claim" instead of "change the storage to the persistent volume claim"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack


Learn more at: https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana
**Note:** in order for Fluentd to work, every Kubernetes node must be labeled
with `beta.kubernetes.io/fluentd-ds-ready=true`, as otherwise Fluentd DaemonSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say "the Fluentd DaemonSet".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@aknuds1
Copy link
Contributor

aknuds1 commented Aug 4, 2017

@crassirostris Should you document perhaps how the Fluentd configuration must be changed in order to use credentials towards Elasticsearch?

Copy link
Author

@crassirostris crassirostris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you document perhaps how the Fluentd configuration must be changed in order to use credentials towards Elasticsearch

Ack, will address everything shortly

[X-Pack plugin][xPack]. See configuration parameter `xpack.security.enabled`
in Elasticsearch and Kibana configurations. It can also be set via
`XPACK_SECURITY_ENABLED` env variable. You can then use [ConfigMaps][configMap]
and [Secrets][secret] to set up password in Kibana and fluentd.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

possibly choosing a more suitable StorageClass).

The Elasticsearch StatefulSet will use the [EmptyDir][emptyDir] volume to
store data. This volume will be erased with the pod evictions, it is used
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

store data. This volume will be erased with the pod evictions, it is used
just for testing purposes.

**Note:** please change the storage to the persistent volume claim if you
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack


Learn more at: https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana
**Note:** in order for Fluentd to work, every Kubernetes node must be labeled
with `beta.kubernetes.io/fluentd-ds-ready=true`, as otherwise Fluentd DaemonSet
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@crassirostris
Copy link
Author

@aknuds1 Done, PTAL

@aknuds1
Copy link
Contributor

aknuds1 commented Aug 4, 2017

@crassirostris On it


Since Fluentd talks to Elastcisearch service inside the cluster, instance on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to word it "Since Fluentd talks to the Elasticsearch service inside the cluster, instances on masters won't work, because masters have no kube-proxy. Add a taint on your masters to avoid Fluentd pod scheduling there".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack


Since Fluentd talks to Elastcisearch service inside the cluster, instance on
the master won't work, because there's no kube-proxy on master. Add taint on
your master to avoid Fluentd pod scheduling there.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to taint masters, since the label beta.kubernetes.io/fluentd-ds-ready=true is already used to control the nodes on which Fluentd is scheduled?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense actually. It's the problem of GCE setup in this repo, not a generic one. I'll rephrase

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluentd-es-v1.24
name: fluentd-es-v2.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any point in appending the version to the name? We stopped doing this for Elasticsearch itself after all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is none, thanks for noticing

Copy link
Author

@crassirostris crassirostris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your thorough review!

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluentd-es-v1.24
name: fluentd-es-v2.0.0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is none, thanks for noticing


Since Fluentd talks to Elastcisearch service inside the cluster, instance on
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack


Since Fluentd talks to Elastcisearch service inside the cluster, instance on
the master won't work, because there's no kube-proxy on master. Add taint on
your master to avoid Fluentd pod scheduling there.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense actually. It's the problem of GCE setup in this repo, not a generic one. I'll rephrase

@coffeepac
Copy link
Contributor

@aknuds1 this looks good to me, but I'll wait for your approval before I put a lgtm label on it

possibly choosing a more suitable StorageClass).

The Elasticsearch StatefulSet will use the [EmptyDir][emptyDir] volume to
store data. This volume will be erased when the pod terminates, it is used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds more correct, as there will be several pods: "when the pods terminate".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to say that an EmptyDir volume is erased if the pod it belongs to is restarted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crassirostris Aha, when reading through again it makes sense in context.

just for testing purposes.

**Note:** please change the storage to persistent volume claim if you want
to use this deployment in your setup!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a deployment though, maybe call it 'StatefulSet'?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Since Fluentd talks to the Elasticsearch service inside the cluster, instances
on masters won't work, because masters have no kube-proxy. Don't mark masters
with a label mentioned in the previous paragraph or add a taint on them to
avoid Fluentd pods scheduling there.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

[configMap]: https://kubernetes.io/docs/tasks/configure-pod-container/configmap/
[secret]: https://kubernetes.io/docs/concepts/configuration/secret/
[statefulSet]: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset
[initContianer]: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initContainer, not initContianer

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for catching this!

@crassirostris
Copy link
Author

@aknuds1 Thanks! Rephrased Storage paragraph little bit

@crassirostris
Copy link
Author

/retest

@kubernetes kubernetes deleted a comment from k8s-github-robot Aug 4, 2017
@coffeepac
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 5, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: coffeepac, crassirostris

Associated issue: 41462

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 48532, 50054, 50082)

@k8s-github-robot k8s-github-robot merged commit 70b4db2 into kubernetes:master Aug 5, 2017
@piosz piosz removed their assignment Aug 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants