Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet: add a generic pod lifecycle event generator #13571

Merged
merged 3 commits into from
Nov 16, 2015

Conversation

yujuhong
Copy link
Contributor

@yujuhong yujuhong commented Sep 4, 2015

This change introduces pod lifecycle event generator (PLEG), and adds a generic
PLEG. The generic PLEG relies on relisting to discover container events, and is
container-runtime-agnostic. Both docker and rkt are changed to use generic
PLEG.

@yujuhong yujuhong added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Sep 4, 2015
@yujuhong yujuhong force-pushed the lifecycle_v0 branch 3 times, most recently from 1af11a8 to 28834f7 Compare September 4, 2015 00:41
@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 4, 2015
@k8s-github-robot
Copy link

Labelling this PR as size/L

@yujuhong
Copy link
Contributor Author

yujuhong commented Sep 4, 2015

This is part of #12802 to improve kubelet's performance. I decided to split #12829 in to a few parts for easier reviewing. The plan is to

  1. Add a generic PLEG for all container runtimes. Switch docker and rkt to use it.
  2. Add a docker PLEG which uses docker event stream. Switch docker to use it.

This PR does (1), and I am working on (2).

Other parallel tasks to reduce sync period includes:

  • Instruct pod workers to requeue themselves on sync error. I have a local patch for this.
  • Move probing out of sync loop (Add more probe control parameters #12866).
  • Handle pastActiveDeadline check. I will address this later.

What to expect from this PR:

  • kubelet should react to container termination events more quickly as the generic PLEG relists every ~3 seconds.
  • Temporarily rise in number of syncs because generic PLEG sends sync events even on container starts as well. At the same time, we still have the same 10s period sync period.
    • I don't expect this to affect the performance of kubelet too much, since the recent PR kubelet: trigger pod workers independently  #13003 has lowered the sync significantly.
    • This would get better when we add the docker PLEG and/or increase the sync period.

@yujuhong
Copy link
Contributor Author

yujuhong commented Sep 4, 2015

cc @kubernetes/goog-node

// Returns a channel from which the subscriber can recieve PodLifecycleEvent
// events.
func (g *GenericPLEG) Watch() chan *PodLifecycleEvent {
return g.eventChannel

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a single channel, it means only a single component can be listening to the PLEs, right? It would be useful for the ProbeManager (work in progress by me) to be able to listen for the Container{Started,Stopped} events. What if each call to Watch created a new channel, which the "main" channel cloned events into? It would add a bit more complexity around the bookkeeping of closing channels & preventing channels from backing up though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had a TODO to extend this to multiple channels, but it's still a TODO :)
I think multi-channel support can wait a little bit, and help is always welcome.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2015
@yujuhong
Copy link
Contributor Author

yujuhong commented Sep 9, 2015

Rebased and a friendly ping.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2015
@yujuhong yujuhong force-pushed the lifecycle_v0 branch 4 times, most recently from 50ed15d to 17932c3 Compare September 14, 2015 23:52
@yujuhong yujuhong closed this Sep 15, 2015
@yujuhong yujuhong reopened this Sep 15, 2015
return f.PodList, f.Err
if all {
return f.AllPodList, f.Err
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 16, 2015
@yujuhong yujuhong force-pushed the lifecycle_v0 branch 4 times, most recently from 95d113b to e6c19ec Compare September 17, 2015 18:47
@k8s-bot
Copy link

k8s-bot commented Nov 14, 2015

GCE e2e build/test failed for commit ac778e8.

@wojtek-t
Copy link
Member

@k8s-bot test this please

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e build/test failed for commit ac778e8.

@wojtek-t
Copy link
Member

@k8s-bot test this please

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e test build/test passed for commit ac778e8.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e build/test failed for commit ac778e8.

@wojtek-t
Copy link
Member

@k8s-bot e2e test this please

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e build/test failed for commit ac778e8.

@yujuhong
Copy link
Contributor Author

@k8s-bot test this

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e test build/test passed for commit ac778e8.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e build/test failed for commit ac778e8.

@yujuhong
Copy link
Contributor Author

@k8s-bot test this

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e test build/test passed for commit ac778e8.

@yujuhong
Copy link
Contributor Author

FYI, I checked the failed test in build, where kubectl describe pod was not able to get the relevant events for the pod in question. However, the corresponding kubelet.log recorded all the events and pod status generation correctly. I couldn't find any suspicious message on the kubelet side.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Nov 16, 2015

GCE e2e test build/test passed for commit ac778e8.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

j3ffml added a commit that referenced this pull request Nov 16, 2015
kubelet: add a generic pod lifecycle event generator
@j3ffml j3ffml merged commit 70d89a3 into kubernetes:master Nov 16, 2015
@yujuhong yujuhong deleted the lifecycle_v0 branch January 21, 2016 20:44
@huang195
Copy link
Contributor

huang195 commented May 6, 2016

@yujuhong is there a PR on having kubelet using the docker event stream instead of polling?

@yujuhong
Copy link
Contributor Author

yujuhong commented May 6, 2016

@yujuhong is there a PR on having kubelet using the docker event stream instead of polling?

Nope. It wasn't prioritized for v1.3, so no one is actively working on it. Also, it'd need most likely more than one PR:

  • SyncPod does too many things, so we'll need to ensure the expected events are properly recorded
  • rkt doesn't support event stream yet, so this will only work for docker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.