-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Event aggregation: include latest event message in aggregate event #46034
Event aggregation: include latest event message in aggregate event #46034
Conversation
Hi @kensimon. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
f206ff0
to
482c09d
Compare
/unassign |
i like the goal of not losing the source message due to ttl concerns. if you can add a test case as this evolves, it would help me know if this will not cause painful event spam issues to recur (which was extremely painful 2 yrs ago). |
Yeah I'll definitely add the test case to prove the point a bit. I've tested this manually with local-up-cluster and deployed a broken container, and it would create the first 10 events as independent events, but after the 10th event, the first event would be modified (the count and lastseen bumped, and the message string changed to the latest event's string). The only thing that's a bit confusing is that there's no clear sign that an event is being re-used as a canonical event. As an example, 15 events come in that are the same Reason but different Messages... For the first 10 you would see:
and when the 11th event comes in, you'll see:
And if the other events fall off from TTL (I simulated this by setting the event-ttl to 60s), and 4 more events come in, you'll eventually only see:
which is a bit confusing, since the event actually happened a total of 15 times, but there's 9 events that expired off and only the first one was re-used. I think it's not worse than we have now, but it's still not ideal. |
i would prefer a new event, and a message prefix that says something like "combined from similar : last-message known here" |
482c09d
to
c83e261
Compare
@derekwaynecarr See update, this now preserves the behavior of still creating an aggregate event, but keeping its message up-to-date. The PR is now quite a bit simpler, and the tests basically work as-is (all I had to do was change the I'm accomplishing this by having the Since This should now be ok to test (PR subject has been updated) |
c83e261
to
b7c6a6d
Compare
Removed some dead code from when the PR was more complex, should be a much simpler change now. |
@derekwaynecarr poke |
@pmorie @derekwaynecarr PTAL. |
@k8s-bot test this #IGNORE |
This changes the event aggregation behavior so that, when multiple events are deduplicated, the aggregated event includes the message of the latest related event. This fixes an issue where the original event expires due to TTL, and the aggregate event doesn't contain any useful message.
b7c6a6d
to
6ada269
Compare
@k8s-bot ok to test |
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this I've seen these failures before, not sure if it's issues with the environment or just flakes... |
Thanks! /lgtm |
hmm, seems i cant approve this. i did verify the behavior works as expected on a run-away controller, and the new experience is preferred. /assign deads2k |
/assign @deads2k |
build failing due to gcloud's api quota getting exceeded, I'll try and trigger a re-test once things cool down a bit |
/approve |
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, derekwaynecarr, kensimon The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
@deads2k is this going to get merged for 1.7, or? |
yeah, its in queue https://submit-queue.k8s.io/#/queue |
Automatic merge from submit-queue (batch tested with PRs 46726, 41912, 46695, 46034, 46551) |
What this PR does / why we need it:
This changes the event aggregation behavior so that, when multiple events are deduplicated, the aggregated event includes the message of the latest related event.
This fixes an issue where the original event expires due to TTL, and the aggregate event doesn't contain any useful message.
Which issue this PR fixes:
fixes #45971