[Flaky test] EndpointSlice should create Endpoints and EndpointSlices for Pods matching a Service #93374
Needs investigation since we're switching kube-proxy to EndpointSlice in this release.
/triage unresolved
Made me think of #91399, but that's in already.
It's possible it's a test timeout that's just too close, but we need to verify that by checking the controller and apiserver logs to make sure there isn't a latent bug.
Thanks for catching this! It looks like the timeouts are actually associated with waiting for 2 pods to be ready, not anything related to EndpointSlice functionality - the timeout is here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/network/endpointslice.go#L261-L285. Given that, it seems like the obvious solution would be to update the Services to set
/remove-triage unresolved
If that still exercises the same conditions (I vaguely remember publishNotReadyAddresses changing code paths in the Endpoints controller... does it do the same in the EndpointSlice controller?), that seems reasonable to me.
I think the code path is very similar here, the only real difference should be how the EndpointSlice controller calculates readiness. I've gone ahead and filed a PR that uses this approach, hopefully it helps: #93402
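For illustration, the readiness rule under discussion can be sketched as a tiny Go model. This is not the controller's actual code; `Pod` and `endpointReady` are hypothetical stand-ins for how publishNotReadyAddresses interacts with the pod's Ready condition:

```go
package main

import "fmt"

// Pod is a minimal, hypothetical stand-in for the fields consulted
// when computing endpoint readiness.
type Pod struct {
	Name  string
	Ready bool // the pod's Ready condition
}

// endpointReady sketches the readiness rule: a pod is published as
// ready if its Ready condition is true, or unconditionally when the
// Service sets publishNotReadyAddresses.
func endpointReady(pod Pod, publishNotReadyAddresses bool) bool {
	return publishNotReadyAddresses || pod.Ready
}

func main() {
	pod := Pod{Name: "pod-1", Ready: false}
	fmt.Println(endpointReady(pod, false)) // false: pod not yet ready
	fmt.Println(endpointReady(pod, true))  // true: readiness gate bypassed
}
```

This is why setting publishNotReadyAddresses in the test's Services would sidestep the wait-for-2-ready-pods timeout while still exercising the slice-creation path.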
This flaked again this morning.
/reopen
@hasheddan: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This should be
The top level error message does not make this clear, but the issue here is that too many EndpointSlices are being created - 3 instead of 2.
Interesting... that condition should not persist for O(minutes), right? That seems like it might be a bug, not just a test issue.
There's potential that it's not consistently merging 2 ports with 1 endpoint into a single slice and is instead creating a slice per port. It could also represent a larger bug. Trying to find a way to consistently reproduce this.
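The merging behavior being questioned can be sketched as keying slices by the full port combination rather than by individual ports. This is a toy model, not the controller's real helpers; `portKey` and `groupByPorts` are hypothetical names:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// portKey canonicalizes a port set so that slices are keyed by the
// whole combination of ports, not by individual ports.
func portKey(ports []string) string {
	sorted := append([]string(nil), ports...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// groupByPorts buckets endpoint IPs by their full port set: an
// endpoint serving two ports lands in one slice, not one slice per
// port. A slice-per-port result would indicate the suspected bug.
func groupByPorts(endpointPorts map[string][]string) map[string][]string {
	slices := map[string][]string{} // port-set key -> endpoint IPs
	for ip, ports := range endpointPorts {
		key := portKey(ports)
		slices[key] = append(slices[key], ip)
	}
	return slices
}

func main() {
	slices := groupByPorts(map[string][]string{
		"10.0.0.1": {"http", "https"},
	})
	fmt.Println(len(slices)) // 1: both ports share a single slice
}
```

Under this model, one endpoint with two ports should always yield exactly one slice, which is why 3 slices instead of 2 looks anomalous.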
Despite my best efforts I have been unable to reproduce this on my own e2e cluster. I've had 10 test runs going at the same time repeatedly without any luck; any ideas for how to reproduce flaky test failures locally would be welcome. In the meantime, I've created a follow-up PR that will provide more helpful information if/when this test fails again: #93491.

My leading theory right now is that this could be a caching issue. The EndpointSlice controller runs a sync, sees that we don't have an EndpointSlice for the given port combination in cache, and then goes ahead and creates one, even though a matching EndpointSlice already exists and has not been cached yet. That would result in an annoying situation where an essentially duplicate EndpointSlice was created due to a stale cache. I would think the EndpointSliceTracker would resolve this issue for us, though, so I'm not sure. Getting the above PR in will help validate this theory.

It's worth noting that if this is actually a duplicate EndpointSlice, clients like kube-proxy will handle it seamlessly because they already have to handle potential duplication in EndpointSlices. That would potentially explain why a test failure like this does not seem to be causing other things to break.
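The theorized race can be modeled in a few lines: the sync decision consults only the informer cache, so a slice created moments earlier, whose watch event hasn't landed yet, is invisible and gets duplicated. All types here are hypothetical; this is a sketch of the theory, not the controller's code:

```go
package main

import "fmt"

// cache models the informer cache: portKey -> a slice for that port
// combination is already visible to the controller.
type cache map[string]bool

// syncService creates a slice for portKey unless the cache already
// shows one; created counts API-server creates per port combination.
func syncService(c cache, created map[string]int, portKey string) {
	if c[portKey] {
		return // cache shows a matching slice, nothing to do
	}
	created[portKey]++ // create a new slice via the API server
}

func main() {
	c := cache{}
	created := map[string]int{}

	syncService(c, created, "http,https") // first sync: creates a slice
	// The create succeeded against the API server, but the watch event
	// has not yet updated the informer cache, so c is still empty.
	syncService(c, created, "http,https") // second sync: duplicate create

	fmt.Println(created["http,https"]) // 2: the duplicate the test sees
}
```

Once the watch event arrives and the cache is warm, further syncs are no-ops, which matches the observation that the extra slice persists until something else triggers a resync.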
It appears that my theory was correct, as witnessed here: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/93491/pull-kubernetes-e2e-gce/1287970453826048000/. I created a gist with what the 3 EndpointSlices look like. The key points I've noticed so far:
Of course this is something the EndpointSliceTracker should be handling for us, but even that approach seems less than ideal here. It wouldn't actually prevent the duplicate EndpointSlices from being created; it would just clean up duplicates after they happen. This, combined with the race conditions referenced in #92928, makes me think we may want to find a way to slow down the sync cycle for each Service here. This also feels at least tangentially related to the need for better rate limiting in both this and the Endpoints controller, discussed in #88967.
Hmm - actually I don't understand why this is happening. Maybe I'm missing something, but I don't think slowing down the controller is needed to solve this problem. Even though the controller usually is not lagging by more than 1s, it is possible that it can be lagging by a minute or more in pathological cases. So we can't just rely on the fact that it's guaranteed to be synced. Can you explain a bit more deeply why this is happening? i.e. some more detailed timeline like:
Agree, I'm not in favor of slowing down the controller globally to address this. Delaying resync of a particular service until we observe the existence of new endpointslices we know we created (with some max delay) could be reasonable.
That's exactly where I was hoping to end up here, although I'm not quite sure how to do that cleanly yet. To be clear, this is not just to fix this flaky test. This flaky test is a symptom of a much larger and more annoying issue with the current implementation. The EndpointSlice controller is consistently running into errors where it's trying to update EndpointSlices that are stale. This is caused by follow up calls to

In this specific instance, it looks like a syncService run is occurring after the first 2 EndpointSlices have been created but before the one referencing pod1 is in the informer cache. That results in the controller creating a new EndpointSlice to reference pod1, and thus we end up with a duplicate. If anything in the future triggers another syncService call, that would be resolved.

I'd initially thought that EndpointSliceTracker would be the thing to trigger that follow-up syncService, but I was wrong. That tracker just tracks the latest resource version for each EndpointSlice the controller has created/updated and compares that with EndpointSlices received through create/update events. If the controller doesn't think that EndpointSlice should exist, or the controller's resource version for that slice does not match, it triggers another syncService. In this case neither of those conditions would be met, so no additional syncService call would be triggered.

So that's a long way of saying that I think our best possible solution here is to introduce some delay between when Services can be synced. I think the easiest solution that would solve 99% of our problems would be to add something like a 1s delay in between syncs, simply by delaying when

With that said, I'm hopeful that there's a clean way to implement a solution that actually waited for each EndpointSlice change the controller has made to be reflected in the cache, because that would be a better solution. I'll spend some time looking into possible implementations here.
I've created a draft PR that contains my initial progress towards the approach I described above. I don't particularly love this approach so I'm very open to other ways we could solve this issue. It's also still very much a WIP and has significant work remaining. I opened the PR early just so I could get some initial feedback on the direction and potentially inspire other ideas around approaches we could take here. #93520
#93520 is still the best approach I can come up with here. It is ready for review if anyone has time to take a look.
Quick update: I've limited the scope to the following changes, as discussed in #93520 (comment):
Reviews would be appreciated on all of the above.
Reopening since #93908 is not yet merged.
@robscott: Closing this issue. In response to this:
Which jobs are flaking:
Which test(s) are flaking:
Second flakiest test in the e2e job according to http://storage.googleapis.com/k8s-metrics/flakes-latest.json
Testgrid link:
https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-e2e-gce&include-filter-by-regex=EndpointSlice%20should%20create%20Endpoints%20and%20EndpointSlices%20for%20Pods%20matching%20a%20Service&width=5
Reason for failure:
Anything else we need to know:
https://storage.googleapis.com/k8s-gubernator/triage/index.html?pr=1&test=EndpointSlice%20should%20create%20Endpoints%20and%20EndpointSlices%20for%20Pods%20matching%20a%20Service
/sig network
/assign @robscott