TestClientGoCustomResourceExample flakes #49956
These errors started occurring right after 74b9ba3 was merged:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/8014/
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/8023/
Could this be an interaction with the aggregator cache?
Is that bad?
No.
With lots of tracing added, I can reproduce it:
controller lister/watcher:
client:
cacher lister/watcher:
processEvent is never called, and the watcher is never sent the 409 watch event.
Automatic merge from submit-queue (batch tested with PRs 49992, 48861, 49267, 49356, 49886)

Correctly handle empty watch event cache

Fixes #49956

Introduced by ada6023, which did not adjust the oldest available resourceVersion for an empty watch event cache. Exposed by 74b9ba3, which allowed controllers to get list results from etcd before the watch cache is ready (normally they list with resourceVersion=0, which serves the list request from the watch cache, blocking until it is ready).

When the watch cache has an empty cache of watch events, it currently allows establishing a watch as if it could deliver a watch event for its currently synced resourceVersion. This off-by-one error can result in a missed watch event.

Scenario:

bob:
1. creates object at resourceVersion=11

sally:
1. does a list API request, gets a list resourceVersion of 10 (just before bob creates the object)
2. starts watch handled by watch cache at resourceVersion=10

Watch cache:
1. initial list gets resourceVersion=11, including the item created by bob
2. when determining the initial watch events to send to sally's watch, there are no watch events in the cache, so no initial watch events are sent
3. the cache listerwatcher watches etcd starting at resourceVersion=11, so future events are fed into the event cache and to sally's watch

The watch cache should have dropped sally's watch from resourceVersion=10 with a "gone" error, since it can't deliver the watch event for resourceVersion=11. This would force sally to relist (where she would get a list at resourceVersion=11) and rewatch (from resourceVersion=11).

This particularly affects tests that create CRDs/TPRs and establish watches on the new types while the storage layer's watch cache is still populating for that type.

```release-note
Fix a bug in watch cache sometimes causing missing events after watch cache initialization.
```
Flaking on HEAD:
https://storage.googleapis.com/k8s-gubernator/triage/index.html?pr=1&job=ci-kubernetes-test-go&test=TestClientGoCustomResourceExample
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/8135#k8siokubernetesvendork8sioapiextensions-apiservertestintegration-testclientgocustomresourceexample
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/8118#k8siokubernetesvendork8sioapiextensions-apiservertestintegration-testclientgocustomresourceexample
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/8105#k8siokubernetesvendork8sioapiextensions-apiservertestintegration-testclientgocustomresourceexample
@kubernetes/sig-api-machinery-test-failures