-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race between periodic sync and delete event. #26167
Conversation
Controller should ignore "Sync" event on deleted objects. When controller receives 'object deleted' event from etcd and at the same time it starts periodic sync, "Sync" event can be enqueued after "Delete" event. Processing "Deleted" event, controller removes the object from its cache. It should not create the object there when processing "Sync" event after that.
@lavalamp, PTAL |
GCE e2e build/test failed for commit a1feee8. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake. |
@adohe, resync happens when |
Notice that Delete does not remove the object being deleted from |
oops, unit test did not pass, I'll look at it tomorrow. Anyway, you can fix it by yourself, the PR here is just to show what's probably wrong. |
@jsafrane I think the |
Thanks for the PR-- I need to think about this carefully to understand. I'm swamped, it may take a bit before I have a chance. |
@jsafrane I think this PR can fix the race condition you describe. I like this fix. 👍 |
The tests are failing because the initial sync is done as Sync events which are ignored because of this PR. I think the real fix will be few lines changed here and there, however I am pretty swamped too :to investigate how these reflectors and informers and queues work together. |
Also I will take a look at the failed test. |
How is it possible? |
Resync and Relist are separated now. Resync is from the local store, so the race described is possible. This problem and the #27004 are exposed by the separation of resync and relist. |
I see. |
@hongchaodeng perhaps the cure for this issue and #27004 is to wait for the queue to be drained before reading the store (i.e., before the resyncing in this issue, and before checking if an object exists in store in #27004). |
@lavalamp is this something we want for 1.3? |
Yes, if the list/sync split exacerbated this then we should fix that. On Wed, Jun 8, 2016 at 11:35 AM, Saad Ali notifications@github.com wrote:
|
I'm marking it as 'do not merge', my fix is wrong. I'm leaving it open just for documentation and search purposes, feel free to close this PR when you have a better one. |
I am closing the pr per author's comment above. Please document this as known issue in #27076. |
Controller should ignore "Sync" event on deleted objects. When controller receives 'object deleted' event from etcd and at the same time it starts periodic sync, "Sync" event can be enqueued after "Delete" event. Processing "Deleted" event, controller removes the object from its cache. It should not create the object there when processing "Sync" event after that.
Fixes #26082
Please review carefully, I'm not sure if it's the right thing to do - this must have been tested to death, everybody uses
NewInformer()
...Also, an unit test would be nice, however all controller tests are in
framework_test
package and I can't callcontroller.config.Queue.Resync()
from there to simulate periodic resync (controller.config
is not exported). Is there a reliable way to wait for periodic sync in aAddFunc
callback? I could sleep, but that's flaky...