-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit flake in pkg/storage: TestFiltering #24194
Comments
cc @wojtek-t |
Will take a look |
cc @liggitt |
I'm guessing that #24008 has something to do with this |
@ncdc - I have a different hypothesis, let me verify |
Yup - the reason is different. The problem is that if the watch is already initiated (but it still has resourceVersion in the future comparing to the cacher current state) it doesn't check if the event that it has just received is fresh enough that should be send to the watcher. For reproducing this problem, see: The test that I added there should be failing, but it's passing. I will fix this. |
@wojtek-t your example shouldn't get the added event unless it occurred past resourceVersion 1000, right? |
oh, misread... should be failing, but is passing :) |
@liggitt - exactly; |
To clarify - the fix is to pass resourceVersion together with events to the "cacheWatcher" (as part of watchCacheEvent) and in filtering function check if the RV is greater then the initial one. |
right |
Does something like this look like it's going in the right direction? https://gist.github.com/47d197657af47d43a5717a2de2b68743 |
@ncdc - I wanted to avoid parsing the number you are doing in process() method, and instead of that add a new field "ResourceVersion" to watchCacheEvent where we already have this parsed. |
@wojtek-t I've now read through more of the cache code and I think it's best to avoid parsing the resource version twice. I'll update accordingly. |
When comparing the event's resourceVersion with the value passed to |
Also, @wojtek-t, your unit test that you added above that passes now but should fail once the corrected code is in place... is there an easy way to add a "future" test that runs quickly and won't flake? I feel like checking for the absence of an event could be problematic. |
|
Yeah - that's problematic. What we can do is to wait e.g. 1s for it and pass then. It's far from being ideal, but probably better than nothing. |
I'm really hesitant to add something that waits 1s. That sounds really dangerous (flakey). What about sending data to etcd with various resource versions and making sure you only get the ones newer than the initial resource version? |
watch a few forward in the future, then add enough to reach your watch window, and ensure you didn't get the ones prior to that |
Yeah that's what I was thinking |
|
Running in an endless loop, I wasn't able to immediately reproduce, BUT I was able to reproduce as soon as I started running
stress -c 4
in a separate terminal.The text was updated successfully, but these errors were encountered: