Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Implicitly sync after generate* to preempt races and intermittent test failures #20362

Closed
wants to merge 4 commits into from

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Nov 10, 2020

The most frequent failure in functional tests are intermittent races. Fixing such bugs is cumbersome because it involves:

  • Noticing the failure
  • Fetching and reading the log to determine the test case that failed
  • Adding a self.sync_all() where it was forgotten
  • Spamming out a pr and waiting for review, which is already sparse

Also, writing a linter to catch those is not possible, nor is review effective in finding these bugs prior to merge.

Fix all future intermittent races caused by a missing sync_block call by calling sync_all implicitly after each generate*, unless opted out. This ensures that the code is race-free (with regards to blocks) when the tests pass once, instead of our current approach where the code can never be guaranteed to be race-free.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 10, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #22567 (test: Implicitly sync after generate* to preempt races and intermittent test failures by MarcoFalke)
  • #22437 (test, refactor: add GetTransaction() coverage, improve rpc_rawtransaction by jonatack)
  • #22364 (wallet: Make a tr() descriptor by default by achow101)
  • #22229 (test: consolidate to f-strings (part 1) by fanquake)
  • #21283 (Implement BIP 370 PSBTv2 by achow101)
  • #10102 ([experimental] Multiprocess bitcoin by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@practicalswift
Copy link
Contributor

practicalswift commented Nov 10, 2020

Concept ACK: thanks for doing this - intermittent test failures are the worst!

Bike shed nit: fn or func might be more unambiguous (but less fun) abbreviations than fun :)

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

How about for now:

try_dispatch_with_sync
    try
        dispatch
    except
        sync_all
        dispatch
        log with print_stack

@@ -296,9 +296,27 @@ def wait_for_cookie_credentials(self):
time.sleep(1.0 / poll_per_s)
self._raise_assertion_error("Unable to retrieve cookie credentials after {}s".format(self.rpc_timeout))

def generate(self, nblocks, maxtries=1000000):
def generateblock(self, *args, sync_fun=True, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

faf4ef6

An alternative to "overload" these generate* is to add a generic _rpc_dispatch_and_sync (__getattr__ -> _rpc_dispatch_and_sync -> _rpc_dispatch).

It would be cool to keep the last generate* extract_stack that when a related error occurs we could print it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make calling code tremendously verbose, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am misunderstanding. Do you have a working diff that I can steal?

test/functional/test_framework/test_framework.py Outdated Show resolved Hide resolved
@amitiuttarwar
Copy link
Contributor

Concept ACK!

very nice, thank you

@laanwj
Copy link
Member

laanwj commented Nov 20, 2020

Concept ACK, rooting out a cause of intermittent issues (due to test flakiness) is great.

@jnewbery
Copy link
Contributor

Concept ACK

@jnewbery
Copy link
Contributor

What do you think about adding a generate() method to TestFramework that takes the index of the node that you want to generate on? It seems strange to pass a TestFramework function to the constructor of the TestNode.

@maflcko
Copy link
Member Author

maflcko commented Jul 26, 2021

That would require changing all calls to generate* and also disable TestNode::generate* to avoid races creeping in by accident. Happy to do that refactor if reviewers think it is worth it. I am mostly interested in the functional changes here.

@jnewbery
Copy link
Contributor

Happy to do that refactor if reviewers think it is worth it.

I think it's worth it, to avoid adding an implicit dependency from TestNode onto TestFramework.

@maflcko
Copy link
Member Author

maflcko commented Jul 28, 2021

Done in #22567. (Different pr, because the discussion here wasn't substantial, mostly GitHubs own spam, and I wanted to keep this version in case someone decides against #22567).

MarcoFalke added 4 commits July 28, 2021 17:58
-BEGIN VERIFY SCRIPT-
perl -0777 -pi -e 's/(generate[^\n]*\)[^\n]*)(\n|\s)+self.sync_.*\(\)\n/\1\n/g' $(git grep -l generate ./test)
-END VERIFY SCRIPT-
@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@maflcko maflcko closed this Aug 19, 2021
@maflcko maflcko deleted the 2011-noSync branch August 19, 2021 13:37
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants