-
Notifications
You must be signed in to change notification settings - Fork 36.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability #26215
Conversation
Concept ACK dd2ef55 The description makes sense, but I have not reviewed in detail that Tagged for backport, but happy to drop again if this seems too risky. |
Might be possible to write a unit test that ensures |
Thanks, I think it's probably good to backport. SInce it's just doing an atomic assignment one step later it seems safe and it's hard to think of ways it could cause a problem. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
@mzumsande care to take a look here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK dd2ef55
Took me a while to catch up with the entire discussion around #25365, but this makes sense to me. Aside from the issue with BlockUntilSyncedToCurrentChain
that is fixed here, the order between updating prune locks and the best index shouldn't matter, since the validation code which performs the pruning does not interact with m_best_index
- so I think this is safe.
Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>
dd2ef55
to
8891949
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Updated dd2ef55 -> 8891949 (pr/untilsync.1
-> pr/untilsync.2
, compare) adding suggested comment.
re: #26215 (review)
Took me a while to catch up with the entire discussion around #25365
I hope the PR description was clear enough explaining the change and what motivated it, so it wasn't actually neccessary to read the old discussion thread, but you could let me know if I missed anything important in the explanation.
re-ACK 8891949
The PR is clear, I don't think anything is missing, I just decided I wanted to follow the discussion chronologically. |
…entChain reliability 8891949 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky) Pull request description: Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: vasild Co-authored-by: MarcoFalke ACKs for top commit: mzumsande: re-ACK 8891949 Tree-SHA512: 52e29e3772a0c92873c54e5ffb31dd66a909b68a2031b7585713cd1d976811289c98bd9bb41679a8689062f03be4f97bb8368696e789caa4607c2fd8b1fe289b
Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com> Github-Pull: bitcoin#26215 Rebased-From: 8891949
Added to #26133 for backport to 24.x. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-merge ACK 8891949
e2e4c29 tests: Test that PSBT_OUT_TAP_TREE is included correctly (Andrew Chow) 4d42c3a psbt: Only include m_tap_tree if it has scripts (Andrew Chow) d810fde psbt: Change m_tap_tree to store just the tuples (Andrew Chow) a9419ef tests: Test that PSBT_OUT_TAP_TREE is combined correctly (Andrew Chow) 4abd2ab psbt: Fix merging of m_tap_tree (Andrew Chow) 1390c96 [BugFix]: Do not allow deserializing PSBT with empty PSBT_OUT_TAP_TREE (Jeremy Rubin) 9b438f0 refactor: revert m_next_resend to not be std::atomic (stickies-v) 43ced0b wallet: only update m_next_resend when actually resending (stickies-v) fc8f2bf refactor: carve out tx resend timer logic into ShouldResend (stickies-v) a6fb674 refactor: remove unused locks for ResubmitWalletTransactions (stickies-v) 5ad82a0 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky) 997faf6 contrib: Fix capture_output in getcoins.py (willcl-ark) 7e0bcfb p2p: ProcessHeadersMessage(): fix received_new_header (Larry Ruane) c97d924 Correct sanity-checking script_size calculation (Pieter Wuille) da6fba6 docs: Add 371 to bips.md (Andrew Chow) Pull request description: Will collect backports for rc2 as they become available. Currently: * #25858 * #26124 * #26149 * #26172 * #26205 * #26212 * #26215 ACKs for top commit: dergoegge: ACK e2e4c29 achow101: ACK e2e4c29 instagibbs: ACK e2e4c29 Tree-SHA512: b6374fe202561057dbe1430d4c40f06f721eb568f91e7275ae1ee7747edf780ce64620382d13ecc4b9571d931dc25d226af8284987cf35ff6a6182c5f64eb10c
Since commit f08c9fb from PR #21726, index
BlockUntilSyncedToCurrentChain
behavior has been less reliable, and there has also been a race condition in thecoinstatsindex_initial_sync
unit test.It seems better for
BlockUntilSyncedToCurrentChain
to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order ofm_best_block_index = block;
andUpdatePruneLock
statements inSetBestBlockIndex
to make it more reliable.Also since commit f08c9fb, there has been a race condition in the
coinstatsindex_initial_sync
test. Before that commit, the atomic index best block pointerm_best_block_index
was updated as the last step ofBaseIndex::BlockConnected
, soBlockUntilSyncedToCurrentChain
could safely be used in tests to wait for the lastBlockConnected
notification to be finished before stopping and destroying the index. But after that commit, callingBlockUntilSyncedToCurrentChain
is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit callingAllowPrune()
andGetName()
on the index object. Reproducibility instructions for this are in #25365 (comment)This commit fixes the
coinstatsindex_initial_sync
race condition, even though it will require an additional change to silence TSAN false positives, #26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors #25365.There is no known race condition outside of test code currently, because the bitcoind
Shutdown
function callsFlushBackgroundCallbacks
notBlockUntilSyncedToCurrentChain
to safely shut down.Co-authored-by: vasild
Co-authored-by: MarcoFalke