Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability #26215

Merged
merged 1 commit into from
Oct 10, 2022

Conversation

ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Sep 30, 2022

Since commit f08c9fb from PR #21726, index BlockUntilSyncedToCurrentChain behavior has been less reliable, and there has also been a race condition in the coinstatsindex_initial_sync unit test.

It seems better for BlockUntilSyncedToCurrentChain to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of m_best_block_index = block; and UpdatePruneLock statements in SetBestBlockIndex to make it more reliable.

Also since commit f08c9fb, there has been a race condition in the coinstatsindex_initial_sync test. Before that commit, the atomic index best block pointer m_best_block_index was updated as the last step of BaseIndex::BlockConnected, so BlockUntilSyncedToCurrentChain could safely be used in tests to wait for the last BlockConnected notification to be finished before stopping and destroying the index. But after that commit, calling BlockUntilSyncedToCurrentChain is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling AllowPrune() and GetName() on the index object. Reproducibility instructions for this are in #25365 (comment)

This commit fixes the coinstatsindex_initial_sync race condition, even though it will require an additional change to silence TSAN false positives, #26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors #25365.

There is no known race condition outside of test code currently, because the bitcoind Shutdown function calls FlushBackgroundCallbacks not BlockUntilSyncedToCurrentChain to safely shut down.

Co-authored-by: vasild
Co-authored-by: MarcoFalke

@ryanofsky ryanofsky marked this pull request as ready for review September 30, 2022 13:21
@maflcko maflcko added this to the 24.0 milestone Sep 30, 2022
@maflcko
Copy link
Member

maflcko commented Sep 30, 2022

Concept ACK dd2ef55

The description makes sense, but I have not reviewed in detail that bitcoind behaviour remains unchanged.

Tagged for backport, but happy to drop again if this seems too risky.

@ryanofsky
Copy link
Contributor Author

Might be possible to write a unit test that ensures BlockUntilSyncedToCurrentChain doesn't return until after prune locks are updated. But it seems a little tricky so this PR does not have a test for now.

@ryanofsky
Copy link
Contributor Author

Tagged for backport, but happy to drop again if this seems too risky.

Thanks, I think it's probably good to backport. SInce it's just doing an atomic assignment one step later it seems safe and it's hard to think of ways it could cause a problem.

@DrahtBot
Copy link
Contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko maflcko requested a review from fjahr October 4, 2022 10:49
@fanquake
Copy link
Member

fanquake commented Oct 4, 2022

@mzumsande care to take a look here?

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK dd2ef55

Took me a while to catch up with the entire discussion around #25365, but this makes sense to me. Aside from the issue with BlockUntilSyncedToCurrentChain that is fixed here, the order between updating prune locks and the best index shouldn't matter, since the validation code which performs the pruning does not interact with m_best_index- so I think this is safe.

src/index/base.cpp Show resolved Hide resolved
Since commit f08c9fb from PR
bitcoin#21726, index
`BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has
also been a race condition in the `coinstatsindex_initial_sync` unit test.

It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the
last connected block to be fully processed, than to be able to return before
prune locks are set, so this switches the order of `m_best_block_index =
block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more
reliable.

Also since commit f08c9fb, there has been a
race condition in the `coinstatsindex_initial_sync` test. Before that commit,
the atomic index best block pointer `m_best_block_index` was updated as the
last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain`
could safely be used in tests to wait for the last `BlockConnected`
notification to be finished before stopping and destroying the index. But
after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer
sufficient, and there is a race between the test shutdown code which destroys
the index object and the new code introduced in that commit calling
`AllowPrune()` and `GetName()` on the index object. Reproducibility
instructions for this are in
bitcoin#25365 (comment)

This commit fixes the `coinstatsindex_initial_sync` race condition, even though
it will require an additional change to silence TSAN false positives,
bitcoin#26188, after it is fixed. So this
partially addresses but does not resolve the bug reporting TSAN errors
bitcoin#25365.

There is no known race condition outside of test code currently, because the
bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not
`BlockUntilSyncedToCurrentChain` to safely shut down.

Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
Co-authored-by: MacroFake <falke.marco@gmail.com>
Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Updated dd2ef55 -> 8891949 (pr/untilsync.1 -> pr/untilsync.2, compare) adding suggested comment.

re: #26215 (review)

Took me a while to catch up with the entire discussion around #25365

I hope the PR description was clear enough explaining the change and what motivated it, so it wasn't actually neccessary to read the old discussion thread, but you could let me know if I missed anything important in the explanation.

src/index/base.cpp Show resolved Hide resolved
@mzumsande
Copy link
Contributor

re-ACK 8891949

I hope the PR description was clear enough explaining the change and what motivated it, so it wasn't actually neccessary to read the old discussion thread, but you could let me know if I missed anything important in the explanation.

The PR is clear, I don't think anything is missing, I just decided I wanted to follow the discussion chronologically.

@fanquake fanquake merged commit 4175c33 into bitcoin:master Oct 10, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 10, 2022
…entChain reliability

8891949 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky)

Pull request description:

  Since commit f08c9fb from PR bitcoin#21726, index  `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test.

  It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable.

  Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index.  But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment)

  This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365.

  There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down.

  Co-authored-by: vasild
  Co-authored-by: MarcoFalke

ACKs for top commit:
  mzumsande:
    re-ACK 8891949

Tree-SHA512: 52e29e3772a0c92873c54e5ffb31dd66a909b68a2031b7585713cd1d976811289c98bd9bb41679a8689062f03be4f97bb8368696e789caa4607c2fd8b1fe289b
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Oct 11, 2022
Since commit f08c9fb from PR
bitcoin#21726, index
`BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has
also been a race condition in the `coinstatsindex_initial_sync` unit test.

It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the
last connected block to be fully processed, than to be able to return before
prune locks are set, so this switches the order of `m_best_block_index =
block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more
reliable.

Also since commit f08c9fb, there has been a
race condition in the `coinstatsindex_initial_sync` test. Before that commit,
the atomic index best block pointer `m_best_block_index` was updated as the
last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain`
could safely be used in tests to wait for the last `BlockConnected`
notification to be finished before stopping and destroying the index. But
after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer
sufficient, and there is a race between the test shutdown code which destroys
the index object and the new code introduced in that commit calling
`AllowPrune()` and `GetName()` on the index object. Reproducibility
instructions for this are in
bitcoin#25365 (comment)

This commit fixes the `coinstatsindex_initial_sync` race condition, even though
it will require an additional change to silence TSAN false positives,
bitcoin#26188, after it is fixed. So this
partially addresses but does not resolve the bug reporting TSAN errors
bitcoin#25365.

There is no known race condition outside of test code currently, because the
bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not
`BlockUntilSyncedToCurrentChain` to safely shut down.

Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
Co-authored-by: MacroFake <falke.marco@gmail.com>

Github-Pull: bitcoin#26215
Rebased-From: 8891949
@fanquake
Copy link
Member

Added to #26133 for backport to 24.x.

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-merge ACK 8891949

achow101 added a commit that referenced this pull request Oct 13, 2022
e2e4c29 tests: Test that PSBT_OUT_TAP_TREE is included correctly (Andrew Chow)
4d42c3a psbt: Only include m_tap_tree if it has scripts (Andrew Chow)
d810fde psbt: Change m_tap_tree to store just the tuples (Andrew Chow)
a9419ef tests: Test that PSBT_OUT_TAP_TREE is combined correctly (Andrew Chow)
4abd2ab psbt: Fix merging of m_tap_tree (Andrew Chow)
1390c96 [BugFix]: Do not allow deserializing PSBT with empty PSBT_OUT_TAP_TREE (Jeremy Rubin)
9b438f0 refactor: revert m_next_resend to not be std::atomic (stickies-v)
43ced0b wallet: only update m_next_resend when actually resending (stickies-v)
fc8f2bf refactor: carve out tx resend timer logic into ShouldResend (stickies-v)
a6fb674 refactor: remove unused locks for ResubmitWalletTransactions (stickies-v)
5ad82a0 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky)
997faf6 contrib: Fix capture_output in getcoins.py (willcl-ark)
7e0bcfb p2p: ProcessHeadersMessage(): fix received_new_header (Larry Ruane)
c97d924 Correct sanity-checking script_size calculation (Pieter Wuille)
da6fba6 docs: Add 371 to bips.md (Andrew Chow)

Pull request description:

  Will collect backports for rc2 as they become available. Currently:
  * #25858
  * #26124
  * #26149
  * #26172
  * #26205
  * #26212
  * #26215

ACKs for top commit:
  dergoegge:
    ACK e2e4c29
  achow101:
    ACK e2e4c29
  instagibbs:
    ACK e2e4c29

Tree-SHA512: b6374fe202561057dbe1430d4c40f06f721eb568f91e7275ae1ee7747edf780ce64620382d13ecc4b9571d931dc25d226af8284987cf35ff6a6182c5f64eb10c
@bitcoin bitcoin locked and limited conversation to collaborators Oct 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants