Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index: Fix backwards search for bestblock #23365

Merged
merged 2 commits into from
Dec 13, 2021

Conversation

mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Oct 26, 2021

This PR attempts to fix an intermittent Init issue encountered during the stress testing of #23289, which relates to the pruning-compatible filter reconstruction logic introduced in #15946.

The problem would occur when the node starts with -txindex=1 but ThreadSync is interrupted after it sets m_best_block_index to Genesis, and before it gets do any further work.
In that case, during the next restart of the node, an Init error would be thrown because BaseIndex::Init() tries to backtrack from the tip to the last block which has been successfully indexed (here: Genesis), but the backtracking logic didn't work properly in this case:
The loop
while (block_to_test && block->pprev && (block->pprev->nStatus & BLOCK_HAVE_DATA)) checks if a predecessor exists before performing the check block_to_test == block and then possbily setting prune_violation = false
If block_to_test and block are the Genesis block this check will not be reached because block->pprev does not exist.

To reproduce this bug on regtest:

  1. start a node with a fresh datadir using -txindex=1 (or any other index)
  2. stop and restart without any index
  3. mine a block
  4. stop and restart again with the index enabled
    ->InitError Error: txindex best block of the index goes beyond pruned data. (...)

Fix this by requiring that we have the data for the block of the current iteration block (instead of requiring it for the predecessor block->pprev)
That way, the check for block_to_test == block is also reached when block_to_test is the Genesis block.
No longer requiring the data of block->pprev also means that we can now prune up to m_best_block_index height without requiring a reindex (one block more than before). I added this edge case to feature_blockfilterindex_prune.py, the new version should fail on master.

@jamesob
Copy link
Contributor

jamesob commented Oct 27, 2021

Concept ACK

@mzumsande mzumsande marked this pull request as draft October 27, 2021 14:59
@mzumsande
Copy link
Contributor Author

mzumsande commented Oct 27, 2021

The proposed change would also mean that it would now be possible to prune up to the best block of an index without needing to reindex (+1 block compared to master). I hope I'm not missing something, but I don't see why we still need the full data of block->pprev in the case where block is the best block of the index and block->pprev was indexed in the past.
I added a test for this edge case to feature_blockfilterindex_prune.py.

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 28, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #21726 (Improve Indices on pruned nodes via prune blockers by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@mzumsande
Copy link
Contributor Author

Moved out of draft and added steps to reproduce to OP.
Not sure if the second commit (test) should be merged, maybe it's too detailed, plus the test gets reworked in #21726 anyway.

Note that the last commit of #21726 would partly "fix" the issue too by just not running the affected code unless -prune=1 is specified but doesn't address the root cause.

@laanwj
Copy link
Member

laanwj commented Nov 10, 2021

Concept ACK

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a36e23b (jamesob/ackr/23365.1.mzumsande.index_fix_backwards_sear)

Thanks for looking into this and figuring out a good fix. Once this is merged, I can update #23289 to cover interrupting init after index initialization.


I built this locally and confirmed that the attached test fails if the index/base.cpp changes are reverted. Logically the change makes sense - if the index's best block is the genesis, we should be able to continue indexation from that point (which is all the rephrased while condition does).

The block = block->pprev; does make me a little nervous (although as-written the code would never cause any problems given how we initialize block_to_test), so if you wanted to you could include an assertion like

diff --git a/src/index/base.cpp b/src/index/base.cpp
index 88c74ec543..8525dcbfa0 100644
--- a/src/index/base.cpp
+++ b/src/index/base.cpp
@@ -96,6 +96,7 @@ bool BaseIndex::Init()
                     prune_violation = false;
                     break;
                 }
+                assert(block->pprev);
                 block = block->pprev;
             }
         }
Show signature data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK a36e23b183a6b8d1c60598618f2a5c8f31da02e0 ([`jamesob/ackr/23365.1.mzumsande.index_fix_backwards_sear`](https://github.com/jamesob/bitcoin/tree/ackr/23365.1.mzumsande.index_fix_backwards_sear))

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmGbxWwACgkQepNdrbLE
TwVIdg/8CKYBVyw9DvvSx/G2E31/z7avW4uwj5cq10HMmjtsT/cCHhFtofDtVvIG
9binTF/dlYhyzQmv7FduzQkgpZtehokrrBugoktyh6zeJpRV+mPUxZvFJ0VUCK1w
+dVJ2zdbDmDuogJR7+ZZbGgBGxPKjS9t/Q+/LQIP221lgAVi1MPJ8oYuyg4+R6hn
ueS3piEtXmHnQt0F05PUhJDZZidB/3hRp1bu9Z0H79TrL65wnA1bnDf3+IKbGeQL
CPsYxZR4VZRiIqkSESsKuIbIVE2ELNJcQHP5pKoA389dBDbDlMfw1Lo/N+opB1tm
kbkrCt0ZB4HEAwsbyi3KWOue5rYkx5zuz6/7QW6sChkfGLwE+F+UfyvzCzy2P8FI
Fzd8lx/74O/+0NtcIpY+kSlPxztdQCuqCZHmx8cuYvnfPDIWkg2TrgX77JPoGJHO
x8mJBGWrTkxYoEM2brLWBKK24pPbafrx860NWEzJEoJ5xreZOfAzsBZPLtEZtymq
hggWvPjgGljGUFnKakjhP3ahMgDBNyQxoXM/n/XIE3NSepvRE7WOc5yxj2gGScM8
h6oIYZVmlgbbImA1asNDgXRk4acrA/N+3dZdUCAyjPz0lhx3/g2XznCZsfRyZdjA
cftewu0RF0l/XnyRZ37K5xvTUsjDYqb2omufwnmgy2giD9QnHdA=
=5ydW
-----END PGP SIGNATURE-----

Show platform data

Tested on Linux-5.10.0-9-amd64-x86_64-with-glibc2.28

Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --enable-experimental --disable-openssl-tests --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --enable-experimental --disable-openssl-tests --no-create --no-recursion

Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4 -msha -msse4.1 -msse4.2  i

Compiler version: Debian clang version 13.0.1-++20211019122913+8a93745a7121-1~exp1~20211019003538.10

This allows filters to be reconstructed when the best known block is
the Genesis block without needing to reindex.
It fixes Init errors seen in bitcoin#23289.
@mzumsande
Copy link
Contributor Author

The block = block->pprev; does make me a little nervous (although as-written the code would never cause any problems given how we initialize block_to_test), so if you wanted to you could include an assertion like

Thanks - I added the assertion as suggested, and rebased. Note that adding the assert only changes the kind of error: Before, if block->pprev had not existed (which I agree shouldn't be possible), the InitError would have been thrown (because the while condition includes a check for block).

Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

if (block_to_test == block) {
prune_violation = false;
break;
}
assert(block->pprev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding a comment here on why block->pprev must exist (because we're walking backwards from the active chain tip to a block index in the active chain. If a block index that we encounter doesn't have a backwards pointer, then the chain is malformed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually a correct explanation (the genesis block is in the active chain and doesn't have pprev set). The reason this assertion will hold is because of how block and block_to_test are chosen; the conditional above this line is guaranteed to break at-or-higher-than genesis because of the few lines of logic above (which, if summarized in a comment, would just end up repeating the fairly-straightforward code above).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're saying the same thing as I am. Because of the way block and block_to_test are chosen, if we walk backwards from block, we're guaranteed to encounter block_to_test (either at or before the genesis block).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't get to this earlier, would have added a comment, do people think that this warrants a follow-up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mzumsande sounds like it could be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #23777

Comment on lines +52 to +53
pruneheight_2 = self.nodes[0].pruneblockchain(1000)
assert_equal(pruneheight_2, 998)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason why calling pruneblockchain(1000) prunes up to exactly block 998? Is it because that's where the block file happens to wrap (i.e. blocks 999 and 1000 are in the same file as later blocks and so can't be pruned). If so, that seems a bit brittle (e.g. if the default block file size of disk serialization changes, then this test will fail), and definitely seems worthy of a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because that's where the block file happens to wrap

Yes, I believe that is the reason, and I agree it's brittle (although the test relied before on this, e.g. at assert_equal(pruneheight, 248) - I'm not sure though if there is a better way to test pruning in a detailed way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment added in #23777

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial code review ACK 9600ea0 for the code change, not the test changes. (Test changes are indirect and little over my head.) It seems obvious that previous code prune_violation = true, while (block->pprev) would incorrectly detect a prune violation at the genesis block, and the fix here make sense and looks correct.

@maflcko maflcko merged commit bf66e25 into bitcoin:master Dec 13, 2021
jamesob added a commit to jamesob/bitcoin that referenced this pull request Dec 13, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 13, 2021
9600ea0 test: Add edge case of pruning up to index height (Martin Zumsande)
698c524 index: Fix backwards search for bestblock (Martin Zumsande)

Pull request description:

  This PR attempts to fix an intermittent Init issue encountered during the stress testing of bitcoin#23289, which relates to the pruning-compatible filter reconstruction logic introduced in bitcoin#15946.

  The problem would occur when the node starts with `-txindex=1` but `ThreadSync` is interrupted after it sets `m_best_block_index` to Genesis, and before it gets do any further work.
  In that case, during the next restart of the node, an Init error would be thrown because  `BaseIndex::Init()` tries to backtrack from the tip to the last block which has been successfully indexed (here: Genesis), but the backtracking logic didn't work properly in this case:
  The loop
  `while (block_to_test && block->pprev && (block->pprev->nStatus & BLOCK_HAVE_DATA))` checks if a predecessor exists **before** performing the check `block_to_test == block` and then possbily setting `prune_violation = false`
  If `block_to_test` and `block` are the Genesis block this check will not be reached because `block->pprev` does not exist.

  To reproduce this bug on regtest:
  1) start a node with a fresh datadir using `-txindex=1` (or any other index)
  2) stop and restart without any index
  3) mine a block
  3) stop and restart again with the index enabled
  ->InitError `Error: txindex best block of the index goes beyond pruned data. (...)`

  Fix this by requiring that we have the data for the block of the current iteration `block` (instead of requiring it for the predecessor `block->pprev`)
  That way, the check for `block_to_test == block` is also reached when `block_to_test` is the Genesis block.
  No longer requiring the data of `block->pprev` also means that we can now prune up to `m_best_block_index` height without requiring a reindex (one block more than before). I added this edge case to `feature_blockfilterindex_prune.py`, the new version should fail on master.

ACKs for top commit:
  ryanofsky:
    Partial code review ACK 9600ea0 for the code change, not the test changes. (Test changes are indirect and little over my head.) It seems obvious that previous code `prune_violation = true, while (block->pprev)` would incorrectly detect a prune violation at the genesis block, and the fix here make sense and looks correct.

Tree-SHA512: c717f372cee8fd49718b1b18bfe237aa6ba3ff4468588c10e1272d7a2ef3981d10af4e57de51dec295e2ca72d441bc6c2812f7990011a94d7f818775e3ff1a38
mzumsande added a commit to mzumsande/bitcoin that referenced this pull request Dec 14, 2021
mzumsande added a commit to mzumsande/bitcoin that referenced this pull request Dec 14, 2021
These were suggested in review of bitcoin#23365

Co-authored-by: John Newbery <john@johnnewbery.com>
@bitcoin bitcoin deleted a comment from 2310Mas Dec 14, 2021
mzumsande added a commit to mzumsande/bitcoin that referenced this pull request Dec 15, 2021
These were suggested in review of bitcoin#23365

Co-authored-by: John Newbery <john@johnnewbery.com>
maflcko pushed a commit to bitcoin-core/gui that referenced this pull request Dec 16, 2021
…ards search)

e4a8d56 doc: add explanations for assert in index and magic numbers in test (Martin Zumsande)

Pull request description:

  This adds two explanations suggested in the review of #23365,  that I didn't manage to address before that PR was merged:

  bitcoin/bitcoin#23365 (comment)
  bitcoin/bitcoin#23365 (comment)

ACKs for top commit:
  jnewbery:
    ACK e4a8d56

Tree-SHA512: 0500c8abb37bb3e3694463ad5e74b2e1483615ccf1d7529b0d5faa694652ada17d242dc7fda6d995733766c627d54178a2c8fa21a570cdf13292f64ff5425b56
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 16, 2021
…rds search)

e4a8d56 doc: add explanations for assert in index and magic numbers in test (Martin Zumsande)

Pull request description:

  This adds two explanations suggested in the review of bitcoin#23365,  that I didn't manage to address before that PR was merged:

  bitcoin#23365 (comment)
  bitcoin#23365 (comment)

ACKs for top commit:
  jnewbery:
    ACK e4a8d56

Tree-SHA512: 0500c8abb37bb3e3694463ad5e74b2e1483615ccf1d7529b0d5faa694652ada17d242dc7fda6d995733766c627d54178a2c8fa21a570cdf13292f64ff5425b56
0xB10C pushed a commit to 0xB10C/bitcoin that referenced this pull request Dec 16, 2021
These were suggested in review of bitcoin#23365

Co-authored-by: John Newbery <john@johnnewbery.com>
jamesob added a commit to jamesob/bitcoin that referenced this pull request Dec 29, 2021
mzumsande pushed a commit to mzumsande/bitcoin that referenced this pull request Jan 17, 2022
rebroad pushed a commit to rebroad/bitcoin that referenced this pull request Feb 3, 2022
These were suggested in review of bitcoin#23365

Co-authored-by: John Newbery <john@johnnewbery.com>
rebroad pushed a commit to rebroad/bitcoin that referenced this pull request Feb 3, 2022
kwvg pushed a commit to kwvg/dash that referenced this pull request Feb 26, 2022
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 5, 2022
Summary:
PR description:
> Maintaining the blockfilterindexes in prune mode is possible and may lead to efficient p2p based rescans of wallets (restore backups, import/sweep keys) beyond the prune height (rescans not part of that PR).
>
> This PR allows running the blockfilterindex(es) in conjunction with pruning.
>
> - Bitcoind/Qt will shutdown during startup when missing block data has been detected ([re]enable -blockfilterindex when we already have pruned)
> - manual block pruning is disabled during blockfilterindex sync
> - auto-pruning is delayed during blockfilterindex sync

----

> Allow blockfilter in conjunction with prune
bitcoin/bitcoin@6abe9f5

----

> index: Fix backwards search for bestblock
bitcoin/bitcoin@698c524

----

> Avoid accessing nullpointer in BaseIndex::GetSummary()
bitcoin/bitcoin@00d57ff

----

> Avoid pruning below the blockfilterindex sync height
bitcoin/bitcoin@5e11226

----

> Add debug startup parameter -fastprune for more effective pruning tests
bitcoin/bitcoin@00d57ff

----

> Add functional test for blockfilterindex in prune-mode
bitcoin/bitcoin@ab3a0a2

----

> Fix several bugs in feature_blockfilterindex_prune.py
[[bitcoin/bitcoin#21230 | core#21230]]

----

> test: Intermittent issue in feature_blockfilterindex_prune
[[bitcoin/bitcoin#21252 | core#21252]]

----

> test: improve assertions in feature_blockfilterindex_prune.py
> test: remove unneeded node from feature_blockfilterindex_prune.py
[[bitcoin/bitcoin#21297 | core#21297]]

----

> test: Add edge case of pruning up to index height
bitcoin/bitcoin@9600ea0

This is a backport of [[bitcoin/bitcoin#15946 | core#15946]], [[bitcoin/bitcoin#21230 | core#21230]], [[bitcoin/bitcoin#21252 | core#21252]], [[bitcoin/bitcoin#21297 | core#21297]] and  [[bitcoin/bitcoin#23365 |  core#23365]]

Backport notes:
 - the test was buggy and ugly code  after the first commit, so I had to squash all these commits to reach an acceptable quality.
 - I had to use different numbers of blocks that are generated and pruned, because Bitcoin ABC can fit more blocks into each blk?????.dat file than core because in this test the witness data of the coinbase transaction makes core blocks larger. Comments were added to explain this where needed.
 - I used the shortcut `node = self.nodes[0]` to make the code a bit more readable (shorter lines)

Test Plan:
`ninja all check-all`

Run IBD with pruning enabled
` src/bitcoind -prune=2000 -blockfilterindex=1`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D11299
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 25, 2022
Summary:
> Now that [[bitcoin/bitcoin#23365 | core#23365]] is merged.

[[bitcoin/bitcoin#23365 | core#23365]] was backported in D11299

This concludes backport of [[bitcoin/bitcoin#23737 | core#23737]]
bitcoin/bitcoin@8904f17

Depends on D12614

Test Plan: `ninja check-functional`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D12615
@bitcoin bitcoin locked and limited conversation to collaborators Dec 14, 2022
@mzumsande mzumsande deleted the 202110_index_backtrack branch October 13, 2023 15:24
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants