-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index: Verify the block filter hash when reading the filter from disk. #24832
index: Verify the block filter hash when reading the filter from disk. #24832
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
A few suggestions regarding filter_checked
diff --git a/src/blockfilter.cpp b/src/blockfilter.cpp
index 59f34bc54e..0ab89fdbdb 100644
--- a/src/blockfilter.cpp
+++ b/src/blockfilter.cpp
@@ -47,7 +47,7 @@ GCSFilter::GCSFilter(const Params& params)
: m_params(params), m_N(0), m_F(0), m_encoded{0}
{}
-GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, const bool filter_checked)
+GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked)
: m_params(params), m_encoded(std::move(encoded_filter))
{
SpanReader stream{GCS_SER_TYPE, GCS_SER_VERSION, m_encoded};
@@ -221,7 +221,7 @@ static GCSFilter::ElementSet BasicFilterElements(const CBlock& block,
}
BlockFilter::BlockFilter(BlockFilterType filter_type, const uint256& block_hash,
- std::vector<unsigned char> filter, const bool filter_checked)
+ std::vector<unsigned char> filter, bool filter_checked)
: m_filter_type(filter_type), m_block_hash(block_hash)
{
GCSFilter::Params params;
diff --git a/src/blockfilter.h b/src/blockfilter.h
index af054dbf34..5de5679834 100644
--- a/src/blockfilter.h
+++ b/src/blockfilter.h
@@ -59,7 +59,7 @@ public:
explicit GCSFilter(const Params& params = Params());
/** Reconstructs an already-created filter from an encoding. */
- GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, const bool filter_checked=false);
+ GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked = false);
/** Builds a new filter from the params and set of elements. */
GCSFilter(const Params& params, const ElementSet& elements);
@@ -122,7 +122,7 @@ public:
//! Reconstruct a BlockFilter from parts.
BlockFilter(BlockFilterType filter_type, const uint256& block_hash,
- std::vector<unsigned char> filter, const bool filter_checked=false);
+ std::vector<unsigned char> filter, bool filter_checked = false);
//! Construct a new BlockFilter of the specified type from a block.
BlockFilter(BlockFilterType filter_type, const CBlock& block, const CBlockUndo& block_undo);
diff --git a/src/index/blockfilterindex.cpp b/src/index/blockfilterindex.cpp
index bad766ebe1..0a4728f571 100644
--- a/src/index/blockfilterindex.cpp
+++ b/src/index/blockfilterindex.cpp
@@ -159,7 +159,7 @@ bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& f
uint256 result;
CHash256().Write(encoded_filter).Finalize(result);
if (result != hash) return error("Checksum mismatch in filter decode.");
- filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), true);
+ filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), /*filter_checked=*/true);
}
src/blockfilter.cpp
Outdated
// Verify that the encoded filter contains exactly N elements. If it has too much or too little | ||
// data, a std::ios_base::failure exception will be raised. | ||
BitStreamReader<SpanReader> bitreader{stream}; | ||
BitStreamReader<SpanReader> bitreader(stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
294a845
to
2e878de
Compare
I've applied all the suggested changes :) |
2e878de
to
1faac19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK, a few iterative further comments on second look.
src/index/blockfilterindex.h
Outdated
@@ -31,7 +31,7 @@ class BlockFilterIndex final : public BaseIndex | |||
FlatFilePos m_next_filter_pos; | |||
std::unique_ptr<FlatFileSeq> m_filter_fileseq; | |||
|
|||
bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const; | |||
bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter, const uint256& hash) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to order the ReadFilterFromDisk() arguments with in-params first, then the out-param. There are only four lines to change of the ones touched in this pull.
bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter, const uint256& hash) const; | |
bool ReadFilterFromDisk(const FlatFilePos& pos, const uint256& hash, BlockFilter& filter) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
src/bench/gcs_filter.cpp
Outdated
}); | ||
} | ||
|
||
static void BlockFilterGetHash(benchmark::Bench& bench) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be handy to name all of these benchmarks with a common prefix to group them together in the output and that is easy to filter for, i.e. ./src/bench/bench_bitcoin -filter=EvictionProtection*.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the GCSFIlter benchmarks are now changed to have the prefix GCSFilter
.
src/bench/gcs_filter.cpp
Outdated
GCSFilter::Element element(32); | ||
element[0] = static_cast<unsigned char>(i); | ||
element[1] = static_cast<unsigned char>(i >> 8); | ||
elements.insert(std::move(element)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: it looks like there is a fair amount of duplicate boilerplate in each of these benchmarks that you could extract out to a common setup helper in a separate commit, if you like. See EvictionProtectionCommon()
in src/bench/peer_eviction.cpp
for an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since most of the duplicate code is just generating the elements, how's separating that out to a function like so:
static const GCSFilter::ElementSet GenerateGCSTestElements()
{
GCSFilter::ElementSet elements;
for (int i = 0; i < 10000; ++i) {
GCSFilter::Element element(32);
element[0] = static_cast<unsigned char>(i);
element[1] = static_cast<unsigned char>(i >> 8);
elements.insert(std::move(element));
}
return elements;
}
And having each of the benchmarks call that function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've applied the above change I suggested above.
src/blockfilter.h
Outdated
@@ -59,7 +59,7 @@ class GCSFilter | |||
explicit GCSFilter(const Params& params = Params()); | |||
|
|||
/** Reconstructs an already-created filter from an encoding. */ | |||
GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter); | |||
GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked=false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang format for each of these defaults
GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked=false); | |
GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer needed as the default value was removed
src/bench/gcs_filter.cpp
Outdated
auto encoded = filter.GetEncoded(); | ||
|
||
bench.unit("elem").run([&] { | ||
GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, true); | |
GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*filter_checked=*/true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
As far as I can see, the sanity check that this PR makes optional is now only invoked from BlockFilter::Unserialize()
, which is used only in test code because as a full node, bitcoin core doesn't have the use case of accepting unchecked serialized blockfilters over p2p, only send self-generated ones to peers.
So I wonder whether it might make sense to just remove the old sanity check instead of making it optional?
@mzumsande good point, and I think it also highlights that it may alternatively be better to drop the default false value for drop default false filter_checked value
diff --git a/src/bench/gcs_filter.cpp b/src/bench/gcs_filter.cpp
index 89fa07f602..56f3cff0c1 100644
--- a/src/bench/gcs_filter.cpp
+++ b/src/bench/gcs_filter.cpp
@@ -52,7 +52,7 @@ static void DecodeGCSFilter(benchmark::Bench& bench)
auto encoded = filter.GetEncoded();
bench.unit("elem").run([&] {
- GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded);
+ GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*filter_checked=*/false);
});
}
@@ -66,7 +66,7 @@ static void BlockFilterGetHash(benchmark::Bench& bench)
elements.insert(std::move(element));
}
GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, elements);
- BlockFilter block_filter(BlockFilterType::BASIC, {}, filter.GetEncoded());
+ BlockFilter block_filter(BlockFilterType::BASIC, {}, filter.GetEncoded(), /*filter_checked=*/false);
bench.unit("elem").run([&] {
block_filter.GetHash();
@@ -86,7 +86,7 @@ static void DecodeCheckedGCSFilter(benchmark::Bench& bench)
auto encoded = filter.GetEncoded();
bench.unit("elem").run([&] {
- GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, true);
+ GCSFilter filter({0, 0, BASIC_FILTER_P, BASIC_FILTER_M}, encoded, /*filter_checked=*/true);
});
}
BENCHMARK(BlockFilterGetHash);
diff --git a/src/blockfilter.h b/src/blockfilter.h
index 21d6a295a2..f823354989 100644
--- a/src/blockfilter.h
+++ b/src/blockfilter.h
@@ -59,7 +59,7 @@ public:
explicit GCSFilter(const Params& params = Params());
/** Reconstructs an already-created filter from an encoding. */
- GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked=false);
+ GCSFilter(const Params& params, std::vector<unsigned char> encoded_filter, bool filter_checked);
/** Builds a new filter from the params and set of elements. */
GCSFilter(const Params& params, const ElementSet& elements);
@@ -122,7 +122,7 @@ public:
//! Reconstruct a BlockFilter from parts.
BlockFilter(BlockFilterType filter_type, const uint256& block_hash,
- std::vector<unsigned char> filter, bool filter_checked=false);
+ std::vector<unsigned char> filter, bool filter_checked);
//! Construct a new BlockFilter of the specified type from a block.
BlockFilter(BlockFilterType filter_type, const CBlock& block, const CBlockUndo& block_undo);
@@ -164,7 +164,7 @@ public:
if (!BuildParams(params)) {
throw std::ios_base::failure("unknown filter_type");
}
- m_filter = GCSFilter(params, std::move(encoded_filter));
+ m_filter = GCSFilter(params, std::move(encoded_filter), /*filter_checked=*/false);
}
}; |
Sure I can just get rid of the option. I'll move Quick side-question: my initial thought was that it'd be helpful to leave an option in there in case anyone is using the code as a library. In general, does Bitcoin Core development think about potential use cases of the code being used as a library somewhere else? |
@kcalvinalvin maybe decided case-by-case; I don't recall a recent specific example of leave-it-in-for-reuse-as-a-library being invoked as a rationale for not moving code used only for tests out to the test code. Verified that removing code that depends on BlockFilter::Unserialize()
diff --git a/src/blockfilter.h b/src/blockfilter.h
index 21d6a295a2..764395c370 100644
--- a/src/blockfilter.h
+++ b/src/blockfilter.h
@@ -148,24 +148,6 @@ public:
<< m_block_hash
<< m_filter.GetEncoded();
}
-
- template <typename Stream>
- void Unserialize(Stream& s) {
- std::vector<unsigned char> encoded_filter;
- uint8_t filter_type;
-
- s >> filter_type
- >> m_block_hash
- >> encoded_filter;
-
- m_filter_type = static_cast<BlockFilterType>(filter_type);
-
- GCSFilter::Params params;
- if (!BuildParams(params)) {
- throw std::ios_base::failure("unknown filter_type");
- }
- m_filter = GCSFilter(params, std::move(encoded_filter));
- }
};
#endif // BITCOIN_BLOCKFILTER_H
diff --git a/src/test/blockfilter_tests.cpp b/src/test/blockfilter_tests.cpp
index 8eb4dbc592..b99d9a32c3 100644
--- a/src/test/blockfilter_tests.cpp
+++ b/src/test/blockfilter_tests.cpp
@@ -106,23 +106,6 @@ BOOST_AUTO_TEST_CASE(blockfilter_basic_test)
for (const CScript& script : excluded_scripts) {
BOOST_CHECK(!filter.Match(GCSFilter::Element(script.begin(), script.end())));
}
-
- // Test serialization/unserialization.
- BlockFilter block_filter2;
-
- CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
- stream << block_filter;
- stream >> block_filter2;
-
- BOOST_CHECK_EQUAL(block_filter.GetFilterType(), block_filter2.GetFilterType());
- BOOST_CHECK_EQUAL(block_filter.GetBlockHash(), block_filter2.GetBlockHash());
- BOOST_CHECK(block_filter.GetEncodedFilter() == block_filter2.GetEncodedFilter());
-
- BlockFilter default_ctor_block_filter_1;
- BlockFilter default_ctor_block_filter_2;
- BOOST_CHECK_EQUAL(default_ctor_block_filter_1.GetFilterType(), default_ctor_block_filter_2.GetFilterType());
- BOOST_CHECK_EQUAL(default_ctor_block_filter_1.GetBlockHash(), default_ctor_block_filter_2.GetBlockHash());
- BOOST_CHECK(default_ctor_block_filter_1.GetEncodedFilter() == default_ctor_block_filter_2.GetEncodedFilter());
}
BOOST_AUTO_TEST_CASE(blockfilters_json_test)
diff --git a/src/test/fuzz/deserialize.cpp b/src/test/fuzz/deserialize.cpp
index ed6f172a2a..3793921e86 100644
--- a/src/test/fuzz/deserialize.cpp
+++ b/src/test/fuzz/deserialize.cpp
@@ -95,12 +95,6 @@ void DeserializeFromFuzzingInput(FuzzBufferType buffer, T& obj, const std::optio
throw invalid_fuzzing_input_exception();
}
}
- try {
- ds >> obj;
- } catch (const std::ios_base::failure&) {
- throw invalid_fuzzing_input_exception();
- }
- assert(buffer.empty() || !Serialize(obj).empty());
}
template <typename T>
diff --git a/src/test/fuzz/util.h b/src/test/fuzz/util.h
index 6c91844633..e8e8e57394 100644
--- a/src/test/fuzz/util.h
+++ b/src/test/fuzz/util.h
@@ -145,11 +145,6 @@ template <typename T>
const std::vector<uint8_t> buffer = ConsumeRandomLengthByteVector(fuzzed_data_provider, max_length);
CDataStream ds{buffer, SER_NETWORK, INIT_PROTO_VERSION};
T obj;
- try {
- ds >> obj;
- } catch (const std::ios_base::failure&) {
- return std::nullopt;
- }
return obj;
} |
fe72ea6
to
3e6d98e
Compare
Addressed all of the changes requested except for removing |
ACK 3e6d98e modulo:
Here is a patch on top of your last push, if helpful, that removes |
3e6d98e
to
0b976a6
Compare
ACK 0b976a6 Sanity check: Running
and with the operative codebase changes reverted (diff)
diff --git a/src/blockfilter.cpp b/src/blockfilter.cpp
index 08e2e6f72e..9d76d2b60e 100644
--- a/src/blockfilter.cpp
+++ b/src/blockfilter.cpp
@@ -59,7 +59,7 @@ GCSFilter::GCSFilter(const Params& params, std::vector<unsigned char> encoded_fi
}
m_F = static_cast<uint64_t>(m_N) * static_cast<uint64_t>(m_params.m_M);
- if (filter_checked) return;
+ // if (filter_checked) return;
// Verify that the encoded filter contains exactly N elements. If it has too much or too little
// data, a std::ios_base::failure exception will be raised.
diff --git a/src/index/blockfilterindex.cpp b/src/index/blockfilterindex.cpp
index 1471c29a85..f83b883c8f 100644
--- a/src/index/blockfilterindex.cpp
+++ b/src/index/blockfilterindex.cpp
@@ -156,9 +156,6 @@ bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, const uint256&
std::vector<uint8_t> encoded_filter;
try {
filein >> block_hash >> encoded_filter;
- uint256 result;
- CHash256().Write(encoded_filter).Finalize(result);
- if (result != hash) return error("Checksum mismatch in filter decode.");
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), /*filter_checked=*/true); ...GCSFilterDecodeChecked is far slower and equivalent to GCSFilterDecode as expected
|
58382e2
to
d492499
Compare
Thank you for the speedy review! I pushed 04008e2 on top of 0b976a6 which does mostly what you did in https://github.com/jonatack/bitcoin/commits/remove-BlockFilter-Unserialize but it keeps the serialize/unserialize test in
I think this is a good tradeoff between keeping the serialization in the unit tests but still keeping the maintenance overhead low. |
d492499
to
04008e2
Compare
@MarcoFalke Ah my first instinct was to get rid of the entire test as |
88cdc2c
to
447174d
Compare
@kcalvinalvin if you move BlockFilter::Unserialize() to the test code in this pull, it will be a smaller change as the first commit rather than last one, WDYT? Edit: maybe a first commit that adds the check in ReadFilterFromDisk() and rename the commit, then the one that moves Unserialize() to the test code. |
I'm ok with that if it'd lower the cost of reviewing. However, I'm thinking if the cost of reviewing would be even lower if I have commit 447174d be in a separate PR. I'm sorta thinking maybe getting rid of |
@kcalvinalvin SGTM. |
447174d
to
0b976a6
Compare
Reverted remove the last commit. Latest commit is now back to 0b976a6. Will make a follow up PR after this one. |
This benchmark allows us to compare the differences between doing the sanity check for corruption via GolombRiceDecode() vs checking the hash of the encoded block filter.
Element count used in the GCSFilter benchmarks are increased to 100,000 from 10,000. Testing the benchmarks with different element counts showed that a filter with 100,000 elements resulted in the same ns/op. This this a desirable thing to have as it allows us to reason about how long a single filter element takes to process, letting us easily calculate how long a filter with N elements (where N > 100,000) would take to process. GCSFilterConstruct benchmark is now called without batch. This makes intra-bench results more intuitive as all benchmarks are in ns/op instead of a custom unit. There are no downsides to this change as testing showed that there is no observable difference in error rates in the benchmarks when calling without batch.
b32e965
to
e734228
Compare
4 things changed in the latest push:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK e734228
I'm not sure if it really matters a lot how big we choose the number of elements used in the test, I just think that if we pick custom measure other than "ops", the benchmark should be linear in that - so I like that everything uses ns/op
now.
In any case, the important feature of this PR is the improvement of the filter verification.
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-review ACK e734228
FWIW, this are the benchmark results on my machine:
$ ./src/bench/bench_bitcoin -filter=GCS.*
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
719,504.00 | 1,389.85 | 2.4% | 0.01 | GCSBlockFilterGetHash |
28,431,064.00 | 35.17 | 4.0% | 0.31 | GCSFilterConstruct |
3,641,177.00 | 274.64 | 3.4% | 0.04 | GCSFilterDecode |
13,994.04 | 71,458.99 | 1.5% | 0.01 | GCSFilterDecodeSkipCheck |
424,500.50 | 2,355.71 | 2.2% | 0.01 | GCSFilterMatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK e734228
Non-controversial and significant performance improvement when loading block filters from disk, which can be a frequent process depending on peers or RPC users. I've left some style/readability suggestions but none of them are blocking for me.
If BlockFilter::Unserialize()
is removed in a follow up, further code cleanup is possible by removing the skip_decode_check
parameter and the check branch.
$ ./src/bench/bench_bitcoin -filter=GCS.*
yields
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 766,333.00 | 1,304.92 | 0.8% | 0.01 | `GCSBlockFilterGetHash`
| 60,891,167.00 | 16.42 | 0.3% | 0.67 | `GCSFilterConstruct`
| 15,802,125.00 | 63.28 | 0.3% | 0.17 | `GCSFilterDecode`
| 1,697,875.00 | 588.97 | 0.2% | 0.02 | `GCSFilterDecodeSkipCheck`
| 1,687,000.00 | 592.77 | 0.1% | 0.02 | `GCSFilterMatch`
// Check that the hash of the encoded_filter matches the one stored in the db. | ||
uint256 block_hash; | ||
std::vector<uint8_t> encoded_filter; | ||
try { | ||
filein >> block_hash >> encoded_filter; | ||
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter)); | ||
uint256 result; | ||
CHash256().Write(encoded_filter).Finalize(result); | ||
if (result != hash) return error("Checksum mismatch in filter decode."); | ||
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter), /*skip_decode_check=*/true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider moving the hash checking into a separate function for readability and maintainability?
@@ -143,18 +144,22 @@ bool BlockFilterIndex::CommitInternal(CDBBatch& batch) | |||
return BaseIndex::CommitInternal(batch); | |||
} | |||
|
|||
bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const | |||
bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, const uint256& hash, BlockFilter& filter) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Since this function also has a block_hash
variable, maybe filter_hash
would be a clearer disambiguation?
bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, const uint256& hash, BlockFilter& filter) const | |
bool BlockFilterIndex::ReadFilterFromDisk(const FlatFilePos& pos, const uint256& filter_hash, BlockFilter& filter) const |
uint256 block_hash; | ||
std::vector<uint8_t> encoded_filter; | ||
try { | ||
filein >> block_hash >> encoded_filter; | ||
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter)); | ||
uint256 result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calculated_hash
might be more intuitive?
uint256 result; | |
uint256 calculated_hash; |
uint256 block_hash; | ||
std::vector<uint8_t> encoded_filter; | ||
try { | ||
filein >> block_hash >> encoded_filter; | ||
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter)); | ||
uint256 result; | ||
CHash256().Write(encoded_filter).Finalize(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think {} list initialization is generally preferred? Here, and in some other places in the PR.
CHash256().Write(encoded_filter).Finalize(result); | |
CHash256{}.Write(encoded_filter).Finalize(result); |
filter = BlockFilter(GetFilterType(), block_hash, std::move(encoded_filter)); | ||
uint256 result; | ||
CHash256().Write(encoded_filter).Finalize(result); | ||
if (result != hash) return error("Checksum mismatch in filter decode."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to include pos
in the error message? Also, not sure if "filter decode" is the most accurate description? What about e.g. "Checksum mismatch for filter loaded from <pos>"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK e734228, with caveat that I mostly paid attention to the main code, not the changes to the benchmark. Only changes since last review were changes to the benchmark code.
@stickies-v maybe you'd like to follow up here? |
If someone follows up, this comment and patch might be useful: #24832 (comment) |
I have working code here: https://github.com/kcalvinalvin/bitcoin/commits/2020-06-14-blockfilterindex-checksums-2. I'll make a follow up PR |
… the filter from disk. e734228 Update GCSFilter benchmarks (Calvin Kim) aee9a81 Add GCSFilterDecodeSkipCheck benchmark (Patrick Strateman) 299023c Add GCSFilterDecode and GCSBlockFilterGetHash benchmarks. (Patrick Strateman) b0a53d5 Make sanity check in GCSFilter constructor optional (Patrick Strateman) Pull request description: This PR picks up the abandoned bitcoin#19280 BlockFilterIndex was depending on `GolombRiceDecode()` during the filter decode to sanity check that the filter wasn't corrupt. However, we can check for corruption by ensuring that the encoded blockfilter's hash matches up with the one stored in the index database. Benchmarks that were added in bitcoin#19280 showed that checking the hash is much faster. The benchmarks were changed to nanobench and the relevant benchmarks were like below, showing a clear win for the hash check method. ``` | ns/elem | elem/s | err% | ins/elem | bra/elem | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:---------- | 531.40 | 1,881,819.43 | 0.3% | 3,527.01 | 411.00 | 0.2% | 0.01 | `DecodeCheckedGCSFilter` | 258,220.50 | 3,872.66 | 0.1% | 2,990,092.00 | 586,706.00 | 1.7% | 0.01 | `DecodeGCSFilter` | 13,036.77 | 76,706.09 | 0.3% | 64,238.24 | 513.04 | 0.2% | 0.01 | `BlockFilterGetHash` ``` ACKs for top commit: mzumsande: Code Review ACK e734228 theStack: Code-review ACK e734228 stickies-v: ACK e734228 ryanofsky: Code review ACK e734228, with caveat that I mostly paid attention to the main code, not the changes to the benchmark. Only changes since last review were changes to the benchmark code. Tree-SHA512: 02b86eab7b554e1a57a15b17a4d6d71faa91b556c637b0da29f0c9ee76597a110be8e3b4d0c158d4cab04af0623de18b764837be0ec2a72afcfe1ad9c78a83c6
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
Post-merge re-ACK e734228, thanks for updating. |
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
BlockFilter::Unserialize is only used in the test code. Moving it to the tests allows for the removal of the optional sanity check in the GCSFilter and BlockFilter constructor, resulting in simpler code. This is a follow up to bitcoin#24832.
This PR picks up the abandoned #19280
BlockFilterIndex was depending on
GolombRiceDecode()
during the filter decode to sanity check that the filter wasn't corrupt. However, we can check for corruption by ensuring that the encoded blockfilter's hash matches up with the one stored in the index database.Benchmarks that were added in #19280 showed that checking the hash is much faster.
The benchmarks were changed to nanobench and the relevant benchmarks were like below, showing a clear win for the hash check method.