-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
node: reduce unsafe uint256S usage #30569
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the breaking changes. -noassumevalid
seems common https://github.com/search?q=noassumevalid&type=code
c0d508e
to
b38a259
Compare
Thanks a lot for the quick approach feedback, @maflcko. Force pushed to minimize breaking changes by repurposing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clearing these up, please see my observations.
b38a259
to
d7f866f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5368fd4
to
abf9661
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will continue reviewing a bit later, I only had time for these so far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, please see my remaining suggestions
src/test/fuzz/hex.cpp
Outdated
const std::vector<unsigned char> data = ParseHex(random_hex_string); | ||
const std::vector<std::byte> bytes{ParseHex<std::byte>(random_hex_string)}; | ||
assert(AsBytes(Span{data}) == Span{bytes}); | ||
const std::string hex_data = HexStr(data); | ||
if (IsHex(random_hex_string)) { | ||
assert(ToLower(random_hex_string) == hex_data); | ||
} | ||
(void)IsHexNumber(random_hex_string); | ||
(void)TrySanitizeHexNumber(random_hex_string, result_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand it correctly that the point of this kind of testing (i.e. just calling the method with random values without validating the result) is to make sure we don't have memory problems, don't throw unexpected exceptions, etc?
Since we seem to have many hex related methods, would it make sense to compare their outputs, to make sure we at least have internal consistency?
(void)TrySanitizeHexNumber(random_hex_string, result_size); | |
auto sanitized_hex = TrySanitizeHexNumber(random_hex_string, result_size); | |
if (sanitized_hex) { | |
assert(IsHex(*sanitized_hex)); | |
assert(result_size < 0 || sanitized_hex->length() == static_cast<size_t>(result_size)); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is to make sure we don't have memory problems, don't throw unexpected exceptions, etc?
That's my understanding too.
would it make sense to compare their outputs, to make sure we at least have internal consistency?
Sounds reasonable, but I think I'll leave that for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split out to other pr: #30618
abf9661
to
d045fc7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @paplorinc, force pushed to address all outstanding comments. Style-only changes:
- introduced
final_size
var to avoid typecasting between (un)signed types and some related cleanup - removed hardcoded
64
value - cleaned up the
-noassumevalid
logic a bit
src/test/fuzz/hex.cpp
Outdated
const std::vector<unsigned char> data = ParseHex(random_hex_string); | ||
const std::vector<std::byte> bytes{ParseHex<std::byte>(random_hex_string)}; | ||
assert(AsBytes(Span{data}) == Span{bytes}); | ||
const std::string hex_data = HexStr(data); | ||
if (IsHex(random_hex_string)) { | ||
assert(ToLower(random_hex_string) == hex_data); | ||
} | ||
(void)IsHexNumber(random_hex_string); | ||
(void)TrySanitizeHexNumber(random_hex_string, result_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is to make sure we don't have memory problems, don't throw unexpected exceptions, etc?
That's my understanding too.
would it make sense to compare their outputs, to make sure we at least have internal consistency?
Sounds reasonable, but I think I'll leave that for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK d045fc7
Note that I don't know the second order effects of constraining previous public input values, somebody else should assess that part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK d045fc7
Thanks for chipping away at making hex string handling more robust!
Some minor disagreements + ignorable nits. The most significant disagreement in this thread: #30569 (comment)
Tested:
$ RANDOM_CTX_SEED=1231231231231231231231231231231231231231231231231231231231312312 src/test/test_bitcoin
$ test/functional/feature_assumevalid.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a nit. Feel free to ignore.
Otherwise this looks good on a first glance.
53d8a62
to
ee63e21
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force-pushed to address all outstanding review comments, mainly:
- avoid string re-allocation
- cleaned up
util/random.cpp
a bit more - removed unnecessary docstring
ee63e21
to
855784d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK cf88a55
I really like the TrySanitizeHexNumber
-> FromUserHex
change!
Sorry for not understanding that you had it in the works with my prior suggestion.
Only ignorable nits left.
(Still fails at the end of make check
due to unlucky base commit).
* Designed to be used when dealing with user input. | ||
*/ | ||
template <class uintN_t> | ||
std::optional<uintN_t> FromUserHex(std::string_view input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Thread https://github.com/bitcoin/bitcoin/pull/30569/files#r1725693175)
I prefer the current uint256::FromUserHex
over uint256::FromHexUser
(the latter makes me think of sorcerers). (uint256::FromHexLenient
is okay to).
cf88a55
to
81554aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force-pushed (twice) to change the approach to address @achow101's, @ryanofsky's and @hodlinator's concerns about being too strict about user hex input for -assumevalid
and RANDOM_CTX_SEED
. Thanks again for sharing your concerns about breaking changes (and the detailed review), in hindsight I think this is indeed the better approach.
Specifically, these last two force-pushes:
- introduces a
uint256::FromUserHex()
to allow for lenient user input parsing and
does away with theTrySanitizeHexNumber
, as suggested by l0rinc and ryanofsky - use the new
uint256::FromUserHex()
function for-minimumchainwork
,-assumevalid
andRANDOM_CTX_SEED
parsing, minimizing the behaviour change in this PR and allowing 0x prefixes and too-short-input in all 3 cases. The main behaviour change now introduced by this PR is limited to more explicit handling of invalid hex characters and too-long-input, which I believe won't be controversial. - remove the now unused
IsHexNumber()
. Most of its unit tests have been migrated touint256_tests/from_user_hex
. - improved testing in
validation_chainstatemanager_tests
:- refactored
SetOptsFromArgs
to be templated and reusable, and allow for better error reporting by moving the BOOST_ calls to the edges as much as ergonomically possible - added unit tests for invalid-hex-char or too-many-char cases for -minimumchainwork and -assumevalid
- refactored
- added and cleaned up some of the `uint256_tests/from_user_hex tests 1, 2, e
- terminology: consistently use "hex digits" whenever the character is indeed a [0-9, a-f] value, and "(non-hex) character" when it isn't or we're not sure yet.
* Designed to be used when dealing with user input. | ||
*/ | ||
template <class uintN_t> | ||
std::optional<uintN_t> FromUserHex(std::string_view input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have nothing against any of the suggested names, but personally I still find FromUserHex
to be the most intuitive (and it seems at least 2 other reviewers like it as well), so I'm going to stick with that if that's okay?
I like it a lot more now, thanks for considering the feedbacks. The only part that still bugs me a bit is the naming of ACK 81554aa |
No problem at all, I decided to give this PR a bit of a rest to focus on reviewing and aligning with related PRs, and think about the suggestions made here, hence the delay in updating this. Will quickly address feedback again from here on.
I think it's a valid concern, and I agree that generally functions should be named in line with what they do, not with how they're used - and Thank you both for your continued review! |
Code review ACK 81554aa. Thanks for all the changes. This seems simpler and more backwards compatible now. re: #30569 (comment) In PR description:
I think it would be clearer and more consistent with the rest of the description if it said "test: the optional |
* Designed to be used when dealing with user input. | ||
*/ | ||
template <class uintN_t> | ||
std::optional<uintN_t> FromUserHex(std::string_view input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #30569 (comment)
re: #30569 (comment)
Maybe I have a slightly different perspective because I think it can be good to name functions after how they are intended to be used, instead of what they do, depending on what's more relevant to callers. In this case, though, I do think the name describes what it does: parse hex input from a user. The assumption is that callers should not make idiosyncratic choices about what user input to accept, and there should be a common standard.
I don't like name FromHexLenient
so much because it sounds like a negative thing that should be avoided, and doesn't provide much more information about what the function does either.
If we do want the function name to more literally describe what it does, I think FromHexNumber
could work because it's conventional for hex numbers to have 0x prefixes and not need to be padded with 0's.
But my vote would be for FromUserHex over FromHexNumber over FromHexLenient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 81554aa
git range-diff master cf88a55e97719aabd62f0b608df0800fef8304de 81554aac80bf2270db977c110c37acc7e8034194
Only changes BOOST_AUTO_TEST_CASE(from_user_hex)
tests, error strings, commit messages and revives the LogPrintf
-> LogInfo
update.
FromUserHex will be used in future commits to construct uint256 instances from user hex input without being unnecessarily restrictive on formatting by allowing 0x-prefixed input that is shorter than 64 characters.
Removes dependency on unsafe and deprecated uint256S. This makes parsing more strict, by returning an error when the input contains more than 64 hex digits.
The relevant unit tests have been incorporated in uint256_tests/from_user_hex in a previous commit.
Removes dependency on unsafe and deprecated uint256S. This makes parsing more strict, by returning an error when the input contains non-hex characters, or when it contains more than 64 hex digits. Also make feature_assumevalid.py more robust by using CBlock.hash which is guaranteed to be 64 characters long, as opposed to the variable-length hex(CBlock.sha256)
Removes dependency on unsafe and deprecated uint256S. This makes parsing more strict, by requiring RANDOM_CTX_SEED to be a string of up to 64 hex digits (optionally prefixed with "0x"), whereas previously any string would be accepted, with non-hex characters silently ignored and input longer than 64 characters (ignoring "0x" prefix) silently trimmed. Can be tested with: $ RANDOM_CTX_SEED=z ./src/test/test_bitcoin --log_level=all --run_test=timeoffsets_tests/timeoffsets_warning -- -printtoconsole=1 | grep RANDOM_CTX_SEED RANDOM_CTX_SEED must consist of up to 64 hex digits ("0x" prefix allowed), it was set to: 'z'. Co-Authored-By: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
81554aa
to
18d65d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force-pushed to address all review comments, limited to nits and tests so should be an easy re-review:
- grammar nit for
RANDOM_CTX_SEED
- added
from_user_hex
test case to cover more > 64 char cases and cleaned up the test a bit by replacingvalid_hex_65
withvalid_hex_64
which should be less confusing - re-added mixed case testing in
from_user_hex
- undid unintentional newline removal
I think it would be clearer and more consistent with the rest of the description if it said "test: the optional RANDOM_CTX_SEED env var will now cause tests to abort when it contains invalid hex characters, or when it is longer than 64 hex digits"
Thanks, PR description updated with your suggestion.
re-ACK 18d65d2
Changes: Grammar, added&adjusted tests (passed), undid unintentional new line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 18d65d2. Very nice change that cleans up the API, adds checking for invalid values, makes parsing of values more consistent, and adds test coverage.
Just small test, error message, and comment changes since last review
While I'm not a fan of mixed casing, I agree that it's probably outside the scope of current change. ACK 18d65d2 |
ACK 18d65d2 |
43cd83b test: move uint256_tests/operator_with_self to arith_uint256_tests (stickies-v) c6c994c test: remove test-only uint160S (stickies-v) 62cc465 test: remove test-only uint256S (stickies-v) adc00ad test: remove test-only arith_uint256S (stickies-v) f51b237 refactor: rpc: use uint256::FromHex for ParseHashV (stickies-v) Pull request description: _Continuation of #30569._ Since fad2991, `uint256S()` has been [deprecated](fad2991#diff-800776e2dda39116e889839f69409571a5d397de048a141da7e4003bc099e3e2R138) because it is less robust than the `base_blob::FromHex()` introduced in #30482. Specifically, it tries to recover from length-mismatches, recover from untrimmed whitespace, 0x-prefix and garbage at the end, instead of simply requiring exactly 64 hex-only characters. (see also #30532) This PR removes `uint256S()` (and `uint160S()`) completely, with no non-test behaviour change. Specifically, the main changes in this PR are: - the (minimal) last non-test usage of `uint256S()` in `ParseHashV()` is removed without behaviour change, which can partially be verified by cherry-picking and/or modifying [this test commit](stickies-v@1f2b0fa)). - the test usage of `uint{160,256}S()` is removed, largely replacing it with `uint{160,256}::FromHex()` where applicable, potentially modifying the test by removing non-hex characters or dropping the test entirely if removing non-hex characters makes it redundant - the now unused `uint{160,256}S()` functions are removed completely. - unit test coverage on converting `uint256` <-> `arith_uint256` through `UintToArith256()` and `ArithToUint256()` is beefed up, and `arith_uint256` tests are moved to `arith_uint256_tests.cpp`, removing the `uint256_tests.cpp` dependency on `uint256h`, mirroring how the code is structured. _Note: `uint256::FromUserHex()` exists to more leniently construct uint256 from user input, allowing "0x" prefixes and too-short-input, as safer alternative to `uint256S()` where necessary._ ACKs for top commit: l0rinc: reACK 43cd83b hodlinator: re-ACK 43cd83b ryanofsky: Code review ACK 43cd83b. Only code change is a small refactoring which looks good. The rest of the PR is all test changes, which I only lightly reviewed, but seem to be positive and do what's described Tree-SHA512: 48147a4c6af671597df0f72c1b477ae4631cd2cae4645ec54d0e327611ff302c9899e344518c81242cdde82930f6ad23a3a7e6e0b80671816e9f457b9de90a5c
…se FromUserHex fuzz feature coverage 1eac96a Compare FromUserHex result against other hex validators and parsers (Lőrinc) 1994786 Use BOOST_CHECK_EQUAL for optional, arith_uint256, uint256, uint160 (Lőrinc) 743ac30 Add std::optional support to Boost's equality check (Lőrinc) Pull request description: Enhanced `FromUserHex` coverage by: * Added `std::optional` support to `BOOST_CHECK_EQUAL`, allowing direct comparisons of `std::optional<T>` with other `T` expected values. * Increased fuzz testing for hex parsing to validate against other hex validators and parsers. ---- * Use BOOST_CHECK_EQUAL for #30569 (comment) arith_uint256, uint256, uint160 Example error before: > unknown location:0: fatal error: in "validation_chainstatemanager_tests/chainstatemanager_args": std::bad_optional_access: bad_optional_access test/validation_chainstatemanager_tests.cpp:781: last checkpoint after: > test/validation_chainstatemanager_tests.cpp:801: error: in "validation_chainstatemanager_tests/chainstatemanager_args": check set_opts({"-assumevalid=0"}).assumed_valid_block == uint256::ZERO has failed [std::nullopt != 0000000000000000000000000000000000000000000000000000000000000000] ACKs for top commit: stickies-v: re-ACK 1eac96a ryanofsky: Code review ACK 1eac96a. Only changes since last review were auto type and fuzz test tweaks. hodlinator: ACK 1eac96a Tree-SHA512: f1d2c65f0ee4e97830700be5b330189207b11ed0c89a8cebf0f97d43308402a6b3732e10130c79a0c044f7d2eeabfb5359990825aadf02c4ec19428dcd982b00
Since fad2991,
uint256S
has been deprecated because it is less robust than thebase_blob::FromHex()
introduced in the same PR. Specifically, it tries to recover from length-mismatches, recover from untrimmed whitespace, 0x-prefix and garbage at the end, instead of simply requiring exactly 64 hex-only characters. (see also #30532)This PR carves out the few
uint256S
callsites that may potentially prove a bit more controversial to change because they deal with user input and backwards incompatible behaviour change.The main behaviour change introduced in this PR is:
-minimumchainwork
will raise an error when input is longer than 64 hex digits-assumevalid
will raise an error when input contains invalid hex characters, or when it is longer than 64 hex digitsAfter this PR, the remaining work to remove
uint256S
completely is almost entirely mechanical and/or test related. I will open that PR once #30560 is merged because it builds on that.