-
Notifications
You must be signed in to change notification settings - Fork 36.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modernize util/strencodings and util/string: string_view
and optional
#24764
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK
Returning a string_view can be somewhat dangerous (in combination with auto? also when the original string goes out of scope, especially if it's a temporary), so giving these functions new names is a good idea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a string_view can be somewhat dangerous (in combination with auto? also when the original string goes out of scope, especially if it's a temporary), so giving these functions new names is a good idea.
A new name is good, but doesn't solve the problem. I think this can be fixed with an attribute: #20493
I think this can be merged, but I left a few nits and questions.
review ACK 6e1f147 💢
Show signature
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
review ACK 6e1f14780452d956503f1dbc3c91b17b44287f77 💢
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUghZAv+JPOBXiEZ9L1jCF9HPLDO1vGLEuvq6hw6EkaMSKd0Zn3UaiYvP98jPBMZ
HzTidM3yPWwDfLw3QVTlPCdt0J8wBAwOg5ZgjGRZ7fu1hn0SZ93q960giU68DbeJ
ZZCF4fAYPOkTqf1a06DnV7anglhEIG/JXMg+XESTvUGmtdKnP48sdwek4NZOw1h7
PcA+Ck38qlb68W38HQMwsQYr/aO2Ftc2Utw2GMv3Kkr7nD6kIwtuH9WOTWc8Kw2H
vr5NeRfQS1smm9rP+VrixKmCaC1h37r6NMuXDqh/MRNj77ONwmLDLWES/4bP9ioc
w814fQled+vIue+Mv5yBUNeGZmlOryQsC7wJwNkiRNKxG5H+bSn2KUKEtZAGjHpv
rfNI9X9nbDCvDQdHemrYCessHaCfGFCMeUDC4PWDZ0uNaZSC6RrU0y8pESeMOzxe
BJJxSpq0sHCOAVt8LKGyumBlma/k2+n9d6UNo3++K/am9Uxei4kFgLJ0QebpXKqm
NqMH+EA3
=/7qh
-----END PGP SIGNATURE-----
Agree, I just meant that putting a string_view-returning function as a drop-in replacement would have been a very bad idea, a new function with a new name means that people using it can reason about the risks anew. Sure, if an attribute can help avoid such problems more widely that's of course even better. |
review ACK e4092c5 🐤 Only changes:
Show signatureSignature:
|
Concept ACK - nits:
|
Yeah, I'll separate them (now that we figured out the issue with ignoring the pfInvalid flag in the http auth logic).
That would be incorrect. Hex numbers don't need to be a multiple of 2 characters (0xF is a valid hex number, e.g.). |
In addition, to make sure that no call site ignores the invalid decoding status, make the pf_invalid argument mandatory.
Base32/base64 are mechanisms for encoding binary data. That they'd decode to a string is just bizarre. The fact that they'd do that based on the type of input arguments even more so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 5ea072e only effective change is adding a "return false" 🌷
Show signature
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
re-ACK 5ea072ed08b43cdfdf8554597dcf8c1a51ef4dae only effective change is adding a "return false" 🌷
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUgUyQwAhIAVoAgFscYNynT/fORwnmC2JJczY8Zne+7IKHyQadrQGB3uGARyJceI
WDRjN7aLHHJwFmFZ/Lz4HGa5sZYwAUC/Vz9y2eX0nu2PInfZgdtNbx8JT0cx5wkX
03LWXiek/qiy6BeEctpE1nP37iSAqB4E18yEhCMqdLe+m7KCq/rTZN2XdF/R3DHK
j9mqyyy7M5dYQUE2dViBdqTydSrhGIhw2HqHXytP8/urfFB04wLXeJHNDZ6lRfZ1
zq28KVL2FOuFXdYTIAtdeCAFAXFgwiZ2lYik1+aftCBBO9BdBnmKqRD0/ERAC2u8
TfVPFMQECHbWxcxGwZGBjhNhcXGI2fVUF2g4SqJVssqPnG53GcSRvN6H0tcKDAH7
g1x6fqDibKBIUXcrTp8l5tlZYknN2wU0wq9M7U6uJIt5gGWOTlpYpbQsMIlzg6Ls
omRKFXjTXHvFmO8eilHMaMVk6qY/12MjoTyHZ1JwN7TWHRxH6Q9XNTFeq1VJE/qd
uqaqLwJe
=bejp
-----END PGP SIGNATURE-----
@@ -75,9 +85,12 @@ inline std::string MakeUnorderedList(const std::vector<std::string>& items) | |||
/** | |||
* Check if a string does not contain any embedded NUL (\0) characters | |||
*/ | |||
[[nodiscard]] inline bool ValidAsCString(const std::string& str) noexcept | |||
[[nodiscard]] inline bool ValidAsCString(std::string_view str) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name ValidAsCString
is now a bit dangerous. I'd assume that it returns true
when the argument is 0 terminated at the end (which was the case for the const std::string& str
argument), but with a std::string_view
argument this might not be the case.
So now even when ValidAsCString(str)
returns true
, it is not allowed to use e.g. strlen(str.data())
.
I would rename the method to something like ContainsNoNUL
to make it more clear what's checked, or keep the const std::string&
argument for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the documentation, it only checks if a string has embedded \0
characters. That std::string is zero-terminated internally is an implementation detail, the caller is still responsible for passing c_str()
(and never .data()
) when passing it to a function that takes a C string.
it is not allowed to use e.g. strlen(str.data()).
I'd really frown on code like that in any case.
Edit: The only time it'd be remotely acceptable to pass .data()
as a C string would be if there is an embedded \0 in the array, which is exactly what this function avoids 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, maybe it's just me, but from the interface ValidAsCString(std::string_view str)
I'd guess that it returns true
when it actually finds a \0
somewhere, but it's exactly the opposite 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection to adding a scripted-diff to rename ValidAsCString
to ContainsNoNUL
. Either in this pull or as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
I am happy to pick this up if you are no longer working on it |
@MarcoFalke Go for it. |
Make use of
std::string_view
andstd::optional
in the util/{strencodings, string} files.This avoids many temporary string/vector objects being created, while making the interface easier to read. Changes include:
std::string_view
instead ofstd::string
.RemovePrefixView
andTrimStringView
which also returnstd::string_view
objects (the correspondingRemovePrefix
andTrimString
keep returning anstd::string
, as that's needed in many call sites still).std::string
fromDecodeBase32
andDecodeBase64
, but return vectors. Base32/64 are fundamentally algorithms for encoding bytes as strings; returningstd::string
from those (especially doing it conditionally based on the input arguments/types) is just bizarre.bool* pf_invalid
output argument pointer inDecodeBase32
andDecodeBase64
; return anstd::optional
instead.DecodeBase32
andDecodeBase64
more efficient by doing the conversion from characters to integer symbols on-the-fly rather than through a temporary vector.