-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocation-free hex escape parsing in UnicodeSet parsing #3725
Allocation-free hex escape parsing in UnicodeSet parsing #3725
Conversation
🎉 All dependencies have been resolved ! |
0190186
to
090eb79
Compare
let first_offset = self.must_peek_index()?; | ||
let end_offset = self.validate_hex_digits(min, max)?; | ||
|
||
// safety: validate_hex_digits ensures that chars (including the last one) are ascii hex digits, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this isn't about "safety" but about not panicking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, removed all other mentions in this file as well
// safety: validate_hex_digits ensures that chars (including the last one) are ascii hex digits, | ||
// which are all exactly one UTF-8 byte long, so slicing on these offsets always respects char boundaries | ||
#[allow(clippy::indexing_slicing)] | ||
let hex_source = &self.source[first_offset..=end_offset]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: if you make max
a const-generic, you can keep the digits you've seen on the stack and don't need to hold on to source
. Alternatively, you can merge this with validate_hex_digit
and calculate the value while iterating with something like curr = curr * 16 + c.to_digit(16)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would the version that keeps digits on the stack work? IIUC, if I want to use a std parsing function I need a UTF-8 slice, but my iterator only gives me char
s.
Do you think holding onto source
is a big downside? If I understand correctly, even in the #3550 - polished version of this API the user will never have to deal with a struct that holds onto source
(USParser::parse
takes source and returns the CPILASL)
In any case, if we want to avoid the two-pass approach, I think manually implementing the hex parsing and doing it on the fly as you suggested is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the stack based approach is not great, you'd have a [u8; MAX]
buffer that you'd unsafely interpret as a str
. Holding on to source
seems fine
Avoids unnecessary allocations when parsing hex digits, instead creates a subslice from the source &str.
Part of #3684
Depends on #3670
(cc @younies)