Normative: Add RegExp v flag with set notation and properties of strings #2418
3d0c24c to 7a79833
86df176 to 49a291a
This PR still has a bunch of "FYI" comments in it, which I assume you don't intend. Is it actually ready for review?
@FrankYFTang we don't use separate HTML files; can you inline the table?
We thought that it's ok for stage 3 to still have editorial notes and explainers. We intend to eventually turn many of them into permanent NOTEs or whatever is a reasonable format, based on feedback. Ok?
Sure, that's ok. So do you just want the normative parts reviewed at the moment? Also, do you intend to address the TODOs before review? Some of them look like they'd be normative.
I believe that we covered nearly everything, but I plan to go through it myself once more with a fine-tooth comb. I hope that this will not block stage 3 reviewers from reviewing and providing feedback.
(Generally those things are fine in the proposal, but don't go in the spec PR) |
Note that these tables are separate HTML files because @mathiasbynens updates them for every new version of the Unicode Standard. I am not sure whether he used a tool to generate them or edited them by hand in the past; I just followed that convention.
What is
We flip-flopped on the string literal syntax, and went back from This is one of a few things that we will note specially in next week's meeting. We did edit the readme to change example string literals to this syntax.
Had some initial comments. I haven't done a thorough review yet; this was just a first pass.
spec.html (outdated)
If _UnicodeSets_ is *false*, then a CharSetElement is a character in the sense of the Pattern Semantics above.
</li>
<li>
If _UnicodeSets_ is *true*, then a CharSetElement is either a character in the sense of the Pattern Semantics above, or it is a sequence of characters, that is, a string. This includes the empty String and strings with more than 1 character. A string of length 1 is the same as a single character.
If this type can hold sequences of multiple code points, it should be renamed to something other than CharSet.
Depending on the flags, it's a set of code units, or code points, or code points plus sequences of code points. The concept of "character" under the new flag encompasses single code points but is also sufficiently general for the logical concept of "character", which frequently includes things that are encoded in Unicode with sequences of multiple code points. (Of course not vice versa.)
Also, renaming it would affect parts of the spec that need not be touched for the normative changes here.
There may or may not be a better name for this. If we find one, can we make that change later, on its own (separate from these normative changes)?
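A minimal sketch of the semantics described above, assuming a CharSet can be modelled as a JavaScript `Set` of strings: a one-code-point string stands for a single character, while longer strings (and the empty string) are the multi-code-point elements the `v` flag allows. The helpers `union`, `intersection`, `difference`, `allSingleCharacters` and `longestFirst` are hypothetical and are not spec abstract operations.

```javascript
// Hypothetical model: a CharSet is a Set of strings. A string containing
// exactly one code point plays the role of a single character; any other
// string (including "") is a multi-character CharSetElement.
const codePointLength = (s) => [...s].length;

function union(a, b) {
  return new Set([...a, ...b]);
}

function intersection(a, b) {
  return new Set([...a].filter((e) => b.has(e)));
}

function difference(a, b) {
  return new Set([...a].filter((e) => !b.has(e)));
}

// True if the set could also be written as a classic (non-v) character class.
function allSingleCharacters(set) {
  return [...set].every((e) => codePointLength(e) === 1);
}

// Order elements longest first, intended to mirror the order in which
// strings are tried when matching, so "abc" is attempted before "ab".
function longestFirst(set) {
  return [...set].sort((x, y) => codePointLength(y) - codePointLength(x));
}

const flagEmoji = new Set(["🇫🇷", "🇩🇪"]); // each flag is two code points
const letters = new Set(["a", "b", "c"]);
const mixed = union(flagEmoji, letters);

console.log(mixed.size); // 5
console.log(allSingleCharacters(letters)); // true
console.log(allSingleCharacters(mixed)); // false
console.log([...difference(mixed, flagEmoji)].join("")); // abc
console.log([...intersection(mixed, new Set(["a", "🇫🇷", "z"]))].join(",")); // 🇫🇷,a
console.log(longestFirst(new Set(["ab", "a", "abc"]))); // [ 'abc', 'ab', 'a' ]
```

Keeping every element as a string makes "a string of length 1 is the same as a single character" hold by construction, which is the property that lets the same CharSet notion cover all three flag modes.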
This name is confusing enough that I would be quite uncomfortable landing it unless we've looked hard for better names and failed to find one. I'll raise this at the editor call next week.
Editors are agreed this will need a new name before landing the PR. That doesn't need to block anything until that point, though; it can still get stage 3.
(Note that we're not yet asking for Stage 3 advancement, although we are asking Stage 3 reviewers to start reviewing: tc39/agendas#1093)
That'd be lovely, if you don't mind! Thanks.
2c542b3 to c8482ef
OK, rebased and adopted the conventions of #2531. Not certain I did it perfectly, but it seems to match up pretty well.
c8482ef to b2499cf
You'll need to add `UnicodeSetsMode` to `Term`, `Assertion`, `QuantifiableAssertion`, and `ExtendedAtom`.
You added `UnicodeSetsMode` to the LHS occurrences of those symbols, but not to their RHS occurrences.
Not sure about the prefix. `?` might work everywhere, but it might be clearer to use `~` when a RHS is guarded by `[~UnicodeMode]`.
On closer examination, `QuantifiableAssertion` and `ExtendedAtom` are only used under `[~UnicodeMode]`, so assuming that the combination `~UnicodeMode, +UnicodeSetsMode` never occurs, you don't need to add `UnicodeSetsMode` to those two nonterminals; you can instead just change `?UnicodeSetsMode` to `~UnicodeSetsMode` in their RHSs.
Here's what I mean: mathiasbynens#26
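The argument above relies on the flags never producing `~UnicodeMode` together with `+UnicodeSetsMode`. A small sketch of the flag-to-parameter mapping, written to follow ParsePattern as it eventually landed (where `v` implies UnicodeMode and combining `u` with `v` is a SyntaxError), makes that invariant easy to check. The function `grammarParameters` and its return shape are hypothetical.

```javascript
// Hypothetical helper modelled on ParsePattern: map the u/v flags to the
// UnicodeMode and UnicodeSetsMode grammar parameters.
function grammarParameters(flags) {
  const u = flags.includes("u");
  const v = flags.includes("v");
  if (u && v) {
    // Combining u and v is rejected with a SyntaxError.
    throw new SyntaxError("The u and v flags are mutually exclusive");
  }
  return { UnicodeMode: u || v, UnicodeSetsMode: v };
}

// Enumerate the valid flag combinations and confirm that ~UnicodeMode never
// appears together with +UnicodeSetsMode. Under a RHS guarded by
// [~UnicodeMode], ?UnicodeSetsMode therefore always behaves like
// ~UnicodeSetsMode.
const combos = ["", "u", "v"].map((f) => [f, grammarParameters(f)]);
for (const [f, p] of combos) {
  console.log(JSON.stringify(f), p);
}
const impossibleSeen = combos.some(
  ([, p]) => !p.UnicodeMode && p.UnicodeSetsMode
);
console.log(impossibleSeen); // false
```

Because the impossible combination is unreachable, writing `~UnicodeSetsMode` on those guarded RHSs states the same grammar more explicitly than `?UnicodeSetsMode`.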
I’ve rebased this, and the esmeta check now passes (thanks to #3078). Are there any other blockers preventing this PR from being accepted?
@mathiasbynens No, just waiting on review from one more editor.
Some relatively minor comments, of which only the CharSet one absolutely needs to be addressed before landing. Otherwise looks good to me.
A <dfn>CharSetElement</dfn> is one of the two following entities:
<ul>
<li>
If _rer_.[[UnicodeSets]] is *false*, then a CharSetElement is a character in the sense of the Pattern Semantics above.
I don't love this unbound alias, but I guess I don't have a better suggestion.
Ack
One possibility would be: "If 'v' does/doesn't appear in the RegExp's flags, ..." although then "the RegExp" doesn't have an antecedent.
Maybe "In the context of a RegExp with/without a 'v' flag, ..."
lgtm % question
Still lgtm, thanks for the CharSetElement changes.
This enables the use of set notation, string literal syntax, and Unicode properties of strings in regular expressions.
Proposal repo: https://github.com/tc39/proposal-regexp-set-notation
Co-authored-by: Markus Scherer <markus.icu@gmail.com>
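A short usage sketch of the three features named in the commit message, using the syntax from the proposal: set difference (`--`) and intersection (`&&`) between nested classes and property escapes, the `\q{…}` string-literal syntax, and the `RGI_Emoji` property of strings. The patterns are built with `new RegExp(…, "v")` inside a `try`, so the snippet still runs (and reports the missing support) on engines that predate the `v` flag; the function and property names are illustrative only.

```javascript
// Build v-flag patterns at run time so older engines throw a catchable
// SyntaxError instead of failing to parse the whole script.
function buildVFlagExamples() {
  return {
    // Set difference: lowercase ASCII letters that are not vowels.
    consonants: new RegExp("^[[a-z]--[aeiou]]+$", "v"),
    // Set intersection: code points that are both Greek script and letters.
    greekLetter: new RegExp(String.raw`^[\p{Script=Greek}&&\p{Letter}]$`, "v"),
    // String literals: \q{...} adds multi-code-point strings to a class.
    flagOrX: new RegExp(String.raw`^[\q{🇫🇷|🇩🇪}x]$`, "v"),
    // Property of strings: RGI_Emoji also contains emoji sequences.
    emoji: new RegExp(String.raw`^\p{RGI_Emoji}$`, "v"),
  };
}

let examples = null;
try {
  examples = buildVFlagExamples();
} catch (e) {
  if (!(e instanceof SyntaxError)) throw e;
}

if (examples === null) {
  console.log("This engine does not support the v flag.");
} else {
  console.log(examples.consonants.test("rhythm")); // true
  console.log(examples.consonants.test("regexp")); // false: contains "e"
  console.log(examples.greekLetter.test("π")); // true
  console.log(examples.greekLetter.test("p")); // false
  console.log(examples.flagOrX.test("🇫🇷")); // true: matched as one string
  console.log(examples.flagOrX.test("🇯🇵")); // false
  console.log(examples.emoji.test("🇫🇷")); // true
  console.log(examples.emoji.test("a")); // false
}
```

Note that `^[…]$` around a class containing `\q{🇫🇷|🇩🇪}` matches the two-code-point flag as a single element, which a classic `u`-mode character class cannot express.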
Thanks everyone!
... prompted by tc39#2418 (comment) from `Atom :: CharacterClass` semantics to `AtomEscape :: CharacterClassEscape` semantics.
- In CompileAtom, take the wording-changes under `Atom :: CharacterClass` that were prompted by tc39#2418 (comment) and copy them to `AtomEscape :: CharacterClassEscape`.
- In various operations, change more occurrences of 'element' to 'CharSetElement'.
- In CompileAtom, change 2 occurrences of "which" to "that" because the usage is restrictive.
Proposal repo: https://github.com/tc39/proposal-regexp-set-notation
Preview link for the spec changes with inline diffs: https://arai-a.github.io/ecma262-compare/?pr=2418