
Commit

Help: String:findRegexp: Clarify leftmost-largest and subexpressions
jamshark70 committed Jan 17, 2019
1 parent 5e8a13f commit 48f8879
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions HelpSource/Classes/String.schelp
@@ -376,14 +376,22 @@ code::
method::findRegexp
Perl regular expression search (see link::Classes/String#Regular expressions::). This method searches exhaustively for matches and collects them into an array of pairs, in the format code::[character index, matching string]::.

"Leftmost largest match": As in the regular expression standard, code::*:: and code::+:: are greedy; if it is possible to have more than one overlapping match for a part of the regular expression, the match list will include only the leftmost and largest of them. In code::"foobar".findRegexp("(o*)(bar)");:: below, code::"o*":: may potentially have three matches: code::"o":: at index 1 (second character), code::"o":: at index 2, and code::"oo":: at index 1. code::findRegexp:: will return only the last of these (code::"oo"::), because it begins in the leftmost-possible matching position, and it is the longest possible match at that position.
"Leftmost largest match": As in most flavors of regular expressions, code::*:: and code::+:: are greedy; if it is possible to have more than one overlapping match for a part of the regular expression, the match list will include only the leftmost and largest of them. In code::"foobar".findRegexp("o+")::, code::"o+":: may potentially have three matches: code::"o":: at index 1 (second character), code::"o":: at index 2, and code::"oo":: at index 1. code::findRegexp:: will return only the last of these (code::"oo"::), because it begins in the leftmost-possible matching position, and it is the longest possible match at that position.
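The selection rule described above can be sketched outside SuperCollider. The following is a minimal illustration in Python, assuming that Python's code::re:: module behaves like the Perl-style engine for a simple greedy quantifier such as code::o+::; the helper name code::candidate_matches:: is hypothetical and is not part of any SuperCollider or Python API. It enumerates every substring the pattern could match, then checks that the engine reports only the leftmost, longest one.

```python
import re

def candidate_matches(pattern, text):
    """Return every (start, substring) pair that the whole pattern matches.

    Hypothetical helper for illustration only: it brute-forces all
    non-empty substrings instead of asking the engine for its choice.
    """
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos tests exactly text[start:end]
            if compiled.fullmatch(text, start, end):
                found.append((start, text[start:end]))
    return found

candidates = candidate_matches("o+", "foobar")

# Apply the rule by hand: leftmost start first, then longest at that start.
leftmost_start = min(start for start, _ in candidates)
chosen = max(
    (c for c in candidates if c[0] == leftmost_start),
    key=lambda c: len(c[1]),
)

# What the (Python) engine actually reports for a greedy quantifier.
first = re.search("o+", "foobar")
reported = (first.start(), first.group())

print(candidates)  # all three overlapping candidates
print(chosen, reported)
```

For code::o+:: the hand-applied rule and the engine agree. With alternation, a backtracking engine may prefer the first alternative that succeeds rather than the longest, so this sketch should not be read as a general statement about every pattern.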

Note, though, that parentheses for grouping (a "marked sub-expression" or "capturing group") will produce a separate result: code::"aaa".findRegexp("(a+)"):: appears to produce duplicated results code::[ [ 0, aaa ], [ 0, aaa ] ]::, but this is because the first entry is the match for the entire regular expression and the second is the match for the parenthesized sub-expression code::a+::.

To see the marked sub-expression results more clearly, consider:

code::
"foobar".findRegexp("(o*)(bar)");
-> [ [ 1, oobar ], [ 1, oo ], [ 3, bar ] ]
::

code::"oobar":: matches the entire regular expression. code::"oo":: and code::"bar":: match the first and second parenthesized sub-expressions, respectively.
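The result layout described above (one entry for the entire match, followed by one entry per parenthesized sub-expression) can be imitated in Python as a rough sketch. The function name code::find_regexp_like:: is hypothetical, and the code uses Python's code::re:: module as a stand-in; it is not how SuperCollider implements code::findRegexp::, and edge cases such as empty matches or unmatched groups may differ.

```python
import re

def find_regexp_like(text, pattern):
    """Collect [index, matched string] pairs in the layout described above.

    Hypothetical imitation for illustration: for every match, group 0
    (the entire match) comes first, then each capturing group in order.
    A group that did not participate would appear as [-1, None] here;
    what SuperCollider does in that case is not covered by this sketch.
    """
    results = []
    for m in re.finditer(pattern, text):
        for group_index in range(m.re.groups + 1):
            results.append([m.start(group_index), m.group(group_index)])
    return results

print(find_regexp_like("foobar", "(o*)(bar)"))
print(find_regexp_like("aaa", "(a+)"))
print(find_regexp_like("aaaabaaa", "a+"))
```

Seen this way, the apparent duplicate for code::"(a+)":: is simply group 0 and group 1 covering the same span of text.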

code::
"foobar".findRegexp("o*bar");
"32424 334 /**aaaaaa*/".findRegexp("/\\*\\*a*\\*/");
"foobar".findRegexp("(o*)(bar)");
"aaaabaaa".findRegexp("a+");
::
