Commit
Merge pull request #4241 from jamshark70/topic/findRegexpStackFix
sclang: Fix 'findRegexp' empty-result case
mossheim authored Jan 20, 2019
2 parents ed3f9f1 + 48f8879 commit a9f74f0
Showing 3 changed files with 42 additions and 9 deletions.
35 changes: 28 additions & 7 deletions HelpSource/Classes/String.schelp
@@ -337,7 +337,9 @@ code::

subsection:: Regular expressions

-Note the inversion of the arguments:
+The String class provides access to the boost library's regular expression functions. Boost's default uses Perl settings. (Currently, there is no hook to override the regex style.) Syntax details may be found at link::https://www.boost.org/doc/libs/1_69_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html::.
+
+Note carefully the argument order:

List::
## Code::regexp.matchRegexp(stringToSearch)::
@@ -347,7 +349,7 @@ List::
Code::findRegexp:: follows the pattern established by link::Classes/String#-find::, where the receiver is the string to be searched. Code::matchRegexp:: follows the pattern of link::Reference/matchItem::, where the receiver is the pattern to match and the first argument is the object to be tested. This is a common source of confusion, but it is based on this precedent.

method::matchRegexp
-POSIX regular expression matching. Returns true if the receiver (a regular expression pattern) matches the string passed to it. The strong::start:: is an offset where to start searching in the string (default: 0), strong::end:: where to stop.
+Perl regular expression matching (see link::Classes/String#Regular expressions::). Returns true if the receiver (a regular expression pattern) matches the string passed to it. The strong::start:: is an offset where to start searching in the string (default: 0), strong::end:: where to stop.

note::This is code::regexp.matchRegexp(stringToSearch):: and not the other way around! See above: link::Classes/String#Regular expressions::.::

@@ -372,16 +374,31 @@ code::
::

method::findRegexp
-POSIX regular expression search.
+Perl regular expression search (see link::Classes/String#Regular expressions::). This method searches exhaustively for matches and collects them into an array of pairs, in the format code::[character index, matching string]::.
+
+"Leftmost largest match": As in most flavors of regular expressions, code::*:: and code::+:: are greedy; if it is possible to have more than one overlapping match for a part of the regular expression, the match list will include only the leftmost and largest of them. In code::"foobar".findRegexp("o+")::, code::"o+":: may potentially have three matches: code::"o":: at index 1 (second character), code::"o":: at index 2, and code::"oo":: at index 1. code::findRegexp:: will return only the last of these (code::"oo"::), because it begins in the leftmost-possible matching position, and it is the longest possible match at that position.
+
+Note, though, that parentheses for grouping (a "marked sub-expression" or "capturing group") will produce a separate result: code::"aaa".findRegexp("(a+)");:: appears to produce duplicated results code::[ [ 0, aaa ], [ 0, aaa ] ]::, but this is because the first match is for the parentheses and the second is for code::a+::.
+
+To see the marked sub-expression results more clearly, consider:
+
+code::
+"foobar".findRegexp("(o*)(bar)");
+-> [ [ 1, oobar ], [ 1, oo ], [ 3, bar ] ]
+::
+
+code::"oobar":: matches the entire regular expression. code::"oo":: and code::"bar":: match the first and second parenthesized sub-expressions, respectively.
+
code::
"foobar".findRegexp("o*bar");
"32424 334 /**aaaaaa*/".findRegexp("/\\*\\*a*\\*/");
-"foobar".findRegexp("(o*)(bar)");
-"aaaabaaa".findAllRegexp("a+");
+"aaaabaaa".findRegexp("a+");
::

+Returns:: A nested array, where each sub-array is a pair, code::[character index, matching string]::. If there are no matches, an empty array.
+
method::findAllRegexp
-Like link::#-findAll::, but use regular expressions. So unlike findRegexp, it will just return the indices of the
+Like link::#-findAll::, but use regular expressions (see link::Classes/String#Regular expressions::). Unlike findRegexp, it returns only the indices of the matches: code::string.findAllRegexp(regexp):: returns the same as code::string.findRegexp(regexp).flop.at(0)::.

code::
"foobar".findAllRegexp("o*bar");
@@ -390,8 +407,10 @@ code::
"aaaabaaa".findAllRegexp("a+");
::

+Returns:: An array of integer character indices pointing to all the possible matches.
+
method::findRegexpAt
-Match a regular expression at the given offset, returning the match and the length of the match in an Array, or nil if it doesn't match.
+Match a regular expression (see link::Classes/String#Regular expressions::) at the given offset, returning the match and the length of the match in an Array, or nil if it doesn't match.
The match must begin right at the offset.

code::
@@ -405,6 +424,8 @@ code::
"foobaroob".findRegexpAt("o*b+", 7); // [ ob, 2 ]
::

+Returns:: An array code::[matching string, length]:: if a match is found at the specified offset; code::nil:: if the offset doesn't match.
+
subsection:: Searching strings

method::find
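
The "match must begin right at the offset" rule documented for findRegexpAt above can be sketched outside sclang. The following is a rough illustration only: it assumes std::regex (ECMAScript grammar) as a stand-in for Boost.Regex's Perl syntax, and the names MatchAt and findRegexpAtSketch are hypothetical, not part of SuperCollider.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: `found` is false where sclang would return nil.
struct MatchAt {
    bool found;
    std::string text;
    int length;
};

// Hypothetical helper, not the sclang primitive. std::regex stands in for
// Boost.Regex here, which is an assumption of this sketch.
MatchAt findRegexpAtSketch(const std::string& text, const std::string& pattern,
                           std::string::size_type offset = 0)
{
    if (offset > text.size())
        return MatchAt{false, "", 0};

    const std::regex re(pattern);
    std::smatch m;
    // match_continuous requires the match to start exactly at the first
    // iterator, i.e. at `offset`, rather than anywhere after it.
    const bool ok = std::regex_search(
        text.begin() + static_cast<std::string::difference_type>(offset),
        text.end(), m, re, std::regex_constants::match_continuous);
    if (!ok)
        return MatchAt{false, "", 0};
    return MatchAt{true, m.str(0), static_cast<int>(m.length(0))};
}
```

With the documented input, findRegexpAtSketch("foobaroob", "o*b+", 7) reports a match of "ob" with length 2, while offset 0 reports no match because "f" cannot start the pattern.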
2 changes: 0 additions & 2 deletions lang/LangPrimSource/PyrStringPrim.cpp
@@ -413,8 +413,6 @@ static int prString_FindRegexp(struct VMGlobals *g, int numArgsPushed)
++g->sp; // advance the stack to avoid overwriting receiver
SetObject(g->sp, result_array); // push result to make reachable

-if( !match_count ) return errNone;
-
for (int i = 0; i < match_count; ++i )
{
int pos = matches[i].pos;
14 changes: 14 additions & 0 deletions testsuite/classlibrary/TestString.sc
@@ -139,4 +139,18 @@ TestString : UnitTest {
this.assertEquals(result, expected);
}

+test_findRegexp_nonEmptyResult {
+var result = "two words".findRegexp("[a-zA-Z]+");
+this.assertEquals(
+result,
+[[0, "two"], [4, "words"]],
+"`\"two words\".findRegexp(\"[a-zA-Z]+\")` should return a nested array of indices and matches"
+)
+}
+
+test_findRegexp_emptyResult {
+var result = "the quick brown fox".findRegexp("moo");
+this.assertEquals(result, Array.new, "Non-matching findRegexp should return empty array");
+}
+
}
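
For readers who want to reason about the two new tests without running sclang, the documented result shape of findRegexp (pairs of character index and matching string, and an empty array when nothing matches) can be approximated as below. This is a sketch under stated assumptions: std::regex replaces Boost.Regex, the name findRegexpSketch is hypothetical, and skipping sub-expressions that did not participate in a match is a choice made here, not confirmed behaviour of the primitive.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Each entry mirrors the documented pair [character index, matching string].
using MatchList = std::vector<std::pair<int, std::string>>;

// Hypothetical helper, not part of sclang. std::regex (ECMAScript grammar)
// is used as a stand-in for Boost.Regex's Perl syntax.
MatchList findRegexpSketch(const std::string& text, const std::string& pattern)
{
    MatchList results; // stays empty when nothing matches
    const std::regex re(pattern);
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it) {
        const std::smatch& m = *it;
        // Index 0 is the whole match; 1..n are the marked sub-expressions.
        for (std::size_t i = 0; i < m.size(); ++i) {
            if (m[i].matched) { // assumption: omit groups that did not participate
                results.emplace_back(static_cast<int>(m.position(i)), m.str(i));
            }
        }
    }
    return results;
}
```

Because the loop body simply never runs when there are no matches, the empty-result case needs no special handling in this sketch, which is the same shape the primitive has once its early return is removed.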
