add variant of regex.jl:match() that updates idx #51546

Open: wants to merge 1 commit into base: master


mgkuhn (Contributor) commented Oct 2, 2023

Allow `idx` to be a `Ref{Int}` so that it can be updated to always point at the first character not yet matched. This is particularly useful when writing parsers that repeatedly call `match()` as a token scanner on the same string: each call can start right after the previous match, usually with the regex anchored to the `idx` position with `\G`.

fixes #51429
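
To illustrate the intended usage, here is a minimal sketch of such a token scanner. It does not use the method proposed in this PR; instead it assumes a hypothetical helper, `match_advance!`, that wraps the existing three-argument `match` and updates the `Ref`. The token kinds and the `tokenize` function are likewise made up for the example.

```julia
# Hypothetical helper (not part of Base): match starting at idx[] and, on
# success, advance idx[] to the first code unit after the match.
function match_advance!(re::Regex, str::String, idx::Ref{Int})
    m = match(re, str, idx[])
    m === nothing && return nothing
    idx[] = m.offset + ncodeunits(m.match)
    return m
end

# Toy token scanner: \G anchors each regex at the current idx position,
# so a pattern cannot skip ahead over unmatched input.
function tokenize(str::String)
    tokens = Tuple{Symbol,String}[]
    idx = Ref(1)
    while idx[] <= ncodeunits(str)
        if match_advance!(r"\G\s+", str, idx) !== nothing
            continue  # whitespace is skipped, not emitted
        elseif (m = match_advance!(r"\G\d+", str, idx)) !== nothing
            push!(tokens, (:number, String(m.match)))
        elseif (m = match_advance!(r"\G\w+", str, idx)) !== nothing
            push!(tokens, (:word, String(m.match)))
        else
            error("unexpected character at index $(idx[])")
        end
    end
    return tokens
end

tokens = tokenize("abc 42 def")
```

The caller never computes the next start position by hand; each successful match moves `idx` forward, and a failed match leaves it untouched so the next pattern can be tried at the same position.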

brenhinkeller added the strings ("Strings!") and feature (Indicates new feature / enhancement requests) labels Oct 3, 2023

ararslan (Member) commented Oct 3, 2023

This seems like a rather odd API; offhand, I don't know of any other Base functions that have C-style, Ref-updating side effects and aren't just C calls.

An alternative approach could be to add a method to match that takes a RegexMatch in the index position, like

match(re::Regex, str::Union{String,SubString}, prev::RegexMatch) =
    match(re, str, prev.offset + ncodeunits(prev.match))

That doesn't introduce hidden side effects but it's also kind of unclear whether it's materially more ergonomic than the status quo.
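
For comparison, a rough sketch of how that alternative might look in a loop. To avoid extending `Base.match` in an example, the suggested method is written here as a separate, hypothetical function `match_after`; `collect_words` is also just an illustrative name.

```julia
# Hypothetical helper mirroring the method suggested above, kept under a
# separate name so that Base.match itself is not extended.
match_after(re::Regex, str::String, prev::RegexMatch) =
    match(re, str, prev.offset + ncodeunits(prev.match))

# Collect every word by restarting the search right after the previous match.
function collect_words(str::String)
    words = String[]
    m = match(r"\w+", str)
    while m !== nothing
        push!(words, m.match)
        m = match_after(r"\w+", str, m)
    end
    return words
end

words = collect_words("abc def ghi")
```

No state is mutated behind the caller's back here: the position lives in the `RegexMatch` that the caller already holds. Note that a pattern that can match the empty string would make this loop spin forever, since the next start index would never advance.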

aplavin (Contributor) commented Oct 15, 2023

Note that in the next Julia version (1.10) we'll already have this:

julia> s = "abc def"
"abc def"

julia> m = match(r"\w+", s)
RegexMatch("abc")

julia> match(r"\w+", s, parentindices(m.match)[1][end] + 1)
RegexMatch("def")

Specifically, 1.10 adds SubString support to the parentindices function.

Successfully merging this pull request may close this issue: make match() easier to call again