Empty SubString should contain the correct offset #52877
Description
julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa29* (2023-11-14 08:46 UTC)
Build Info:
Built by Homebrew (v1.9.4)
[...]
Platform Info:
OS: macOS (arm64-apple-darwin21.6.0)
[...]
Expected Behavior
julia> @views "12345"[4:4]
"4"
julia> ans.offset
3
julia> @views "12345"[4:3]
""
julia> ans.offset
3
Observed Behavior
julia> @views "12345"[4:3]
""
julia> ans.offset
0
Problem
This breaks the expectation that one can recover the actual range of code units a SubString represents. I'm writing a parser, and empty captures are an important part of parsing: for my purposes, I want to know, e.g., the location of an optional parenthesis whether or not it is present.
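A minimal sketch of what reconstructing that range looks like. It assumes the undocumented offset and ncodeunits fields of SubString (private internals that may change between versions), and codeunit_range is a hypothetical helper name, not a Base function:

```julia
# Hypothetical helper: recover the code-unit range of the parent string that a
# SubString covers, using the private `offset` and `ncodeunits` fields.
codeunit_range(s::SubString) = (s.offset + 1):(s.offset + s.ncodeunits)

full = "12345"

nonempty = @views full[4:4]
codeunit_range(nonempty)    # 4:4, as expected

empty_sub = @views full[4:3]
codeunit_range(empty_sub)   # empty either way, but on Julia 1.9.4 the offset is 0,
                            # so this yields 1:0 and the position 4 is lost
```

For non-empty views this round-trips exactly; for empty views the result is still an empty range, but its start no longer says where the empty capture was.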
The offending line is the one in the SubString constructor that returns an offset of 0 for an empty range; I see no reason it couldn't be return new(s, j, 0), since a zero-length span of code units is always valid (or equally invalid, if you prefer). It might require changes to some of the methods. I haven't actually tried the change (I have yet to set up a dev environment for Julia, although I would like to).
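To make the proposal concrete, here is a sketch using a hypothetical stand-in type, SpanSketch, whose fields mirror SubString's layout. It is not Base's SubString and does not patch it; it only shows how the proposed return new(s, j, 0) would behave:

```julia
# Hypothetical stand-in for SubString{String}, used only to illustrate the
# proposed change: an empty range keeps its position instead of offset 0.
struct SpanSketch
    string::String
    offset::Int
    ncodeunits::Int

    function SpanSketch(s::String, i::Int, j::Int)
        # Proposed behavior: an empty range i:j (j < i) keeps offset j.
        i ≤ j || return new(s, j, 0)
        # Non-empty ranges mirror SubString: both ends must be valid indices.
        isvalid(s, i) || throw(StringIndexError(s, i))
        isvalid(s, j) || throw(StringIndexError(s, j))
        return new(s, i - 1, nextind(s, j) - i)
    end
end

SpanSketch("12345", 4, 4)   # offset 3, ncodeunits 1
SpanSketch("12345", 4, 3)   # offset 3, ncodeunits 0 (position preserved)
```

For the common empty case j == i - 1, the offset j equals i - 1, the same offset a non-empty range starting at i gets.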
It should be possible to change this without breaking userspace: isempty is already uniquely determined by the value of .ncodeunits:
julia> emptysub = @views "12345"[4:3]
""
julia> isempty(emptysub)
true
julia> emptysub.ncodeunits
0
julia> @views "12345"[4:4].ncodeunits
1
And I would hope that any code which messes with the internals of a SubString (which are not a public interface, hence not subject to SemVer) would use the value of ncodeunits, or more likely isempty, to detect an empty SubString.
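A quick check of the claim above, sketched over every range of "12345" that starts at a valid index: emptiness tracks ncodeunits alone (a private field), so changing the stored offset of an empty SubString should not affect isempty:

```julia
# Every substring "12345"[a:b] with b from a-1 (empty) up to 5.
subs = [SubString("12345", a, b) for a in 1:5 for b in (a - 1):5]

# isempty agrees with the private `ncodeunits` field for all of them,
# regardless of what `offset` happens to be stored.
all(s -> isempty(s) == (s.ncodeunits == 0), subs)   # true
```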
I would prefer not to turn my Vector{SubString} into a Union type just to store the offset of an empty SubString as an Int, although I'll need to do that anyway because of the current behavior, and keep it for compatibility. It would be rather useful if future versions of Julia didn't do this.
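A sketch of the Union-based workaround described above. The names Capture, capture, and capture_offset are hypothetical; empty captures are stored as the Int offset j, matching the proposed new(s, j, 0), and everything else stays a SubString:

```julia
# Either a non-empty SubString or, for an empty capture, its 0-based offset.
const Capture = Union{SubString{String}, Int}

function capture(s::String, i::Int, j::Int)::Capture
    # Empty ranges lose their offset in SubString on Julia 1.9, so keep it as an Int.
    i ≤ j || return j
    return SubString(s, i, j)
end

# 0-based starting offset of a capture, whichever representation it uses.
capture_offset(c::SubString{String}) = c.offset
capture_offset(c::Int) = c

caps = Capture[capture("12345", 4, 4), capture("12345", 4, 3)]
capture_offset.(caps)   # [3, 3]
```

Every consumer of the vector now has to dispatch on two element types, which is the cost the issue is asking to avoid.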
This is surprising behavior. I recognize that SubString internals are undocumented/private, but (other than hopefully fixing this) they're unlikely to change, and the ability to reconstruct the range of a @view from its structure is quite valuable.
Workaround?
I tried hand-rolling one using Val(:noshift), but for whatever reason this didn't work:
julia> handrolled = SubString("12345", 3, 0, Val(:noshift))
MethodError: no method matching SubString(::String, ::Int64, ::Int64, ::Val{:noshift})
In any case, thank you for your hard work and consideration.