Empty SubString should contain the correct offset #52877

Open
@mnemnion

Description

julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa29* (2023-11-14 08:46 UTC)
Build Info:
  Built by Homebrew (v1.9.4)
[...]

Platform Info:
  OS: macOS (arm64-apple-darwin21.6.0)
[...]

Expected Behavior

julia> @views "12345"[4:4]
"4"

julia> ans.offset
3

julia> @views "12345"[4:3]
""

julia> ans.offset
3

Observed Behavior

julia> @views "12345"[4:3]
""

julia> ans.offset
0

Problem

This breaks the expectation that one can find the actual range of codeunits represented by a SubString. I'm writing a parser, and empty captures are an important part of parsing (for my purposes, I want to know e.g. the location of an optional parenthesis whether or not it exists).
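As a rough illustration of that expectation, the range can be rebuilt from the two private fields shown in this issue, `offset` and `ncodeunits`. The helper name `codeunit_range` is made up for this sketch and is not part of Base; it relies on undocumented internals, so treat it as an assumption rather than a supported API.

```julia
# Hypothetical helper (not in Base): rebuild the parent-string codeunit range
# covered by a SubString from its private fields `offset` and `ncodeunits`.
function codeunit_range(s::SubString)
    start = s.offset + 1                      # offset is zero-based
    return start:(start + s.ncodeunits - 1)   # empty range when ncodeunits == 0
end

nonempty = SubString("12345", 4, 4)
println(codeunit_range(nonempty))   # the range covering "4" in the parent

# For an empty SubString the range is empty either way, but where it starts
# depends on the stored offset, which is exactly what this issue is about.
empty_sub = SubString("12345", 4, 3)
println(codeunit_range(empty_sub))
```

With the offset reset to 0, the second range no longer says where in the parent the empty capture sits.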

This is the offending line. I see no reason it couldn't be return new(s, j, 0), since a zero-byte codepoint is always valid (or equally invalid, if you prefer). It might require changes to some of the methods; I haven't actually tried the change (I have yet to set up a dev environment for Julia, although I would like to).

It should be possible to change this without breaking userspace. isempty is already uniquely determined by the value of .ncodeunits:

julia> emptysub = @views "12345"[4:3]
""

julia> isempty(emptysub)
true

julia> emptysub.ncodeunits
0

julia> @views "12345"[4:4].ncodeunits
1

And I would hope any code which messed around with the internals of a SubString (which is not a public interface, hence not subject to SemVer) would use the value of ncodeunits, or more likely isempty, to determine an empty SubString.

I would prefer not to turn my Vector{SubString} into a Union type just to store the offset of an empty SubString as an Int, although I'll need to do that anyway due to current behavior and keep it for compatibility. It would be rather useful if future versions of Julia didn't do this.
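For comparison, one possible shape for such a workaround is sketched below. It stores the start position next to the SubString in a small wrapper instead of using a Union, so the vector's element type stays concrete. The names `Capture` and `capture` are hypothetical and exist only for this example.

```julia
# Hypothetical wrapper: keeps the start position explicitly, so it survives
# even when the SubString is empty and its stored offset is not useful.
struct Capture
    text::SubString{String}
    start::Int   # codeunit index in the parent string where the capture begins
end

# Build a Capture from a parent string and a codeunit range (may be empty).
function capture(s::String, r::UnitRange{Int})
    return Capture(SubString(s, first(r), last(r)), first(r))
end

caps = Capture[capture("12345", 4:4), capture("12345", 4:3)]

for c in caps
    println(repr(c.text), " starts at ", c.start, ", empty: ", isempty(c.text))
end
```

This costs one extra Int per capture, and it would be unnecessary if the empty SubString kept its offset.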

This is surprising behavior; I recognize that SubString internals are undocumented/private but (other than hopefully fixing this) they're unlikely to change, and the ability to reconstruct the range of a @view from its structure is quite valuable.

Workaround?

I tried hand-rolling one using Val(:noshift), but for whatever reason this wasn't effective:

julia> handrolled = SubString("12345", 3, 0, Val(:noshift))
MethodError: no method matching SubString(::String, ::Int64, ::Int64, ::Val{:noshift})

In any case, thank you for your hard work and consideration.

Metadata

    Labels

    bug, strings
