Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for generically extracting unannotated string #55458

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tecosaur
Copy link
Contributor

As raised by Alexander Plavin, you may want to obtain a the underlying string of an AnnotatedString in non-String cases. However, there's no public API for doing so. Instead of just implementing this functionality for the String type, we can make it more generic.

Closes #55247.

@tecosaur tecosaur added strings "Strings!" backport 1.11 Change should be backported to release-1.11 labels Aug 11, 2024
@tecosaur tecosaur force-pushed the deannotate-str branch 2 times, most recently from 3216a44 to 162e37a Compare August 11, 2024 10:55
As raised by Alexander Plavin, you may want to obtain a the underlying
string of an AnnotatedString in non-String cases. However, there's no
public API for doing so. Instead of just implementing this functionality
for the String type, we can make it more generic.

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@tecosaur
Copy link
Contributor Author

Thanks for giving this a check Jameson, I've had a little trouble with method ambiguities. Adding a few more specific methods wasn't enough (at least for the methods I tried), so I've taken a step back from your original suggestion (requiring the constructor to be of the same type). Hopefully, this is specific enough to avoid creating issues.

@tecosaur
Copy link
Contributor Author

Argh, there's still a bunch of ambiguity. @vtjnash I don't suppose you can think of any nice approaches to resolving them?

@KristofferC KristofferC mentioned this pull request Aug 13, 2024
68 tasks
KristofferC added a commit that referenced this pull request Aug 26, 2024
Backported PRs:
- [x] #54962 <!-- Add timing to precompile trace compile -->
- [x] #55180 <!-- compress jit debuginfo for easy memory savings -->
- [x] #54919 <!-- Fix annotated join with non-concrete eltype iters -->
- [x] #55013 <!-- [docs] change docstring to match code -->
- [x] #55017 <!-- TOML: Make `Dates` a type parameter -->
- [x] #54033 <!-- Fix a bug in `stack`'s DimensionMismatch error message
-->
- [x] #55242 <!-- fix at-main docstring to not code quote a compat box
-->
- [x] #55261 <!-- Make `jl_*affinity` tests more portable -->
- [x] #54736 <!-- specificity: ensure fast-path in `sub/eq_msp` handle
missing `UnionAll` wrapper correctly. -->
- [x] #55299 <!-- typeintersect: fix bounds merging during inner
`intersect_all`. -->
- [x] #55302 <!-- Add `lbt_forwarded_funcs()` to debug LBT forwarding
issues -->
- [x] #55148 <!-- Random: Mark unexported public symbols as public -->
- [x] #55303 <!-- avoid overflowing show for OffsetArrays around typemax
-->
- [x] #55317 <!-- Restrict argument to `isleapyear(::Integer)` -->
- [x] #55327 <!-- Profile: Fix stdlib paths -->
- [x] #55330 <!-- [libblastrampoline] Bump to v5.11.0 -->
- [x] #55310 <!-- Preserve structure in scaling triangular matrices by
NaN -->
- [x] #55329 <!-- mapreduce: don't inbounds unknown functions -->
- [x] #55356 <!-- Profile: close files when assembling heap snapshot -->
- [x] #55371 <!-- Fix tr for block SymTridiagonal -->
- [x] #55307 <!-- Make REPL.TerminalMenus public -->
- [x] #55362 <!-- inference: fix missing LimitedAccuracy markers -->
- [x] #55306 <!-- AllocOpt: Fix stack lowering where alloca continas
boxed and unboxed data -->
- [x] #55395 <!-- fix #55389: type-unstable `join` -->
- [x] #55226 <!-- re-add `unsafe_convert` for Reinterpret and Reshaped
array -->
- [x] #55405 <!-- handle unbound vars in NTuple fields -->
- [x] #55365 <!-- ml-matches: ensure all methods are included -->
- [x] #55428 <!-- codegen: move undef freeze before promotion point -->
- [x] #55419 <!-- `stale_cachefile`: handle if the expected cache file
is missing -->
- [x] #55470 <!-- Add push! implementation for AbstractArray depending
only on resize! -->
- [x] #55483 <!-- fix hierarchy level of "API reference" in `Dates`
documentation -->
- [x] #55268 <!-- simplify complex atanh and remove singularity
perturbation -->
- [x] #55441 <!-- fix Event to use normal Condition variable -->
- [x] #55413 <!-- subtyping: fast path for lhs union and rhs typevar -->
- [x] #55492 <!-- build: add missing dependencies for expmap -->
- [x] #55507 <!-- Fix fast getptls ccall lowering. -->
- [x] #55424 <!-- add missing clamp function for IOBuffer -->
- [x] #55504 <!-- Update symmetric docstring to reflect the type of uplo
-->
- [x] #55107 <!-- Make the memory GEP an inbounds GEP since the bounds
check has happened somewhere else -->
- [x] #55411 <!-- Vendor the terminfo database for use with
base/terminfo.jl -->
- [x] #55452 <!-- Do not load `ScopedValues` with `using` -->
- [x] #55407 <!-- Remove deprecated non string API for LLVM pass
pipeline and parse all options -->
- [x] #55461 <!-- 🤖 [master] Bump the StyledStrings stdlib from d7496d2
to f6035eb -->
- [x] #55433 <!-- Backport #55407
to 1.11 -->
- [x] #55225 <!-- [1.11 backport] trace-compile: don't generate
`precompile` statements for OpaqueClosure methods (#55072) -->
- [x] #55212 <!-- Make `Base.depwarn()` public -->
- [x] #552
- [x] #55052 <!-- Fix `(l/r)mul!` with `Diagonal`/`Bidiagonal` -->
- [x] #55251 <!-- Restrict binary ops for Diagonal and Symmetric to
Number eltypes -->95 <!-- LAPACK: Aggressive constprop to concretely
infer syev!/syevd! -->
- [x] #55522 <!-- Fix tr for Symmetric/Hermitian block matrices -->

Need manual backport:
- [x] #55342 <!-- Ensure bidiagonal setindex! does not read indices in
error message -->

Contains multiple commits, manual intervention needed:

- [ ] #55336 <!-- codegen: take gc roots (and alloca alignment) more
seriously -->


Non-merged PRs with backport label:
- [ ] #55506 <!-- Fix indexing in _mapreducedim for OffsetArrays -->
- [ ] #55500 <!-- make jl_thread_suspend_and_get_state safe -->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55453 <!-- Privatise the annotations API, for StyledStrings -->
- [ ] #55443 <!-- Add test for upper/lower/titlecase and fix call -->
- [ ] #55355 <!-- relocation: account for trailing path separator in
depot paths -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #55169 <!-- `propertynames` for SVD respects private argument -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->
- [ ] #41244 <!-- Fix shell `cd` error when working dir has been deleted
-->
@KristofferC KristofferC mentioned this pull request Aug 26, 2024
33 tasks
@tecosaur
Copy link
Contributor Author

I'm still hoping for help working this out, if I can interest anyone in taking a look.

KristofferC added a commit that referenced this pull request Sep 17, 2024
Backported PRs:
- [x] #55480 <!-- Fix push! for OffsetVectors, add tests for push! and
append! on AbstractVector -->
- [x] #55443 <!-- Add test for upper/lower/titlecase and fix call -->
- [x] #55524 <!-- Set `.jl` sources as read-only during installation -->
- [x] #55500 <!-- make jl_thread_suspend_and_get_state safe -->
- [x] #55506 <!-- Fix indexing in _mapreducedim for OffsetArrays -->
- [x] #55564 <!-- Empty out loaded_precompiles dict instead of asserting
it's empty. -->
- [x] #55567 <!-- Initialize threadpools correctly during sysimg build
-->
- [x] #55596 <!-- Fast bounds-check for CartesianIndex ranges -->
- [x] #55605 <!-- Reroute Symmetric/Hermitian + Diagonal through
triangular -->
- [x] #55640 <!-- win: move stack_overflow_warning to the backtrace
fiber -->
- [x] #55715 <!-- Add precompile signatures to Markdown to reduce
latency. -->
- [x] #55593 <!-- Fix invalidations for FileIO -->
- [x] #55555 <!-- Revert "Don't expose guard pages to malloc_stack API
consumers" -->
- [x] #55720 <!-- Fix `pkgdir` for extensions -->
- [x] #55729 <!-- Avoid confounding compilation side effects of
`@time_imports` -->
- [x] #55718 <!-- Fix `@time_imports` extension recognition -->
- [x] #55522 <!-- Fix tr for Symmetric/Hermitian block matrices -->

Contains multiple commits, manual intervention needed:
- [ ] #55509 <!-- Fix cong implementation to be properly random and not
just cycling. -->

Non-merged PRs with backport label:
- [ ] #55641 <!-- fall back to slower stat filesize if optimized
filesize fails -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55453 <!-- Privatise the annotations API, for StyledStrings -->
- [ ] #55355 <!-- relocation: account for trailing path separator in
depot paths -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #55169 <!-- `propertynames` for SVD respects private argument -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->
- [ ] #41244 <!-- Fix shell `cd` error when working dir has been deleted
-->
@KristofferC KristofferC mentioned this pull request Sep 24, 2024
30 tasks
KristofferC added a commit that referenced this pull request Sep 25, 2024
Backported PRs:
- [x] #55773 <!-- Add compat entry for `Base.donotdelete` -->
- [x] #41244 <!-- Fix shell `cd` error when working dir has been deleted
-->
- [x] #55795 <!-- fix #52986, regression in `@doc` of macro without REPL
loaded -->
- [x] #55829 <!-- [Dates] Make test more robust against non-UTC
timezones -->
- [x] #55641 <!-- fall back to slower stat filesize if optimized
filesize fails -->
- [x] #55744 <!-- fix #45494, error in ssa conversion with complex type
decl -->
- [x] #55783 <!-- use `inferencebarrier` instead of `invokelatest` for
1-arg `@assert` -->
- [x] #55739 <!-- Add `invokelatest` barrier to `string(...)` in
`@assert` -->

Need manual backport:
- [ ] #55798 <!-- Broadcast binary ops involving strided triangular -->

Contains multiple commits, manual intervention needed:
- [ ] #55509 <!-- Fix cong implementation to be properly random and not
just cycling. -->
- [ ] #55569 <!-- Add a docs section about loading/precomp/ttfx time
tuning -->
- [ ] #55824 <!-- Replace regex package module checks with actual code
checks -->

Non-merged PRs with backport label:
- [ ] #55845 <!-- privatize annotated string API, take two -->
- [ ] #55828 <!-- Fix some corner cases of `isapprox` with unsigned
integers -->
- [ ] #55813 <!-- Check for conflicting `@ccallable` name before JIT
registration -->
- [ ] #55743 <!-- doc: heap snapshot viewing -->
- [ ] #55741 <!-- Change annotations to use a NamedTuple -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55453 <!-- Privatise the annotations API, for StyledStrings -->
- [ ] #55355 <!-- relocation: account for trailing path separator in
depot paths -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #55169 <!-- `propertynames` for SVD respects private argument -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->
@tecosaur
Copy link
Contributor Author

If I can interest anyone in giving this a look, I'd very much appreciate help working out how to work through the method ambiguity issues, or whether there's another approach that could be taken.

@KristofferC KristofferC mentioned this pull request Sep 30, 2024
39 tasks
KristofferC added a commit that referenced this pull request Oct 1, 2024
Backported PRs:
- [x] #55849 <!-- Mmap: fix grow! for non file IOs -->
- [x] #55863 <!-- Update TaskLocalRNG docstring according to #49110 -->
- [x] #54433 <!-- Root globals in toplevel exprs -->
- [x] #55828 <!-- Fix some corner cases of `isapprox` with unsigned
integers -->
- [x] #55890 <!-- Profile: fix order of fields in heapsnapshot & improve
formatting -->
- [x] #55884 <!-- inference: add missing `TypeVar` handling for
`instanceof_tfunc` -->
- [x] #55881 <!-- Install terminfo data under /usr/share/julia -->
- [x] #55909 <!-- do not intentionally suppress errors in precompile
script from being reported or failing the result -->
- [x] #55355 <!-- relocation: account for trailing path separator in
depot paths -->
- [x] #55917 <!-- fix rawbigints OOB issues -->
- [x] #55892 <!-- TOML: Avoid type-pirating `Base.TOML.Parser` -->
- [x] #55798 <!-- Broadcast binary ops involving strided triangular -->
- [x] #55919 <!-- Limit `@inbounds` to indexing in the dual-iterator
branch in `copyto_unaliased!` -->

Contains multiple commits, manual intervention needed:
- [ ] #54009 <!-- allow extensions to trigger from packages in [deps]
-->
- [ ] #55509 <!-- Fix cong implementation to be properly random and not
just cycling. -->
- [ ] #55569 <!-- Add a docs section about loading/precomp/ttfx time
tuning -->
- [ ] #55824 <!-- Replace regex package module checks with actual code
checks -->

Non-merged PRs with backport label:
- [ ] #55932 <!-- REPL: make UndefVarError aware of imported modules -->
- [ ] #55910 <!-- Prevent extensions from blocking parallel
pre-compilation -->
- [ ] #55908 <!-- add logic to prefer loading modules that are already
loaded -->
- [ ] #55886 <!-- irrationals: restrict assume effects annotations to
known types -->
- [ ] #55871 <!-- lowering: don't reverse handler order in
`(pop-handler-list ...)` -->
- [ ] #55870 <!-- fix infinite recursion in `promote_type` for
`Irrational` -->
- [ ] #55867 <!-- update `hash` doc string: `widen` not required any
more -->
- [ ] #55851 <!-- [REPL] Fix #55850 by using `safe_realpath` instead of
`abspath` in `projname` -->
- [ ] #55813 <!-- Check for conflicting `@ccallable` name before JIT
registration -->
- [ ] #55743 <!-- doc: heap snapshot viewing -->
- [ ] #55741 <!-- Change annotations to use a NamedTuple -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #55169 <!-- `propertynames` for SVD respects private argument -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->
@KristofferC KristofferC mentioned this pull request Oct 7, 2024
44 tasks
@KristofferC KristofferC mentioned this pull request Oct 18, 2024
43 tasks
@@ -125,7 +126,10 @@ AnnotatedString(s::AnnotatedString, annots::Vector{Tuple{UnitRange{Int}, Pair{Sy
AnnotatedChar(c::AnnotatedChar, annots::Vector{Pair{Symbol, Any}}) =
AnnotatedChar(c.char, vcat(c.annotations, annots))

String(s::AnnotatedString{String}) = s.string # To avoid pointless overhead
# To allow for generically de-annotating a string.
(::Type{T})(s::AnnotatedString{T}) where {T <: AbstractString} = T(s.string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is genuinely ambiguous with respect to at least these methods, possibly more:

GenericString(string::AbstractString) @ Base strings/annotated.jl:130
SubString(s::AbstractString) @ Base strings/substring.jl:59
Core.Compiler.LazyString(args...) @ Core.Compiler strings/lazy.jl:41

I am not sure it is possible to resolve this (esp. if one also takes the potential for similar such issues with packages into account). Not am I sure it is necessary: sure instead of MyStringType(my_annotate_str) it is also OK if I have to write MyStringType(String(my_annotate_str)) ? The necessary method is already in master

@KristofferC KristofferC mentioned this pull request Nov 22, 2024
34 tasks
@KristofferC KristofferC mentioned this pull request Dec 3, 2024
43 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.11 Change should be backported to release-1.11 strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AnnotatedString has no public way to extract the underlying string without annotations
3 participants