-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add parse(Type, str)
method for parsing Types: DataTypes, Unions and UnionAlls.
#52668
base: master
Are you sure you want to change the base?
Conversation
bfcfdd6
to
e377700
Compare
This is awesome! |
@jakobnissen this is about parsing type objects themselves from strings. So in your example, it would be an error, since you've provided a number, not a type. You could do Does that make sense? EDIT: Could I adjust anything in the PR description to make this clearer? |
No this is clear enough, it was just me who didn't read it carefully enough :) |
15100e9
to
7bab22d
Compare
base/parse-type.jl
Outdated
end | ||
if ast isa Expr && ast.head === :incomplete | ||
throw(ArgumentError("invalid type expression: $str")) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these checks for a valid ast
go into _try_parse_type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait, no: I left this out of that since _try_parse_type
is called recursively, but this only needs to be checked once. Lemme add a comment to explain that.
v === nothing && return nothing | ||
v | ||
end) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this accomplish compared to just return v
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because I don't want to return if v
is not nothing; i want to continue. I should have called this @_bail_if_nothing
, since if the value is nothing, it will return early, rather than continuing the function.
I'll rename it, and then can you let me know if this seems acceptable?
b37b965
to
0efaf82
Compare
You might be interested in (for reference if nothing else) the approach I've taken in DataToolkitBase where I have a It is defined like so: struct QualifiedType
root::Symbol
parents::Vector{Symbol}
name::Symbol
parameters::Tuple
end And here is the parsing code (less capable than yours I suspect) https://github.com/tecosaur/DataToolkitBase.jl/blob/main/src/model/parser.jl#L5-L59, and the "make it an actual type" code This is how performance compares: julia> @btime eval(Meta.parse("Main.Main.Base.Checked.Vector{Dict{Int, Base.Float64}}"))
99.360 μs (176 allocations: 10.11 KiB)
Vector{Dict{Int64, Float64}} (alias for Array{Dict{Int64, Float64}, 1})
julia> @btime typeify(parse(QualifiedType, "Main.Main.Base.Checked.Vector{Dict{Int, Base.Float64}}"))
11.224 μs (190 allocations: 10.59 KiB)
Vector{Dict{Int64, Float64}} (alias for Array{Dict{Int64, Float64}, 1}) |
Parse all types, including DataTypes, UnionAlls and Unions. - Support function types: `typeof(+)` - Support constant isbits type constructors in parse type - Support `Vector{<:Number}` - Fix `Vector{S} where {Int<:T<:Number, S<:T}` - Add a bunch more automated test cases for better coverage - Add tryparse - Add support for names qualified by non-imported top-level packages. Mark tests for Union{} and Tuple{} broken - Add test case for new loaded_modules support
aa19464
to
c35ad1f
Compare
Okay, I think this is ready for a real review! I would super appreciate another pass, @stevengj and @MasonProtter, and maybe a review from @topolarity, @c42f, or @LilithHafner? The remaining open questions are I think:
Thanks!! CC: @kpamnany |
julia> Base.tryparse(Int, "5+4")
julia> Base.tryparse(Type, "5+4")
ERROR: MethodError: no method matching reinterpret(::typeof(+), ::Tuple{Int64, Int64})
The function `reinterpret` exists, but no method is defined for this combination of argument types.
... |
@_bail_if_nothing ast.args[i] = _try_parse_type(module_context, ast.args[i], raise, type_vars) | ||
end | ||
# We use reinterpret to avoid evaluating code, which may have side effects. | ||
return reinterpret(typ, Tuple(ast.args)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes the default constructor. It can be wrong, though
julia> struct Point
x::Int32
y::Int32
Point(x, y=x) = new(x, y)
end
julia> str = "Val{Point(-7)}"
"Val{Point(-7)}"
julia> eval(Meta.parse(str))
Val{Point(-7, -7)}
julia> parse(Type, str)
Val{Point(-7, -1)}
The task of constructing arbitrary values at runtime without evaluating code is tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, interesting... 🤔 ..... so maybe this whole idea is not valid? :(
Maybe we could get away with it by dropping values, and documenting a restriction that we don't support struct values in the type domain, and for those, we recommend falling back to eval
?
:( Sadbeans! Thanks for catching this. Do you have any other good ideas / suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't get this right, maybe this ultimately doesn't belong in base, and it should be a library similar to what @tecosaur shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But also also, i was mainly interested in using this for improving PackageCompiler, and it's worth noting that structs in types are currently broken in precompile statements anyway, since they're printed as kwargs for some reason!?:
precompile(Tuple{Type{Base.Val{Main.Point(x=-7, y=-7)}}})
julia> str = "precompile(Tuple{Type{Base.Val{Main.Point(x=-7, y=-7)}}})"
"precompile(Tuple{Type{Base.Val{Main.Point(x=-7, y=-7)}}})"
julia> eval(Meta.parse(str))
ERROR: MethodError: no method matching Point(; x::Int64, y::Int64)
So these cases are all broken anyway, so maybe we don't need to support them? (though i'd like to fix the way they're printed in precompiles, too...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a parse
method, I think it's reasonable to only support values in type parameters if the values are embedded directly in the syntax tree (e.g. Val{5}
or Val{"a"}
but not Val{1+1}
or Val{(1,1)}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK if that will be enough for your use-case, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree I wouldn't support Val{1+1}
, since that is definitely not a value embedded directly in the tree.
But isn't Val{(1,1)}
? I do think we see constant tuples in our types fairly frequently, so i would want that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe special case to allow this iff the default inner constructor is accessible and the parsed string would dispatch to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, these print with kwarg-like syntax because it would be wrong to run the constructor to recreate them, so that is intended to dissuade the user from attempting to construct the object by doing so (as well as being a bit more more readable). Doing a ccall to the generic construction method (as done in Serialization) is probably the best approach
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
The biggest thing that stands out to me is that this feels to me like a deserialization and not a parse. In particular, it has to worry about a lot of classic "serialization" problems:
These questions make the meaning of a type string like this contextual, even for very "primitive" seeming type names: module Test
struct Union{A,B} end
struct Float32 end
struct Float64 end
struct Foo
x::Union{Float32,Float64}
end
end The Julia syntax here works double-duty as a serialization format (very similar to how raw |
@topolarity: I agree with you, but for better or worse, julia is already using The current process is that PackageCompiler runs your snoop workload with precompile(Tuple{typeof(Base.iterate), Base.Set{Char}})
precompile(Tuple{typeof(Base.copy), Array{ArgParse.ArgParseGroup, 1}})
precompile(Tuple{typeof(Base.pairs), NamedTuple{(:help, :arg_type), Tuple{String, DataType}}})
precompile(Tuple{typeof(Unrolled.type_length), RAICode.QueryEvaluator.var"#169#170"{String, ((), (1,))}}) then PackageCompiler parses and evals each of those types, and precompiles them, as in something like this: ps = Meta.parse(statement)
@assert isexpr(ps, :call) && ps.args[1] == :precompile
ps = Core.eval(PrecompileStagingArea, ps)
precompile(ps...) (We want to do this in our production servers, by having them run a set of precompile statements at startup, but I want to prevent the risk of someone having an arbitrary code-execution exploit by mutating the file that contains those statement - and also to make it faster.) So I'm open to picking a different name for this mechanism, but i think the use of strings as a round-tripping format for types is already out there, and I'm mainly hoping to make that safer and faster. It does feel like parsing to me though: you're taking a string representation of a value, and constructing that julia value, according to julia's parse-tree. |
Yeah, I agree with the motivation! Better safety/performance for My complaint was specifically about tying this to On the other hand, doing I'd prefer this had a similar API that is either very clear about doing contextual evaluation (albeit much safer and more limited than |
Thanks cody. I'll have to think about what an alternative API should look like. --- Do you have any suggestions? (FWIW, I did add a But yeah, i think your greater point stands that this probably isn't exactly "parsing." It's really close - it is at least parsing-adjacent. But indeed, maybe type-deserializing is a better term? ..... 😅 I keep coming back to this question in my head though: is deserializing a value from a string representation of julia source code anything other than parsing? 😅 |
I'm late to the party here (as usual sorry 😬) But I totally agree with Cody. @topolarity I appreciate how well you explained why this "doesn't quite feel right as parse". I agree. It seems that what you've implemented here is actually a restricted Julia interpreter:
All this makes it surprisingly complex! Amusingly we already have other interpreters a bit like this. For example, the runtime's Should we really keep a zoo of special purpose interpreters? I don't know. If it matters performance-wise I guess we have to.
This seems like a lot of work, I don't recommend it 😅 |
Description
Add a method to
parse
:parse(::Type{<:Type}, ::AbstractString)
, which parses a string into a julia type object.This replaces the only alternative previously available, which was to
eval(Meta.parse(str))
.This is beneficial for two major reasons:
eval()
, which is a very general mechanism.s = "precompile(Tuple{typeof(+), dump_all_your_secrets()})"
,eval(Meta.parse(s))
would invoke unwanted code.Some microbenchmark results:
Notes:
Is this really parsing?
This capability has been discussed / requested by members of the community in the past. The answer has always been to parse+eval. For example in this thread, the OP was told to eval the string: https://discourse.julialang.org/t/parse-string-to-datatype/7118/8.
In that thread, the argument is made that this isn't "parsing," since Type objects are not literals from the parser's perspective. This really is more like parse+eval. In which case, I'm happy to use a different name for this if people feel strongly about it. But to me, it feels like parsing, so
parse(Type, str)
does feel like the right API for this.I'm open to suggestions on that!
Remaining Work
I first wrote this with only
DataType
s in mind. You can see the version that parses only DataTypes in the first commit, 28ee0f3. But then I realized that we should also support UnionAlls, which requires a bit more work. I'm still working through how best to do that... I could probably use some help with this, TBH.The current implementation is still not correct for all UnionAll types:EDIT: I think it's correct now!Vector{<:Number}
Main.Vector{S} where {Int<:T<:Number, S<:T}
typeof(+)
Array{<:Any,2}
,Val{X(Y(2))}
?reinterpret
? That seems safe.tryparse
, and use that, to allow parse failures without throwing an exception.Meta.parse
and then walking the parsed AST?Meta.parse
's implementation though?CC: @c42f - this seems very up your alley 😊 Happy new year! 🌟