[WIP] Nullable Redesign #21
This approach is essentially the inverse of the `Union{T, Void}`: here no | ||
methods will work on objects of this type unless they are defined for both | ||
`f(x::Some{T})` and `f(x::Void)`. As such, this approach requires consistent | ||
of specialized lifting machinery via dot-broadcasting syntax. See the later |
"consistent of"?
-> "consistent usage of"
The representation of null-like constructs in Julia is too complicated. We | ||
currently have all of the following kinds of null-like constructs: | ||
|
||
* A ` Void` type with a singleton value `nothing`, |
Extra space after the backtick
Fixed
## Step 2:: Define Core Operations on Nullable Types by Lifting | ||
|
||
The core semantic issue for nullable types is maximizing code reuse: we want | ||
to evaluate `f(x::?T)`, but not require that every function be defined for |
Should mention what `?T` is intended to mean; it's used here mid-discussion with no prior mention of the syntax.
* `hasvalue` | ||
* `isnull` | ||
* `get` | ||
* `get_or_default` |
I don't think this is a function in Base?
I did this for C# comparisons. This is our two argument `get`. Will indicate that.
`T?`. Which is chosen will likely depend upon how they interact with the | ||
ternary operator. | ||
|
||
We might also introduce a null-coalescing operator of `??` with |
Sentence flows better without "of" IMO
First, we define `broadcast(f, x::Nullable)` as follows: | ||
|
||
``` | ||
immutable Lifted{T} |
Could specify `T<:Function`, since type constructors can't lift (i.e. `Lifted{Int}` whaaat)
I think we need type constructors to lift if you want things like `pdf(Binomial(n, 0.5), x)` to work correctly when evaluated against a stream of tuples.
Oh you're right, I didn't think of that.
The `Union` type would have the same semantics as unions currently have. Their | ||
implementation in the scalar case would likely match that of a possible | ||
`TaggedUnion` type, which represents a discriminated union, but is a concrete | ||
type that is not a parent of `T`, `Void`, etc. |
Maybe explain what would be the advantage of it not being a parent of these types?
I'm going to remove this section as I think it adds complexity without many gains.
|
||
The counter-argument is that such code will only work if the methods being | ||
called are defined for both `T` and `Void`. As such, one must redefine | ||
`f(x::Void) = nothing` for every new function, even though the semantics are |
Could mention the possibility of having a generic fallback definition like this for any function `f` (even if this might be undesirable and Jeff wouldn't support it).
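For illustration, a generic fallback of this kind could be sketched as a higher-order helper rather than as a method added to every function. This is only a sketch in current Julia syntax, where `nothing` (of type `Nothing`) plays the role of the `Void` singleton discussed here; the names `lifted` and `double` are hypothetical and not part of Base or of the proposal.

```julia
# Hypothetical helper: returns `nothing` if any argument is `nothing`,
# and otherwise calls `f` on the arguments unchanged.
lifted(f, args...) = any(a -> a === nothing, args) ? nothing : f(args...)

# A function that is only defined for the non-null case.
double(x::Int) = 2x

lifted(double, 3)        # 6
lifted(double, nothing)  # nothing, without defining double(::Nothing)
lifted(+, 1, nothing)    # nothing (n-ary case)
```

The trade-off is the one raised in the thread: the caller has to opt in by writing `lifted(f, x)`, but no `f(x::Void) = nothing` method has to be repeated for each new function.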
This approach is essentially the inverse of the `Union{T, Void}`: here no | ||
methods will work on objects of this type unless they are defined for both | ||
`f(x::Some{T})` and `f(x::Void)`. As such, this approach requires consistent | ||
of specialized lifting machinery via dot-broadcasting syntax. See the later |
Probably shouldn't mention dot-broadcasting syntax here: that's one of the possibilities for lifting, but that choice is kind of orthogonal to the issues tackled in this section.
Agreed
section on lifting for details. | ||
|
||
Because users will have to work with lifted functions, this approach is | ||
less convenient, but more general and often safer. It also allows one to |
In what sense is it more general?
distinguish between outcomes in cases where a method returns something like | ||
`Union{Some{Void}, Void}` -- as might happen in a hash table implementation | ||
that returns `Some` when a key is present and associated with `nothing` as a | ||
value, but returns `Void` when a key is not present. |
`Some` -> `Some(nothing)`
`Void` -> `nothing`
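The hash table case can be made concrete with a small sketch. It assumes current Julia, where `Some` exists in Base and `Nothing` corresponds to the `Void` of this discussion; the helper name `lookup` is hypothetical.

```julia
# Hypothetical lookup helper: wraps a found value in `Some`,
# and returns a bare `nothing` when the key is absent.
function lookup(d::Dict, key)
    haskey(d, key) ? Some(d[key]) : nothing
end

d = Dict{String, Union{Int, Nothing}}("a" => 1, "b" => nothing)

lookup(d, "a")  # Some(1)
lookup(d, "b")  # Some(nothing): key present, stored value is `nothing`
lookup(d, "c")  # nothing: key absent
```

With a bare `Union{T, Void}` return type, the last two results would be indistinguishable.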
There are natural extensions of this logic to the n-ary function case. | ||
|
||
Because we would use broadcasting, lifting would happen via dot-syntax: | ||
`x .+ y` would be lifted by default. In addition, we would special several |
special -> specialize
Because we would use broadcasting, lifting would happen via dot-syntax: | ||
`x .+ y` would be lifted by default. In addition, we would special several | ||
operators like `+`, `-`, `==`, etc. to automatically be equivalent to | ||
broadcasting on nullables. |
Like in C# (for the first two at least).
Following the lead of | ||
[Flow for Javascript](https://flowtype.org/docs/nullable-types.html), we | ||
would introduce `?T` as sugar for `Nullable{T}`. We have also considered C#'s | ||
`T?`. Which is chosen will likely depend upon how they interact with the |
Also used by Swift.
|
||
We might also introduce a null-coalescing operator of `??` with | ||
right-associativity that essentially performs lifting: in this case, | ||
`x ?? y ?? 0` would be equivalent to `hasvalue(x) ? x : (hasvalue(y) ? y : 0)`. |
`hasvalue` -> `!isnull`
Or `isnull` and switch the order
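For reference, the right-associative expansion of `x ?? y ?? 0` can be written out as an ordinary function. The sketch below assumes the `Union{T, Nothing}` representation (with `nothing` as the null value) rather than `Nullable{T}`, and the name `first_value` is hypothetical. Current Julia also ships `something`, which has the same first-non-null behavior for this representation.

```julia
# Explicit form of the nested ternary from the diff, using `nothing` as null.
first_value(x, y) = x !== nothing ? x : (y !== nothing ? y : 0)

first_value(nothing, 2)        # 2
first_value(nothing, nothing)  # 0

# Base equivalent in current Julia: returns the first argument that is not `nothing`.
something(nothing, 2, 0)       # 2
```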
|
||
Some such functions are: | ||
|
||
* `hasvalue` |
This one doesn't exist in Base either.
Thanks for writing this up, @johnmyleswhite.
I realise this is WIP, but in general I'm left a bit confused whether you want `Nullable` to be a one-element container similar to `RefValue{Union{T,Void}}` or whether you literally mean `typealias Nullable{T} Union{T,Void}` and to use `Void` like `NA` was used in `DataArrays`, except that the compiler will be faster.
Because in the latter case, you don't want to use broadcast and higher-order lifting, but instead you need method specializations (because methods can only ever receive concrete types).
That problem is essentially the problem we already have with `Nullable{T}`: | ||
writing "clean" functions requires information about type-inference because of | ||
a coupling of result types to knowledge about counterfactual outcomes in | ||
branches. |
Is it true that this applies? Wouldn't you just use dispatch, and define e.g. `&(::Bool, Null{Bool})`?
For containers of `Union{T,Null{T}}`, functions like `map` and `broadcast` will have this reliance on inference just as containers of other types always do currently (it won't matter at all that the element type is `Union{T,Null{T}}` as opposed to anything else).
### `Union{T, Null{T}}` Pros & Cons | ||
|
||
This approach allows us to distinguish `Null{Bool}` from `Null{Int}` in | ||
cases in which that distinction might be useful. R makes such a distinction, |
I feel this approach is reminiscent of `NA` from `DataArrays`, but much more precise, in that you can have `true & Null{String}` give an error, say, while `true & Null{Bool}` can return `Null{Bool}`.
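A minimal sketch of that precision, in current Julia syntax. Both the `Null{T}` type and the function name `and3` are hypothetical stand-ins (a separate name is used to avoid extending `Base.:&`); this illustrates the dispatch pattern, not the proposal's implementation.

```julia
# Hypothetical typed null: one singleton per element type.
struct Null{T} end

# Three-valued AND, defined only when the null is a Bool null.
and3(x::Bool, ::Null{Bool}) = x ? Null{Bool}() : false
and3(::Null{Bool}, y::Bool) = y ? Null{Bool}() : false
and3(::Null{Bool}, ::Null{Bool}) = Null{Bool}()

and3(true, Null{Bool}())   # Null{Bool}()
and3(false, Null{Bool}())  # false
# and3(true, Null{String}()) would throw a MethodError,
# because no method accepts Null{String}.
```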
functions will return `null` as their output whenever any of their arguments | ||
are `null`. Implementing this lifting of functions can be solved as follows: | ||
|
||
First, we define `broadcast(f, x::Nullable)` as follows: |
Wait, wait, I'm lost here. If we literally replace `Nullable{T}` with `Union{T, Void}` or `Union{T, Null{T}}` then dispatch will handle function calls. `broadcast`, like every other function, would receive a concrete type like `T` or `Void` or `Null{T}`.
Further, if you mean `typealias Nullable{T} Union{T, Void}`, then we would also have `T <: Nullable` for all possible types `T`.
Like `DataArrays` and `NA`, lifting is not necessary. You do need a bunch more method definitions for `Void` / `Null{T}` / `Some{T}`.
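The dispatch-based alternative described in this comment can be sketched as follows, using current Julia syntax where `Nothing` corresponds to `Void`. The function name `increment` is hypothetical.

```julia
# One method per concrete case; dispatch picks the right one per element.
increment(x::Int) = x + 1
increment(::Nothing) = nothing   # the extra method needed for the null case

xs = Union{Int, Nothing}[1, nothing, 3]
ys = map(increment, xs)          # elements: 2, nothing, 4

# With a bare union, every T is a subtype of the "nullable" type.
Int <: Union{Int, Nothing}       # true
```

Each call receives a concrete `Int` or `Nothing`, so no lifting machinery is involved, at the cost of writing the `::Nothing` method for each function.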
I'm taking this as requests for feedback/opinions (apologies if I misunderstood). I would advocate strongly for I also feel that I see you also mentioned There is one thing that remains to confuse me. Currently we can have |
Thanks for the comments, Andy! I should point out that the On that note, I think you're misunderstanding the way If we use Will address your other comments tomorrow morning. |
Thanks @johnmyleswhite - that clarifies things immensely. So we would have something which roughly has this structure:
    immutable Nullable{T}
        value::Union{T,Void}
    end
    size(::Nullable) = ()
    getindex(n::Nullable) = n.value
    eltype{T}(::Nullable{T}) = Union{T,Void} # ??
    isnull(n::Nullable) = isa(n.value, Void)
and then use
Of course this:
    immutable Nullable{T}
        value::Union{T,Null{T}}
    end
offers few benefits, and if that is what you meant, then I agree entirely. To be clear, I was suggesting to use "bare" nullables with no wrapper type, to replace
Revised the document. Still a bunch more to do (and a few comments left to respond to), but I'm now almost completely convinced we'd want to make |
lifting. | ||
|
||
The unique concern with `Union{T, Void}` is `f(x::T)` will often be defined | ||
because it is exists even in the parts of Julia that deal with only |
"it is exists"
that can take in a function with a method of the form `f(x::T)` and employ a | ||
default semantics for null values to handle the `f(x::Void)` case. In | ||
particular, we will refer to the pattern of extending an `f(x::T)` method to | ||
handle `f(x::Union{T, Void)` by assuming that `f(x::Void) = nothing` as |
Missing a }
for x in xs | ||
s += x | ||
end | ||
return x |
`return s`?
current `Nullable{T}` implementation as one must determine the type parameter | ||
`T` even for the null case. | ||
|
||
## Step 2:: Define Core Operations on Nullable Types by Lifting |
`2::` -> `2:`?
`x ?? y ?? 0` would be equivalent to `hasvalue(x) ? x : (hasvalue(y) ? y : 0)`. | ||
|
||
Finally, we might introduce `.?` for lifted field access such that | ||
`x?.field_name` would be equivalent to `getfield.(x, :field_name)`. |
You first mention `.?` but then use `x?.`. Which one is being considered? Or are both?
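Whichever spelling is chosen, the intended semantics of lifted field access can be sketched with a plain function. This assumes `nothing` as the null value instead of `Nullable`, and the names `Point` and `lifted_getfield` are hypothetical.

```julia
struct Point
    x::Int
    y::Int
end

# Hypothetical helper: propagate the null, otherwise read the field.
lifted_getfield(obj, name::Symbol) = obj === nothing ? nothing : getfield(obj, name)

lifted_getfield(Point(1, 2), :x)  # 1
lifted_getfield(nothing, :x)      # nothing
```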
following occurs: | ||
|
||
``` | ||
julia> Array(String, 1) |
That syntax is "deprecated" per the manual (though no warning or anything is emitted). In the now-preferred syntax, it would be `Array{String,1}(1)`.
|
||
Deciding between these three options will require balancing a variety of | ||
concerns. The arguments for each option are summarized below. We include all | ||
of them for clarity, although we believe that the arguments on behalf of |
"in favor of" rather?
handle `f(x::Union{T, Void)` by assuming that `f(x::Void) = nothing` as | ||
lifting. | ||
|
||
The unique concern with `Union{T, Void}` is `f(x::T)` will often be defined |
I wouldn't say it's the unique concern. The strongest concern for me is that this nullable type wouldn't allow distinguishing between no value and a stored null value when indexing a dictionary.
functions in Base Julia were repeated again when adding methods for `NAtype`: | ||
instead of describing a universal pattern using a generic abstraction, we | ||
repeated that pattern manually over and over again, while always discovering | ||
new cases that had been left out. |
This problem of manual redefinitions is not a criticism of the `Union{T, Void}` approach per se: it would rather justify hard-coding something like (pseudo-code) `(::F){F<:Function}(args...) = nothing` for all functions. (The next third paragraph below gives reasons why this would not be a good idea, but it's a different argument IMHO.)
|
||
``` | ||
function sum(xs::Vector{Union{Some{Int}, Void}}) | ||
s = 0 |
Should this be `Some(0)`?
end | ||
``` | ||
|
||
What type does this return when `xs = ?Int[]`? What type does it return when |
Could you illustrate the alternative solutions? It's not completely clear to me what problem you're describing.
the implementation details of the language. But Julia does not provide such | ||
guarantees, so type-inference must be invoked to determine information about | ||
the return type of the counterfactual branch. An alternative approach is to | ||
use `promote_type` to explicitly register return types, but this is clearly |
`promote_op`? Anyway, I don't think it's worth mentioning as it's clearly not a solution.
implementation that uses distinct sentinel values for each nullable type in | ||
the language. Otherwise, this approach is similar to the `Union{T, Void}` case | ||
described above, except for one crucial problem that would seem to suggest this | ||
approach offers few benefits over our current implementation of `Nullable{T}`. |
Should mention possible benefits which justify detailing this approach at all. One benefit I can see is preventing things like `Null{String}() + 1`, which would most likely occur due to a bug somewhere.
What benefits we expect also matters for discussing the problems with this approach. In particular, when inference fails, we could fall back to `Null{Any}` or `Null{Union{}}`. Depending on what that type information is used for, it could be a problem or not.
Let me add something to ensure that all potential questions have answers. One difficulty with that approach is distinguishing when an
is it clear whether this should be |
Unrelated to the current matter of discussion; but one thing we should do in 0.6 if at all still possible, is deprecate using the ternary operator without spaces. Otherwise things like
will parse in an unexpected way. |
There are also other functions which we must consider
And then of course a list of functions where the lifting behaviour is ambiguous
I am sure this list is incomplete; but it must be something to consider. |
The more I think about it, the more I think we shouldn't lift by default; cases where |
There seems to be some consensus recently of |
I don't think it matters too much how we spell it so long as the behavior is well defined and well documented. |
I also think "universal" default definitions (lifting) for I still believe that However, to get this semantic of a missing value of type If we don't get this behavior then a signature like EDIT: What I said here about covariance of |
@andyferris I don't see how
That's why you define proper promotions in Base and locally define |
Sorry, some functions are overloaded with lots and lots of methods on purpose, e.g. try
    &(x::Bool, ::Null) = x ? Null() : false
    &(::Null, y::Bool) = y ? Null() : false
    &(::Null, ::Null) = Null()
and another user creates a type
then they will not ever be able to stop I would prefer we didn't add things which constrain multiple dispatch and user's ability to overload methods, whenever they decide they also need to use missing values. In the example above, the user would need to define another I do agree that for typical, database-style operations, you're not going to run into this esoteric set of conditions. But the fact is that we can foresee circumstances where using the type and dispatch system more strongly will lead to bad interactions with
In short terms, if a user defines |
That doesn't seem like a problem to me; that's just kind of how I expect |
Of course, this is exactly how Do note that the covariance idea means that 99.9% of the time you can just use f(x::?T1) = if isnull(x) ? #= do a =# : #= do b =# # note that branch is elided by compiler!
f(x::?T2) = if isnull(x) ? #= do c =# : #= do d =#
f(Null) # ambiguous That basically means it's never safe to dispatch on
I'm arguing that there is (or that there might be) a benefit for EDIT: please replace "covariance idea" with "using |
Or to answer your question another way, the only problem I saw with What are the problems that |
The relevant problem with |
Actually it's only parametric containers that rely on inference. You can for instance define You could use |
I give up. I'm done having these debates when no one involved is paying me for my time. |
John - please don't take me the wrong way. We all really appreciate the effort that is going into this - I think it is really fantastic! (I'm only trying to bounce some ideas around and have a technical discussion about some things which I thought might have been overlooked) |
@johnmyleswhite I too greatly appreciate the effort that has gone into this Julep. Thanks for all your hard work; it's truly important for building a strong foundation for 1.0. @andyferris There is a flaw, in my opinion, with the covariant Consider for example
    abstract type Symbolic <: Real end
    struct SymInteger <: Symbolic; val::BigInt; end
    struct Reciprocal <: Symbolic; of::Symbolic; end
and imagine that, conventionally,
@TotalVerb I think you might be right (I think there was a ? in my post next to the "concrete"). Can we have consistent covariance ( |
Hmm, that would be unusual compared to anything else in the language (it would be a type which has an instance, but also has subtypes), but I suppose there is nothing preventing its implementation. Moving on to practical issues though, as mentioned in the Julep, consider
    function map(f, x::Nullable{T})
        if !isnull(x)
            return Nullable(f(get(x)))
        else
            return Nullable{Core.Inference.return_type(f, (T, ))}()
        end
    end
If we rewrite this with
    map(f, x::Any) = f(x)
    map{T}(f, x::Null{T}) = Null{Core.Inference.return_type(f, Tuple{T})}()
which not only has an inference dependency, but also has a strange definition of map. It seems like we would be giving up on
An instructive exercise, perhaps, is to find some concrete examples of code that uses Let me get the ball rolling with some parts of Base Julia that use
I suspect that a lot of these use cases will be relatively independent of which interpretation is chosen, but we should also consider package code that makes use of nullable. |
Actually that does seem like a really big departure from the current type system.
Do people need If |
I think you're right. However, this goes back to the original issue of when functions should be lifted over null arguments. If I want to compute
    lift(f, x) = f(x)
    lift(f, x::Null{T}) = Null{C.I.return_type(f, Tuple{T})}()
However, because there is no more
Thank you very much @TotalVerb for helping me figure out the trade-offs here. It seems true that the My general approach here has been to see if sufficiently clever dispatch would mean lifting is not necessary - and demanding that users/package authors provide the necessary method. I will concede that this might not be very user-friendly, especially in the case that the methods are coming from somebody else's package. In any case, I would suggest providing a range of common methods for Anyway, it seems that |
By the way, everything I was saying about the covariance of Thus I would suggest having |
@andyferris This approach has already been tried with vectorized methods, and it was considered as a design flaw. In 0.6 we've been able to move away from it by using the Also the issue with inference is a deep one. Even if we get rid of
Yes, and that shows the only solution in that case is to have a common definition (The example of 3VL Finally, I'm not sure we need so many functions not to be lifted. Functions for which passing a nullable makes no sense should generally not be lifted, and people shouldn't pass nullables to them. Why would one call Anyway, we could start without a default lifting fallback and see how it goes. At least, |
Thank you so much @nalimilan for your wonderful and very detailed response. I have certainly learned a lot today from this discussion. I had been interested in the development of In any case, I feel that having this discussion has been a valuable exercise. This certainly is a tricky design problem, and there are clearly rather few available options which satisfy e.g. determinism w.r.t. inference and user-friendliness. @johnmyleswhite I'd like to apologize to you directly - as I alluded to earlier, I have been very impressed with the leadership shown with this PR and, in fact, have been looking forward to seeing this PR merge. It was never my intention to derail this process. However, I think it's reasonable to say this Julep has become a focus point for the future of
I think we should merge this PR and add the missing details later. |
This draft describes the core ideas behind a redesign of nullables. It will require substantial elaboration before we have a final design, but my goal is to get this out in front of people while we are still debating the remaining points of uncertainty in the design. The primary sources of remaining uncertainty are the arguments in favor of `Union{T, Void}` vs `Union{Some{T}, Void}`.