[WIP] Nullable Redesign #21

johnmyleswhite · 2016-12-10T00:14:24Z

This draft describes the core ideas behind a redesign of nullables. It will require substantial elaboration before we have a final design, but my goal is to get this out in front of people while we are still debating the remaining points of uncertainty in the design. The primary source of remaining uncertainty are the arguments in favor of Union{T, Void} vs Union{Some{T}, Void}.

quinnj · 2016-12-10T03:30:01Z

Nullable.md

+This approach is essentially the inverse of the `Union{T, Void}`: here no
+methods will work on objects of this type unless they are defined for both
+`f(x::Some{T})` and `f(x::Void)`. As such, this approach requires consistent
+of specialized lifting machinery via dot-broadcasting syntax. See the later


"consistent of"?

-> "consistent usage of"

ararslan · 2016-12-10T03:54:04Z

Nullable.md

+The representation of null-like constructs in Julia is too complicated. We
+currently have all of the following kinds of null-like constructs:
+
+* A ` Void` type with a singleton value `nothing`,


Extra space after the backtick

ararslan · 2016-12-10T04:07:44Z

Nullable.md

+## Step 2:: Define Core Operations on Nullable Types by Lifting
+
+The core semantic issue for nullable types is maximizing code reuse: we want
+to evaluate `f(x::?T)`, but not require that every function be defined for


Should mention what ?T is intended to mean—it's used here mid-discussion with no prior mention of the syntax

ararslan · 2016-12-10T04:12:25Z

Nullable.md

+* `hasvalue`
+* `isnull`
+* `get`
+* `get_or_default`


I don't think this is a function in Base?

I did this for C# comparisons. This is our two argument get. Will indicate that.

ararslan · 2016-12-10T04:14:10Z

Nullable.md

+`T?`. Which is chosen will likely depend upon how they interact with the
+ternary operator.
+
+We might also introduce a null-coalescing operator of `??` with


Sentence flows better without "of" IMO

ararslan · 2016-12-10T04:21:59Z

Nullable.md

+First, we define `broadcast(f, x::Nullable)` as follows:
+
+```
+immutable Lifted{T}


Could specify T<:Function, since type constructors can't lift (i.e. Lifted{Int} whaaat)

I think we need type constructors to lift if you want things like pdf(Binomial(n, 0.5), x) to work correctly when evaluated against a stream of tuples.

Oh you're right, I didn't think of that.

nalimilan · 2016-12-10T11:39:24Z

Nullable.md

+The `Union` type would have the same semantics as unions currently have. Their
+implementation in the scalar case would likely match that of a possible
+`TaggedUnion` type, which represents a discriminated union, but is a concrete
+type that is not a parent of `T`, `Void`, etc.


Maybe explain what would be the advantage of it not being a parent of these types?

I'm going to remove this section as I think it adds complexity without many gains.

nalimilan · 2016-12-10T11:41:09Z

Nullable.md

+
+The counter-argument is that such code will only work if the methods being
+called are defined for both `T` and `Void`. As such, one must redefine
+`f(x::Void) = nothing` for every new function, even though the semantics are


Could mention the possibility of having a generic fallback definition like this for any function f (even if this might be undesirable and Jeff wouldn't support it).

nalimilan · 2016-12-10T11:42:28Z

Nullable.md

+This approach is essentially the inverse of the `Union{T, Void}`: here no
+methods will work on objects of this type unless they are defined for both
+`f(x::Some{T})` and `f(x::Void)`. As such, this approach requires consistent
+of specialized lifting machinery via dot-broadcasting syntax. See the later


Probably shouldn't mention dot-broadcasting syntax here: that's one of the possibilities for lifting, but that choice is kind of orthogonal to the issues tackled in this section.

nalimilan · 2016-12-10T11:43:12Z

Nullable.md

+section on lifting for details.
+
+Because users will have to work with lifted functions, this approach is
+less convenient, but more general and often safer. It also allows one to


In what sense is it more general?

nalimilan · 2016-12-10T11:44:30Z

Nullable.md

+distinguish between outcomes in cases where a method returns something like
+`Union{Some{Void}, Void}` -- as might happen in a hash table implementation
+that returns `Some` when a key is present and associated with `nothing` as a
+value, but returns `Void` when a key is not present.


Some -> Some(nothing)
Void -> nothing

nalimilan · 2016-12-10T12:10:18Z

Nullable.md

+There are natural extensions of this logic to the n-ary function case.
+
+Because we would use broadcasting, lifting would happen via dot-syntax:
+`x .+ y` would be lifted by default. In addition, we would special several


special -> specialize

nalimilan · 2016-12-10T12:10:44Z

Nullable.md

+Because we would use broadcasting, lifting would happen via dot-syntax:
+`x .+ y` would be lifted by default. In addition, we would special several
+operators like `+`, `-`, `==`, etc. to automatically be equivalent to
+broadcasting on nullables.


Like in C# (for the first two at least).

nalimilan · 2016-12-10T12:13:14Z

Nullable.md

+Following the lead of
+[Flow for Javascript](https://flowtype.org/docs/nullable-types.html), we
+would introduce `?T` as sugar for `Nullable{T}`. We have also considered C#'s
+`T?`. Which is chosen will likely depend upon how they interact with the


Also used by Swift.

nalimilan · 2016-12-10T12:13:49Z

Nullable.md

+
+We might also introduce a null-coalescing operator of `??` with
+right-associativity that essentially performs lifting: in this case,
+`x ?? y ?? 0` would be equivalent to `hasvalue(x) ? x : (hasvalue(y) ? y : 0)`.


hasvalue -> !isnull

Or isnull and switch the order

nalimilan · 2016-12-10T12:14:18Z

Nullable.md

+
+Some such functions are:
+
+* `hasvalue`


This one doesn't exist in Base either.

andyferris

Thanks for writing this up, @johnmyleswhite.

I realise this is WIP, but in general I'm left a bit confused whether you want Nullable to be a one-element container similar to RefValue{Union{T,Void}} or whether you literally mean typalias Nullable{T} Union{T,Void} and to use Void like NA was used in DataArrays, except that the compiler will be faster.

Because in the latter case, you don't want to use broadcast and higher-order lifiting, but instead you need method specializations (because methods can only ever receive concrete types).

andyferris · 2016-12-14T05:10:21Z

Nullable.md

+That problem is essentially the problem we already have with `Nullable{T}`:
+writing "clean" functions requires information about type-inference because of
+a coupling of result types to knowledge about counterfactual outcomes in
+branches.


Is it true that this applies? Wouldn't you use just use dispatch, and define e.g. &(::Bool, Null{Bool})?

For containers of Union{T,Null{T}}, functions like map and broadcast will have this reliance on inference just as containers of other types always do currently (it won't matter at all that the element type is Union{T,Null{T}} as opposed to anything else).

andyferris · 2016-12-14T05:12:06Z

Nullable.md

+### `Union{T, Null{T}}` Pros & Cons
+
+This approach allows us to distinguish `Null{Bool}` from `Null{Int}` in
+cases in which that distinction might be useful. R makes such a distinction,


I feel this approach is reminiscent of NA from DataArrays, but much more precise, in that you can have true & Null{String} give an error, say, while true & Null{Bool} can return Null{Bool}

andyferris · 2016-12-14T05:20:09Z

Nullable.md

+functions will return `null` as their output whenever any of their arguments
+are `null`. Implementing this lifting of functions can be solved as follows:
+
+First, we define `broadcast(f, x::Nullable)` as follows:


Wait, wait, I'm lost here. If we literally replace Nullable{T} with Union{T, Void} or Union{T, Null{T}} then dispatch will handle function calls. broadcast, like every other function, would receive a concrete type like T or Void or Null{T}.

Further, if you mean typealias Nullable{T} Union{T, Void}, then we would also have T <: Nullable for all possible types T.

Like DataArrays and NA, lifting is not necessary. You do need a bunch more method definitions for Void / Null{T} / Some{T}.

andyferris · 2016-12-14T06:05:42Z

my goal is to get this out in front of people while we are still debating the remaining points of uncertainty in the design. The primary source of remaining uncertainty are the arguments in favor of Union{T, Void} vs Union{Some{T}, Void}.

I'm taking this as requests for feedback/opinions (apologies if I misunderstood).

I would advocate strongly for Union{T, Null{T}} because I feel it would help enormously with writing generic methods. If my method receives a Null{Vector{Int}} at least I know what's the element type of the vector that I was supposed to get, and that might help me to arrive at the correct type of my output. Having Void here doesn't help me predict what the user wants me to do, and it doesn't help me to write different methods of a function that might have slightly different semantics depending on input types.

I also feel that Void and nothing already have a meaning, in terms they are the return value of the thing the compiler doesn't have a return value for - like the result of a for loop, or an empty function body, or an if false statement with no else branch. Having to diagnose whether an unfamiliar function returns nothing on purpose or just because it just happened to be a "null" value for my chosen input would add unnecessary complication.

I see you also mentioned #undef. How would this work? Ideally, I guess we could make the GC aware of how Union works. This could be pretty compelling... but we probably need to solve the below:

There is one thing that remains to confuse me. Currently we can have Nullable{Any}. However Union{Any, Void} = Any (and so-on). I don't think we want to change these properties. So what, for example, would a Nullable{Any}() or an #undef of Any become?

johnmyleswhite · 2016-12-14T06:26:02Z

Thanks for the comments, Andy!

I should point out that the Union{T, Null{T}} implementation is not really seriously being considered as far as I understand the growing consensus. Instead, that implementation is being documented to demonstrate why it offers few benefits over our current implementation.

On that note, I think you're misunderstanding the way Union{T, Void} would work. That's really helpful for me to see because I'll try harder to ensure that revisions of this Julep communicate this point more clearly.

If we use Union{T, Void}, we'll be abandoning the zero-or-one element container way of thinking of Nullable{T} and instead we'd be using what is essentially a tagged union that always contains exactly one value: either a value of type T or a value of type Void. We can think of this as a container to make the use of broadcast more reasonable, but we're mostly taking about broadcast because it provides minimal syntax for lifting.

Will address your other comments tomorrow morning.

andyferris · 2016-12-14T06:34:56Z

Thanks @johnmyleswhite - that clarifies things immensely. So we would have something which roughly has this structure:

immutable Nullable{T}
    value::Union{T,Void}
end
size(::Nullable) = ()
getindex(n::Nullable) = n.value
eltype{T}(::Nullable{T}) = Union{T,Void} # ??
isnull(n::Nullable) = isa(n.value, Void)

and then use broadcast, "lifting", and other machinery to push the nullables through the system?

andyferris · 2016-12-14T06:41:01Z

I should point out that the Union{T, Null{T}} implementation is not really seriously being considered as far as I understand the growing consensus. Instead, that implementation is being documented to demonstrate why it offers few benefits over our current implementation.

Of course this:

immutable Nullable{T}
    value::Union{T,Null{T}}
end

offers few benefits, and if that is what you meant, then I agree entirely. To be clear, I was suggesting to use "bare" nullables with no wrapper type, to replace DataArrays.NA with Base.Null{T}, and to remove Base.Nullable entirely (except maybe as a typealias for a Union).

…ullable

johnmyleswhite · 2016-12-18T01:26:53Z

Revised the document. Still a bunch more to do (and a few comments left to respond to), but I'm now almost completely convinced we'd want to make ?T mean Union{Some{T}, Void} since the benefits from having T and ?T represent disjoint sets seem to outweigh other considerations.

ararslan · 2016-12-18T19:00:22Z

Nullable.md

+lifting.
+
+The unique concern with `Union{T, Void}` is `f(x::T)` will often be defined
+because it is exists even in the parts of Julia that deal with only


"it is exists"

ararslan · 2016-12-18T19:00:54Z

Nullable.md

+that can take in a function with a method of the form `f(x::T)` and employ a
+default semantics for null values to handle the `f(x::Void)` case. In
+particular, we will refer to the pattern of extending an `f(x::T)` method to
+handle `f(x::Union{T, Void)` by assuming that `f(x::Void) = nothing` as


Missing a }

ararslan · 2016-12-18T19:08:14Z

Nullable.md

+    for x in xs
+        s += x
+    end
+    return x


return s?

ararslan · 2016-12-18T19:11:31Z

Nullable.md

+current `Nullable{T}` implementation as one must determine the type parameter
+`T` even for the null case.
+
+## Step 2:: Define Core Operations on Nullable Types by Lifting


2:: -> 2:?

ararslan · 2016-12-18T19:15:14Z

Nullable.md

+`x ?? y ?? 0` would be equivalent to `hasvalue(x) ? x : (hasvalue(y) ? y : 0)`.
+
+Finally, we might introduce `.?` for lifted field access such that
+`x?.field_name` would be equivalent to `getfield.(x, :field_name)`.


You first mention .? but then use x?.. Which one is being considered? Or are both?

ararslan · 2016-12-18T19:18:21Z

Nullable.md

+following occurs:
+
+```
+julia> Array(String, 1)


That syntax is "deprecated" per the manual (though no warning or anything is emitted). In the now-preferred syntax, it would be Array{String,1}(1).

nalimilan · 2016-12-18T19:19:00Z

Nullable.md

+
+Deciding between these three options will require balancing a variety of
+concerns. The arguments for each option are summarized below. We include all
+of them for clarity, although we believe that the arguments on behalf of


"in favor of" rather?

nalimilan · 2016-12-18T19:21:38Z

Nullable.md

+handle `f(x::Union{T, Void)` by assuming that `f(x::Void) = nothing` as
+lifting.
+
+The unique concern with `Union{T, Void}` is `f(x::T)` will often be defined


I wouldn't say it's the unique concern. The strongest concern for me is that this nullable type wouldn't allow distinguishing between no value and a stored null value when indexing a dictionary.

nalimilan · 2016-12-19T10:24:18Z

Nullable.md

+functions in Base Julia were repeated again when adding methods for `NAtype`:
+instead of describing a universal pattern using a generic abstraction, we
+repeated that pattern manually over and over again, while always discovering
+new cases that had been left out.


This problem of manual redefinitions is not a criticism of the Union{T, Void} approach per se: it would rather justify hard-coding something like (pseudo-code) (::F){F<:Function}(args...) = nothing for all functions. (The ~~next~~ third paragraph below gives reasons why this would not be a good idea, but it's a different argument IMHO.)

nalimilan · 2016-12-19T10:32:09Z

Nullable.md

+
+```
+function sum(xs::Vector{Union{Some{Int}, Void}})
+    s = 0


Should this be Some(0)?

nalimilan · 2016-12-19T10:33:49Z

Nullable.md

+end
+```
+
+What type does this return when `xs = ?Int[]`? What type does it return when


Could you illustrate the alternative solutions? It's not completely clear to me what problem you're describing.

nalimilan · 2016-12-19T10:37:56Z

Nullable.md

+the implementation details of the language. But Julia does not provide such
+guarantees, so type-inference must be invoked to determine information about
+the return type of the counterfactual branch. An alternative approach is to
+use `promote_type` to explicitly register return types, but this is clearly


promote_op? Anyway, I don't think it's worth mentioning as it's clearly not a solution.

nalimilan · 2016-12-19T10:44:18Z

Nullable.md

+implementation that uses distinct sentinel values for each nullable type in
+the language. Otherwise, this approach is similar to the `Union{T, Void}` case
+described above, except for one crucial problem that would seem to suggest this
+approach offers few benefits over our current implementation of `Nullable{T}`.


Should mention possible benefits which justify detailing this approach at all. One benefit I can see is preventing things like Null{String}() + 1, which would most likely occur due to a bug somewhere.

What benefits we expect also matters for discussing the problems with this approach. In particular, when inference fails, we could fall back to Null{Any} or Null{Union{}}. Depending on what that type information is used for, it could be a problem or not.

TotalVerb · 2017-02-27T19:10:52Z

Let me add something to ensure that all potential questions have answers.

One difficulty with that approach is distinguishing when an f(x) = ... definition intends to lift over x::Null or otherwise. For instance, we might want string(Null()) == "Null()". Now if someone defines

foo(x) = length(string(x))

is it clear whether this should be Null() or 6 for foo(Null())?

TotalVerb · 2017-02-27T19:13:59Z

Unrelated to the current matter of discussion; but one thing we should do in 0.6 if at all still possible, is deprecate using the ternary operator without spaces. Otherwise things like

@foo x ?Int :sym

will parse in an unexpected way.

TotalVerb · 2017-02-27T19:32:06Z

There are also other functions which we must consider Null to not lift over. I wonder if maintaining such a list of functions might be just as much work as maintaining a list of functions it should lift over.

string, print, println, show, dump, display etc.; all IO functions
by extension, functions that call string, e.g. lpad. annoyingly, we might want to lift lpad over Null() as the first argument
Base.vect, i.e. [x, y]; also vcat hcat hvcat cat
push! unshift! setindex! fill! etc. in fact basically every function with a bang
in contains
=> e.g. for Dict(x => Null()), Set
isnull, get, filter, map
most (but not all? e.g. Complex) type constructors
ifelse, isa, typeof, Ref, and most other builtins

And then of course a list of functions where the lifting behaviour is ambiguous

searchsorted and friends
|>, ==, !=, isequal
run, close, sleep, wait and other "non-bang but still imperative" functions for which a null argument doesn't seem to make sense

I am sure this list is incomplete; but it must be something to consider.

ararslan · 2017-02-27T19:37:06Z

The more I think about it, the more I think we shouldn't lift by default; cases where f(Null) should not be Null seem to be equally prevalent if not more so than those where it should be Null. (Plus, I don't think Jeff has changed his stance on automatic lifting for Base functions.) It's easy enough to determine the subset of Base functions which should provide lifting, and packages can opt to provide functionality for their functions as well. But making a broad assumption on what f(Null) should be seems a little dangerous.

TotalVerb · 2017-02-27T19:42:53Z

There seems to be some consensus recently of Nullable going back to what it used to be, with NAType, except now with compiler support so it's faster. I wonder if the name should also change to NAType, as this seems in some ways to be moving away from what is called "null" in other programming languages.

ararslan · 2017-02-27T19:56:07Z

I don't think it matters too much how we spell it so long as the behavior is well defined and well documented.

andyferris · 2017-02-27T22:30:34Z

I also think "universal" default definitions (lifting) for Null would be undesirable. I think there are many situations where an error is better than continuing to propagate the Null.

I still believe that Null{T} would be a nicer semantic. Some functions support a wide range of signatures and not having the T means it's ambiguous which method to correspond with (and there may exist important semantic differences between the different method signatures). I feel that would be a real problem and become a major pain point when you are doing anything more complex than basic operations like +. The other nice thing about this that you can (safely) define operations on Null{T} such as &(x::Bool, ::Null{Bool}) = x ? Null{Bool}() : false for e.g. 3 valued logic (and users can do the same safely for a variety of types). Without the T then we should assume the Null value might be of Any type, so I wouldn't like to see 3-valued logic implemented with &(::Bool, ::Null) since users are free to overload the behavior of &(::Bool, ::T) for their own types T.

However, to get this semantic of a missing value of type T, we might benefit from some compiler support (specifically, from the type system). Ideally, Null{T} would be covariant in T to make working with non-concrete types easier and more consistent. For example, if Null{Int} represents some missing value of type Int, then I wonder if this missing value should also be considered of type Null{Integer}? That is, given that types represent sets of possible values, for every (concrete?) type T we create the nullable set by adding an extra member of the set Null{T}() which represents a missing value (let's name here the combined set as type Nullable{T} = Union{T, Null{T}}, though I think we should find a new name like ?T or something). We could imagine the set Nullable{S} (where S is an abstract supertype of T) as including the value Null{T}() (as well as Null{S}()?). (~~Does this make any sense, @JeffBezanson?~~ no it does not)

If we don't get this behavior then a signature like f(::Nullable{Integer}) would fail for Null{Int}() and I do not believe that accurately captures the intentions of the author of f. If we did implement this covariant behavior, we do get that Null (being Null{T} where T) and Null{Any} are more-or-less equivalent (as may be Any and Nullable{Any}...), so in many cases ::Null would still be useful to put in signatures and there shouldn't be undue burden on users/package authors unless they actually require the type specificity to make dispatch consistent.

EDIT: What I said here about covariance of Null{T} is rubbish. Null{<:T} seems to have all the desired properties, e.g. Null{<:Integer} includes Null{Int64}(), Null{Unsigned}(), Null{Integer}(), Null{Union{UInt8, Int8}}(), etc.

ararslan · 2017-02-27T22:52:49Z

@andyferris I don't see how Null{T} would differ significantly, or in any beneficial way, from the current Nullable{T}.

Some functions support a wide range of signatures and not having the T means it's ambiguous which method to correspond with

That's why you define proper promotions in Base and locally define f(::Null) to be whatever makes sense. If not knowing the type when calling a method on a null value is a problem, I think that's more indicative of a design flaw.

andyferris · 2017-02-27T23:14:31Z

If not knowing the type when calling a method on a null value is a problem, I think that's more indicative of a design flaw.

Sorry, some functions are overloaded with lots and lots of methods on purpose, e.g. try methods(+). Making use of multiple dispatch is the whole point of Julia, and null values of certain types will follow distinctive propagation rules. If we allow for e.g. 3-valued logic,

&(x::Bool, ::Null) = x ? Null() : false
&(::Null, y::Bool) = y ? Null() : false
&(::Null, ::Null) = Null()

and another user creates a type T which

overloads &
interacts with Bool, such as &(::Bool, ::T) -> T
they want to use Union{T,Null} and propagate it through &(::Union{Bool,Null}, ::Union{T,Null})

then they will not ever be able to stop &(false, Null) from being false::Bool instead of some value of T or Null.

I would prefer we didn't add things which constrain multiple dispatch and user's ability to overload methods, whenever they decide they also need to use missing values. In the example above, the user would need to define another &-like function instead of adding a method to &. It seems, to me, to be against the Julian way.

I do agree that for typical, database stype operations, you're not going to run into this esoteric set of conditions. But the fact is that we can foresee circumstances where using the type and dispatch system more strongly will lead to bad interactions with Null.

andyferris · 2017-02-27T23:20:21Z

In short terms, if a user defines T such that &(::Bool, ::T) -> T, they might expect to be able to enforce that &(::?Bool, ::?T) -> ?T. However, if anyone implements 3-valued logic on Bool, they won't be allowed to do that.

ararslan · 2017-02-27T23:29:55Z

That doesn't seem like a problem to me; that's just kind of how I expect Unions to work. The behavior is clear when you think in terms of the methods defined for the given function. We just have to make sure the behavior is well documented, then people will know they can't expect to be able to enforce Union{Bool, Null} & Union{T, Null} = Union{T, Null}.

andyferris · 2017-02-27T23:45:52Z

That doesn't seem like a problem to me; that's just kind of how I expect Unions to work.

Of course, this is exactly how Union{T,Null} would work. I was just suggesting that Union{T,Null{T}} would follow the same Union rules but is more flexible for users.

Do note that the covariance idea means that 99.9% of the time you can just use Null and ?T for dispatch, and any users that want Union{Bool, Null{Bool}} & Union{T, Null{T}} = Union{T, Null{T}} can still get that. It just seems like a more flexible solution, is all. The other problem with untyped Null and ?T = Union{T,Null} is that dispatch would have many ambiguities if users use ::?T in signatures. I can imagine that I'd like to write:

f(x::?T1) = if isnull(x) ? #= do a =# : #= do b =#  # note that branch is elided by compiler!
f(x::?T2) = if isnull(x) ? #= do c =# : #= do d =#
f(Null) # ambiguous

That basically means it's never safe to dispatch on ?T since two users/packages (who might not be collaborating) might both overload Base.f(::?T) for same f and different T, and end up with this ambiguity problem. This seems (to me at least) as a major flaw.

I don't see how Null{T} would differ significantly, or in any beneficial way, from the current Nullable{T}.

I'm arguing that there is (or that there might be) a benefit for Null{T} over Null. I'm wondering what people see as the downsides (especially w.r.t. dispatching on ?T)?

EDIT: please replace "covariance idea" with "using Null{<:T}" when reading the above.

andyferris · 2017-02-27T23:50:43Z

I don't see how Null{T} would differ significantly, or in any beneficial way, from the current Nullable{T}.

I'm arguing that there is (or that there might be) a benefit for Null{T} over Null. I'm wondering what people see as the downsides (especially w.r.t. dispatching on ?T)?

Or to answer your question another way, the only problem I saw with Nullable was that it is a container. Null{T}() is not a container, it's a value, just as Null() would be.

What are the problems that Null{T}() inherits from Nullable{T}()? Is it that you might need to sometimes use promote_op and so-on? (I actually think you won't need that nearly as much for Null{T} as you would for Nullable{T}, since you can just treat it as a value. Only containers of ?T would use promote_op just as they do for T itself).

johnmyleswhite · 2017-02-28T00:11:32Z

The relevant problem with Nullable{T} is also the problem with Null{T}: they're parametric types. Being a container or a union is not the problem: the problem is computing the type parameters of a parametric type in the absence of type inference.

andyferris · 2017-02-28T00:21:03Z

The relevant problem with Nullable{T} is also the problem with Null{T}: they're parametric types. Being a container or a union is not the problem: the problem is computing the type parameters of a parametric type in the absence of type inference.

Actually it's only parametric containers that rely on inference.

You can for instance define +(::Null{Int32}, ::Null{Int64}) = Null{Int64}(). No inference required. Just dispatch.

You could use promote or promote_op(+, ...) to have a generic method for +(::Null{T1}, Null{T2}), but unlike the container situation, you are not forced to, and I would suggest we don't use promote_op (promote might be OK, wherever it is already used?).

johnmyleswhite · 2017-02-28T00:33:20Z

I give up. I'm done having these debates when no one involved is paying me for my time.

andyferris · 2017-02-28T00:36:20Z

John - please don't take me the wrong way. We all really appreciate the effort that is going into this - I think it is really fantastic!

(I'm only trying to bounce some ideas around and have a technical discussion about some things which I thought might have been overlooked)

TotalVerb · 2017-02-28T00:40:18Z

@johnmyleswhite I too greatly appreciate the effort that has gone into this Julep. Thanks for all your hard work; it's truly important for building a strong foundation for 1.0.

@andyferris There is a flaw, in my opinion, with the covariant Null{T} idea. It makes the assumption that every type on its own represents a coherent algebraic structure, to which we can add a unique null type. By making Null{T}() an instance iff T is concrete, we lose the ability to adjoin a null element to algebraic structures which are, for whatever reason, represented by an abstract type.

Consider for example

abstract type Symbolic <: Real end
struct SymInteger <: Symbolic; val::BigInt; end
struct Reciprocal <: Symbolic; of::Symbolic; end

and imagine that, conventionally, Symbolic is used to denote some real number represented symbolically in this fashion. Then, what we desire is a Null{Symbolic}() constant. But with Null{T}() existing iff T concrete, what we instead have are two disjoint Null{SymInteger}() and Null{Reciprocal}() types, which does not seem fun to have to deal with.

andyferris · 2017-02-28T00:46:44Z

@TotalVerb I think you might be right (I think there was a ? in my post next to the "concrete"). Can we have consistent covariance (Null{SymInteger}()::Null{Symbolic}) as well as a Null{Symbolic}() value?

TotalVerb · 2017-02-28T00:54:56Z

Hmm, that would be unusual compared to anything else in the language (it would be a type which has an instance, but also has subtypes), but I suppose there is nothing preventing its implementation. Moving on to practical issues though, as mentioned in the Julep, consider

function map(f, x::Nullable{T})
    if !isnull(x)
        return Nullable(f(get(x)))
    else
        return Nullable{Core.Inference.return_type(f, (T, ))}()
    end
end

If we rewrite this with Union{T, Null{T}} we obtain

map(f, x::Any) = f(x)
map{T}(f, x::Null{T}) = Null{Core.Inference.return_type(f, Tuple{T})}()

which not only has an inference dependency, but also has a strange definition of map. It seems like we would be giving up on map entirely with Union{T, Null{T}}. This would be a departure from Scala, Java, Rust, F#, Swift, Kotlin, {Ca|Standard}ML. While I'm not saying that is necessarily a bad thing, we should consider why map is considered useful in such languages, and see if we have a replacement for those use cases.

TotalVerb · 2017-02-28T01:04:01Z

An instructive exercise, perhaps, is to find some concrete examples of code that uses Nullables in practice, and consider how it would be written using each suggested interpretation. The applications in NullableArrays are well-known and of course well-studied, but we should consider other applications also.

Let me get the ball rolling with some parts of Base Julia that use Nullable or an equivalent type.

match in Base returns nothing when there is no match. (This is effectively Union{T, Void} or Union{T, Null}.) As far as I can tell, any code using conditional regex matches is practically forced to use an if statement, because there is neither autolifting nor opt-in lifting for the Void in `Base.
get(dict::Associative, key) as proposed in Add get(coll, key) for Associative returning Nullable julia#18211. In particular, this is not only a convenience issue, but also a performance one, because this can avoid the double hash lookup.
The revised iteration protocol gives yet another perspective.

I suspect that a lot of these use cases will be relatively independent of which interpretation is chosen, but we should also consider package code that makes use of nullable.

andyferris · 2017-02-28T01:14:17Z

Hmm, that would be unusual compared to anything else in the language (it would be a type which has an instance, but also has subtypes), but I suppose there is nothing preventing its implementation.

Actually that does seem like a really big departure from the current type system.

If we rewrite this (map) with Union{T, Null{T}} we obtain

Do people need map(f, x::Union{T, Null}) for values? If T is not a container; you just call f(x). Of course, you can do map(sin, 1) and for the null case we can make Null an iterable of length 1, just like Int.

If T is a container such as T <: AbstractArray, say, then we'd need an extra method for map for that, and it might invoke inference on the element types if that is what it does now for empty arrays anyway... seems equivalent.

TotalVerb · 2017-02-28T01:19:42Z

I think you're right. map is for containers; Union{T, Null{T}} is not a container and so makes it not generally necessary.

However, this goes back to the original issue of when functions should be lifted over null arguments. If I want to compute f.(x::Nullable{T}), whether or not the person who defined function f defined f(::Null{T}), there should still be a way to do that; otherwise we are losing the functionality of map. This is, modulo the inference dependency, I think fairly easy to resolve:

lift(f, x) = f(x)
lift(f, x::Null{T}) = Null{C.I.return_type(f, Tuple{T})}()

However, because there is no more Some{T} tag indicating that something was part of a Nullable, this means that we need to use a new syntax, and can no longer reuse the broadcast dot syntax.

andyferris · 2017-02-28T03:16:10Z

Thank you very much @TotalVerb for helping me figure out the trade-offs here. It seems true that the Null{T} approach does seem to preclude "lifting" methods (or entire functions) without performing a type inference step.

My general approach here has been to see if sufficiently clever dispatch would mean lifting is not necessary - and demanding that users/package authors provide the necessary method. I will concede that this might not be very user-friendly, especially in the case that the methods are coming from somebody else's package. In any case, I would suggest providing a range of common methods for Null or Null{T} so that lifting is relatively rare. And, if lifting is relatively rare it (speculatively) seems not too ridiculous to invoke inference in these cases? I dunno.

Anyway, it seems that Null is better suited for an explicitly annotated lifting approach and Null{T} is more geared towards dispatch.

andyferris · 2017-02-28T03:20:28Z

By the way, everything I was saying about the covariance of Null{T} is rubbish. In Julia v0.6 it's dead easy to express Null{<:Integer} which includes Null{Int64}(), Null{Unsigned}(), Null{Integer}(), Null{Union{UInt8, Int8}}(), etc.

Thus I would suggest having ?T = Union{T, Null{<:T}} (or whatever the syntax for the left-hand side here becomes). Then dispatch for f(::?Integer) would be quite sensible. No special compiler/type-system treatment required.

nalimilan · 2017-02-28T09:37:49Z

My general approach here has been to see if sufficiently clever dispatch would mean lifting is not necessary - and demanding that users/package authors provide the necessary method. I will concede that this might not be very user-friendly, especially in the case that the methods are coming from somebody else's package. In any case, I would suggest providing a range of common methods for Null or Null{T} so that lifting is relatively rare. And, if lifting is relatively rare it (speculatively) seems not too ridiculous to invoke inference in these cases? I dunno.

@andyferris This approach has already been tried with vectorized methods, and it was considered as a design flaw. In 0.6 we've been able to move away from it by using the f.(x) syntax for element-wise operation. So we've followed that route already and we know it won't work. If a user needs to pass a null values to a function provided by a package which doesn't support it, s/he shouldn't need to rewrite the function. I think repeating these discussions once again is one of the reasons John closed the PR.

Also the issue with inference is a deep one. Even if we get rid of map support for nullables, lifting will have by definition to infer the T for Null{T} if we make it parametric. It's not OK for it to fail in some cases, as these cases will behave differently from others if you dispatch on the inferred type. What's the point of opting for Null{T} if we can't rely on it? Note that the case of empty collections is considered as fragile, but we haven't found a better solution. Making it the foundation of nullables isn't the best idea to write a robust system.

That basically means it's never safe to dispatch on ?T since two users/packages (who might not be collaborating) might both overload Base.f(::?T) for same f and different T, and end up with this ambiguity problem. This seems (to me at least) as a major flaw.

Yes, and that shows the only solution in that case is to have a common definition f(::Null) = Null(). That's why I suggest we do this by default for all functions.

(The example of 3VL & and | is not representative at all, since these are the only two functions I know for which we want lifting, but not with the standard lifting semantics. So let's not base the whole system on this special case, for which it could be OK to define methods manually.)

Finally, I'm not sure we need so many functions not to be lifted. Functions for which passing a nullable makes no sense should generally not be lifted, and people shouldn't pass nullables to them. Why would one call lpad(Null()) or run(Null()) and expect not getting Null() as a result?

Anyway, we could start without a default lifting fallback and see how it goes. At least, Union{T, Null} would be as functional as DataArrays or Nullable, but more convenient than the latter since non-null values would not even need unwrapping.

andyferris · 2017-02-28T10:55:35Z

Thank you so much @nalimilan for your wonderful and very detailed response. I have certainly learned a lot today from this discussion. I had been interested in the development of Nullable and for quite some time have been following the public discussions, issues and this PR as best I can, yet I always was left with (central) one question: "why use lifting with Union-based nullables?" (as opposed to dispatch, which does require extra method definitions and certainly inconvenience, but nothing worse than e.g. C++ would be). I am very glad for receiving a full and direct explanation.

In any case, I feel that having this discussion has been valuable exercise. This certainly is a tricky design problem, and there are clearly rather few available options which satisfy e.g. determinism w.r.t. inference and user-friendliness.

@johnmyleswhite I'd like to apologize to you directly - as I alluded to earlier, I have been very impressed with the leadership shown with this PR and, in fact, have been looking forward to seeing this PR merge. It was never my intention to derail this process. However, I think it's reasonable to say this Julep has become a focus point for the future of Nullable and I assumed it would be an OK place to check my understanding and assumptions about this design, clear up some misunderstandings, float ideas, and in general, to come to a consensus. Thank you for your efforts - and I still believe this Julep process has been, and will continue to be, very worthwhile.

nalimilan · 2017-02-28T12:56:14Z

I think we should merge this PR and add the missing details later.

Add a draft Nullable Julep

04311d2

quinnj reviewed Dec 10, 2016

View reviewed changes

ararslan reviewed Dec 10, 2016

View reviewed changes

nalimilan reviewed Dec 10, 2016

View reviewed changes

tkelman added the Nullable label Dec 12, 2016

andyferris reviewed Dec 14, 2016

View reviewed changes

johnmyleswhite added 2 commits December 17, 2016 16:45

Add a draft Nullable Julep

2f88621

Merge branch 'jmw/nullable' of github.com:JuliaLang/Juleps into jmw/n…

60afcc2

…ullable

ararslan reviewed Dec 18, 2016

View reviewed changes

Nullable.md

for x in xs

s += x

end

return x

Copy link

Member

ararslan Dec 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return s?

ararslan reviewed Dec 18, 2016

View reviewed changes

nalimilan reviewed Dec 18, 2016

View reviewed changes

nalimilan reviewed Dec 19, 2016

View reviewed changes

Nullable.md

```

function sum(xs::Vector{Union{Some{Int}, Void}})

s = 0

Copy link

Member

nalimilan Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Some(0)?

nalimilan reviewed Dec 19, 2016

View reviewed changes

johnmyleswhite closed this Feb 28, 2017

nalimilan mentioned this pull request Jul 4, 2017

The fate of Nullable JuliaLang/julia#22682

Closed

[WIP] Nullable Redesign #21

[WIP] Nullable Redesign #21

Conversation

johnmyleswhite commented Dec 10, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyferris left a comment

Choose a reason for hiding this comment

andyferris Dec 14, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyferris commented Dec 14, 2016 • edited Loading

johnmyleswhite commented Dec 14, 2016

andyferris commented Dec 14, 2016 • edited Loading

andyferris commented Dec 14, 2016

johnmyleswhite commented Dec 18, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan Dec 19, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TotalVerb commented Feb 27, 2017

TotalVerb commented Feb 27, 2017

TotalVerb commented Feb 27, 2017

ararslan commented Feb 27, 2017

TotalVerb commented Feb 27, 2017 • edited Loading

ararslan commented Feb 27, 2017

andyferris commented Feb 27, 2017 • edited Loading

ararslan commented Feb 27, 2017

andyferris commented Feb 27, 2017

andyferris commented Feb 27, 2017

ararslan commented Feb 27, 2017

andyferris commented Feb 27, 2017 • edited Loading

andyferris commented Feb 27, 2017 • edited Loading

johnmyleswhite commented Feb 28, 2017

andyferris commented Feb 28, 2017 • edited Loading

johnmyleswhite commented Feb 28, 2017

andyferris commented Feb 28, 2017

TotalVerb commented Feb 28, 2017

andyferris commented Feb 28, 2017

TotalVerb commented Feb 28, 2017

TotalVerb commented Feb 28, 2017

andyferris commented Feb 28, 2017 • edited Loading

TotalVerb commented Feb 28, 2017

andyferris commented Feb 28, 2017

andyferris commented Feb 28, 2017 • edited Loading

nalimilan commented Feb 28, 2017

andyferris commented Feb 28, 2017

andyferris Dec 14, 2016 •

edited

Loading

andyferris commented Dec 14, 2016 •

edited

Loading

andyferris commented Dec 14, 2016 •

edited

Loading

nalimilan Dec 19, 2016 •

edited

Loading

TotalVerb commented Feb 27, 2017 •

edited

Loading

andyferris commented Feb 27, 2017 •

edited

Loading

andyferris commented Feb 27, 2017 •

edited

Loading

andyferris commented Feb 27, 2017 •

edited

Loading

andyferris commented Feb 28, 2017 •

edited

Loading

andyferris commented Feb 28, 2017 •

edited

Loading

andyferris commented Feb 28, 2017 •

edited

Loading