Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReshapedArrays, take2 #15449

Merged
merged 5 commits into from
Apr 20, 2016
Merged

ReshapedArrays, take2 #15449

merged 5 commits into from
Apr 20, 2016

Conversation

timholy
Copy link
Member

@timholy timholy commented Mar 11, 2016

This contains my current implementation of ReshapedArrays. It's built on top of #15357; only the last commit is new.

This provides two means of access: the traditional R[i,j,k], and an iteration-specific R[I] where I is a special index that "pops up" to the parent array. This avoids the cost of calculating a div, which even with #15357 is somewhat slow.

Currently, this is for reference only; there is more work that needs to be done on our iteration paradigms before this will pass tests.

@timholy timholy mentioned this pull request Mar 11, 2016
@timholy timholy force-pushed the teh/reshapedarrays2 branch 2 times, most recently from 10c6c99 to 35777df Compare March 21, 2016 12:05
@timholy
Copy link
Member Author

timholy commented Mar 21, 2016

OK, this is getting more serious now. I'm deliberately taking the "hard-core" road, deleting virtually all other reshape methods, to discover what kinds of problems it causes. It passes most tests, with a few failures for reasons I largely understand. The most common source of trouble is the fact that reshape(1:20, 4, 5) is now immutable, and this requires the usage of copy in tests that depend on the former behavior of reshape (always returning a mutable array). One of the failures is this test, which (if one thinks that reshape should always create a view) seems like it will have to go. Several of the remaining failures will be best addressed by having comprehensions return arrays of the same shape as the inputs (without calling reshape). Another (in sparse) is related (but not identical) to #15354 (comment).

@mbauman, I reworked checkbounds and switched to using an Int state variable when iterating over LinearFast arrays (each change in a separate commit).

Perhaps of greatest interest is a performance benchmark in this gist. The first line corresponds to uncommenting a new definition of eachindex in reshapedarray.jl, which leads to a significant performance improvement but also will cause a number of failures until the tasks in #15434 (comment) are done.

Here are the performance results:

julia> include("perf_reshaped.jl")
  0.000006 seconds (159 allocations: 11.137 KB)
reshaped linearslow, integers
  1.206878 seconds (5 allocations: 176 bytes)
reshaped linearslow, linear indexing
  1.186159 seconds (5 allocations: 176 bytes)
reshaped linearslow, eachindex
  0.278382 seconds (5 allocations: 176 bytes)
original subarray, integers (linearslow)
  0.107050 seconds (5 allocations: 176 bytes)
original subarray, eachindex (linearslow)
  0.212035 seconds (5 allocations: 176 bytes)

reshaped linearfast, integers
  0.109079 seconds (5 allocations: 176 bytes)
reshaped linearfast, linear indexing
  0.110559 seconds (5 allocations: 176 bytes)
reshaped linearfast, eachindex
  0.109705 seconds (5 allocations: 176 bytes)
original array, integers
  0.108872 seconds (5 allocations: 176 bytes)
original array, linear indexing
  0.108599 seconds (5 allocations: 176 bytes)
original array, eachindex
  0.110465 seconds (5 allocations: 176 bytes)

It seems that CartesianIndex iteration performance has degraded a bit (I haven't yet had a chance to see why), but the problems pale next to the performance challenges for LinearSlow ReshapedArrays when not using the eachindex strategy. This is largely fixed by the eachindex strategy, but it requires that #15434 be brought to completion. I'm worried this will be ugly in some cases (particularly for linear algebra), and relies on packages following suit, so I'd be interested in learning whether others have better ideas.

CC @meggart, @dfdx, @kswietli, @dcarrera, @Musmo.


return SparseMatrixCSC(mS, nS, colptr, rowval, nzval)
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is one we should surely revert before merging this. But it's a good test of generality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, scratch that. Unlike reshape methods for BitArray and SharedArray, this copies the data. So keeping this is probably not on the table, if we're serious about having reshape always return a view.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is akin to the sparse views problem. It's inconsistent, but I think you should restore this to the copying method for now. Making reshape default to a view is still a huge improvement over the status quo, even if we don't get to 100% consistency.

It may not be all that terrible to just have sparse reshapes and slices always return copies as an exception to the rule. Heck, NumPy gets away with their fancy-indexing-returns-a-copy, but I have no idea how error prone that is in practice.

@mbauman
Copy link
Member

mbauman commented Mar 25, 2016

Crazy half-baked thought: what if we expose this functionality as a new index type instead of a new wrapper? With APL indexing (#15431), we can use specialized multidimensional arrays as a mapping from N indices to an Int or a CartesianIndex{M} as SubArray indices. Then reshape could return a subarray with a ReshapedIndices index. Non-scalar indexing into the resulting reshaped arrays could just wrap a SubArray around these indices as they're re-indexed.

immutable ReshapedIndices{N,M} <: AbstractArray{CartesianIndex{M}, N}
    dims::NTuple{N, Int}
    mi::NTuple{M, SignedMultiplicativeInverse{Int}}
end
function ReshapedIndices(dest_size::Dims, src_size::Dims)
    strds = front(tail(size_strides((1,), src_size...)))
    mi = map(SignedMultiplicativeInverse, strds)
    ReshapedIndices(dest_size, mi)
end
Base.size(ri::ReshapedIndices) = ri.dims
Base.getindex(A::ReshapedIndices, indexes::Int...) = CartesianIndex(ind2sub_rs(A.mi, sub2ind(size(A), indexes...)))

julia> R = ReshapedIndices((6,2), (4, 3))
6x2 ReshapedIndices{2,1}:
 CartesianIndex{2}((1,1))  CartesianIndex{2}((3,2))
 CartesianIndex{2}((2,1))  CartesianIndex{2}((4,2))
 CartesianIndex{2}((3,1))  CartesianIndex{2}((1,3))
 CartesianIndex{2}((4,1))  CartesianIndex{2}((2,3))
 CartesianIndex{2}((1,2))  CartesianIndex{2}((3,3))
 CartesianIndex{2}((2,2))  CartesianIndex{2}((4,3))

Of course, this doesn't address the linear fast or iteration special cases that you've worked so hard on getting good performance with… but I think those can just be expressed as dispatch specializations on the SubArray index tuple.

I think the hardest part would be reshapes of SubArrays; the naive merge_indexes algorithm leaves a lot to be desired (and would allocate for lots of indices).

@mbauman
Copy link
Member

mbauman commented Mar 25, 2016

I think the hardest part would be reshapes of SubArrays; the naive merge_indexes algorithm leaves a lot to be desired (and would allocate for lots of indices).

And the solution there would be to make merge_indexes lazy, too. The naive version is currently a comprehension: [CartesianIndex(map(getindex, indexes, ind2sub(dimlengths, i))) for i in index], but that could just as easily be made into a lazy AbstractArray and sped up with MultiplicativeInverses.

@timholy
Copy link
Member Author

timholy commented Mar 25, 2016

Yes, I agree that a lazy merge_indexes is the way forward. Indeed, this PR could already delete merge_indexes, I don't know why I didn't do that already. Instead it would return a SubArray{ReshapedArray{A}}. However, this case, and ReshapedArray{SubArray{A}}, are the two cases that introduce performance nightmares when A is LinearSlow. (Interestingly, even if SubArray{ReshapedArray{A}} is LinearSlow, as long as A is LinearFast there's no problem. Sadly, ReshapedArray{SubArray{A}} does not work out as nicely---the problem is basically that ReshapedArray{B}, for B LinearSlow, is tough.)

Given this line of thinking: I would posit that your ReshapedIndexes typestoring a pre-computed index-mapping array is basically the non-lazy merge_indexes all over again, so I confess to not being very enthusiastic about that approach. It could allocate more storage than needed for the whole array. I agree it's a great way to solve some usage performance problems, but it's at the cost of some construction performance and storage problems. I want it all, baby 😄.

@mbauman
Copy link
Member

mbauman commented Mar 25, 2016

I may be missing something, but I'm pretty sure that this is almost isomorphic to your current design. The sketch of the ReshapedIndices type doesn't pre-compute the mapping — it does so lazily on-demand. I just stripped out the index part of the getindex definition from this PR.

Now, I don't think it'll gain us anything major, but I think there are a few advantages here:

  • This allows us to effectively hide the nesting within SubArray's indices as an implementation detail.

SubArray{T,N,ReshapedArray{T,N,A},I} becomes SubArray{T,N,A,Tuple{SubArray{T,N,ReshapedIndices,I}}.
ReshapedArray{T,N,SubArray{T,N,A,I}} becomes SubArray{T,N,A,Tuple{SubArray{T,N,MergedIndices(I),Tuple{ReshapedIndices}}}.

  • Users wouldn't ever need to know about ReshapedIndices (eyes gloss over before looking that deep into nested types), but they'd frequently see ReshapedArrays.
  • It keeps all this index munging together in one place: the indices of SubArrays. With APL indexing, we already need to think about reshaping indices in order to ensure the correct size result (https://github.com/JuliaLang/julia/pull/15431/files#diff-89e97f67f8058d32eac8d917db5bbdbfR441).
  • We could even put a threshold on the number of indirections between merged and reshaped indices, and make a copy of the indices at some point without messing up the view semantics of the actual array.

Now, the only difference that I see potentially causing trouble is that this would lean more on CartesianIndexes, since SubArray works as a lazy mapping that transforms the passed indices into the parent indices, processing one index at a time. With support for APL indexing, we can use N-dimensional arrays to allow a mapping from N indices (taken as a single group) to a single index. This is where we'd use a CartesianIndex instead of splatting for LinearSlow arrays.

@timholy
Copy link
Member Author

timholy commented Mar 25, 2016

Ah, I misunderstood your main intent; I thought you meant that we should store R as a means of solving the performance problems I'm struggling with right now. I think I understand your meaning better now, as an alternate implementation of the same core ideas.

Sure, I'd be fine with a semi-flattening of the hierarchy, putting the index gymnastics into the indexes tuple. There's a certain logic to that. If one did B = reshape(sub(reshape(A, ...))) or sub(reshape(sub(A, ...))), it seems this requires a SubArray{SubArray}? Currently we flatten that condition, but of course we couldn't if there's an intervening reshape. I suppose this is part of the reason that I defined a separate view type, but I'm not attached to that design.

I'm not sure I yet see dramatic advantage in your design, but the bottom line is that I'd be fine in principle with an isomorphic design---or more properly, ANY design that works! At the moment, right now I'm pretty focused on (aka, worried about) efficient iteration and generic code; I would guess we're facing the same obstacles with either design, unless I'm completely missing an insight you've had.

@mbauman
Copy link
Member

mbauman commented Mar 25, 2016

Yup, exactly. I'd feel stronger if there were more cases where sub(reshape()) or reshape(sub()) could be recomputed into a single index type with just one redirection — this could easily be expressed as dispatch on indexing into ReshapedIndices — but I think that's only possible with all colon indices. I do like the fact that users would only need to deal with SubArrays, but as it stands, don't let this idea get in your way.

@timholy
Copy link
Member Author

timholy commented Apr 1, 2016

I have a sense that some folks seem to be making a bit of an effort to move towards "end-game" on 0.5. If that's true, a key question is: how to think about this PR? In particular, how worried should I be about the remaining cases of poor performance? I've been taking the stance that we really need to solve essentially "all" of the troublesome cases before merging, which basically means solving #15648. That's a big job. (It's a good one, but big.) However, the cases of bad performance are, arguably, a relatively small subset: the poster child is reshape(sub(A, indexes...), sz) and then using that in, e.g., linear algebra. Not even all such cases are problematic, since some subarrays are LinearFast, and that case is fine.

Especially if we're unlikely to return views-by-default from A[2:end,:], then I'd bet that most uses of reshape will be LinearFast and will have excellent performance. So, if folks are eager to move towards releasing 0.5, one option would be to resolve the remaining test failures and performance regressions here and merge it. I think there are only a small handful of failures left, and probably only the additional regression in checkbounds, with likely clear paths to fixing all of these. So I suspect this could be done pretty soon, if there's pressure to do so and a willingness to tolerate some regressions in a minority (?) of cases.

@timholy
Copy link
Member Author

timholy commented Apr 19, 2016

runbenchmarks(ALL, vs = "JuliaLang/julia:master")

@timholy
Copy link
Member Author

timholy commented Apr 19, 2016

OK, this seems ready to go, pending the benchmark results and a merge conflict that seems to have arisen since this morning. @mbauman, I think I addressed all your review comments:

  • Consistent with the goal of having reshape always share data, I made reshape(::SparseMatrixCSC) return a view. But I managed to preserve prior work that's gone into efficient copy-reshaping by having that code called via a specialized copy(::ReshapedArray{T,N,SparseMatrixCSC...}) method. So folks can get the performance they want, if they know to call copy after reshaping.
  • I backed out the change to LinearFast indexing, and instead fixed the checkstring functions (a much better way to handle this).
  • I think I successfully fought my way through checkbounds; no generated functions, and if anything we seem to be slightly faster. We'll see if the benchmarks support this guess. For best performance I had to add some fixed-argument specializations, which proved to be even faster than using a @generated function.

Finally, while I've been vocal about some slow corner cases, it's worth pointing out that this isn't all doom and gloom. In particular, if you must use a linear index with a LinearSlow array, now your best bet is to reshape it first:

julia> function badsum(A)
           s = 0.0
           for i = 1:length(A)
               s += A[i]
           end
           s
       end
badsum (generic function with 1 method)

julia> A = rand(1000,1001);

julia> S = sub(A, 1:999, 1:998);

julia> R = vec(S);

julia> typeof(R)
Base.ReshapedArray{Float64,1,SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}

# after suitable warmup
julia> @time badsum(S)
  0.030715 seconds (5 allocations: 176 bytes)
498730.01196278195

julia> @time badsum(R)
  0.011148 seconds (5 allocations: 176 bytes)
498730.01196278195

Something on the scale of a 3x improvement, mostly thanks to #15357. This is still roughly 7x slower than with a LinearFast array, and about 3-4x slower than using eachindex iteration with S, but a big improvement nonetheless.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy timholy force-pushed the teh/reshapedarrays2 branch from 866cd4d to 5f7e1cb Compare April 20, 2016 09:09
@timholy timholy changed the title WIP: ReshapedArrays, take2 ReshapedArrays, take2 Apr 20, 2016
@timholy timholy merged commit 97f80c3 into master Apr 20, 2016
@timholy timholy deleted the teh/reshapedarrays2 branch April 20, 2016 13:31
@mbauman
Copy link
Member

mbauman commented Apr 20, 2016

Amazing work, Tim! This looks great.

@JeffBezanson
Copy link
Member

🎉


@inline setindex!(A::ReshapedArrayLF, val, index::Int) = (@boundscheck checkbounds(A, index); @inbounds parent(A)[index] = val; val)
@inline setindex!(A::ReshapedArray, val, indexes::Int...) = (@boundscheck checkbounds(A, indexes...); _unsafe_setindex!(A, val, indexes...))
@inline setindex!(A::ReshapedArray, val, index::ReshapedIndex) = (@boundscheck checkbounds(parent(A), index.parentindex); @inbounds parent(A)[index.parentindex] = val; val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line lengths in this file are a tad out of hand

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like maybe "functions containing multiple statements should use the long form function definition syntax" would be a good style guideline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #15973.

@yuyichao
Copy link
Contributor

yuyichao commented May 8, 2016

I believe this broke many doctests. One example from http://julia.readthedocs.io/en/latest/manual/interfaces/#abstract-arrays

julia> immutable SquaresVector <: AbstractArray{Int, 1}
           count::Int
       end

julia> Base.size(S::SquaresVector) = (S.count,)

julia> Base.linearindexing(::Type{SquaresVector}) = Base.LinearFast()

julia> Base.getindex(S::SquaresVector, i::Int) = i*i

julia> s = SquaresVector(7)
7-element SquaresVector:
  1
  4
  9
 16
 25
 36
 49

julia> s \ rand(7, 2)
ERROR: MethodError: no method matching qrfact(::Base.ReshapedArray{Int64,2,SquaresVector,Tuple{}}, ::Type{Val{true}})
Closest candidates are:
  qrfact{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64}}(::Union{Base.ReshapedArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2},SubArray{T<:Union{Complex{Float32},Complex{Float64},Float32,Float64},2,A<:Union{Base.ReshapedArray{T,N,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray},I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex,Base.NoSlice,Colon,Int64,Range{Int64}},N}},L}}, ::Any)
  qrfact{T}(::Union{Base.ReshapedArray{T,2,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray{T,2},SubArray{T,2,A<:Union{Base.ReshapedArray{T,N,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray},I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex,Base.NoSlice,Colon,Int64,Range{Int64}},N}},L}}, ::Any)
  qrfact(::SparseMatrixCSC{Tv,Ti<:Integer}, ::Type{Val{true}})
 in \(::Base.ReshapedArray{Int64,2,SquaresVector,Tuple{}}, ::Array{Float64,2}) at ./linalg/generic.jl:347
 in \(::SquaresVector, ::Array{Float64,2}) at ./linalg/generic.jl:350
 in eval(::Module, ::Any) at ./boot.jl:230

There are also other places in the doc that uses reshape on a range and some of them assumes the resulting array is mutable. (Probably just a matter of calling collect for those.)

@andreasnoack
Copy link
Member

This is because the old reshape promoted the SquareVector{Int} to a Matrix{Int} and now it's a ReshapedArray{☠☠☠} which is not a StridedMatrix. A solution that I don't like would be to loosen the signature of qrfact to AbstractMatrix. It would challenge for the promotion rules in LinAlg quite a bit because the factorization are supposed to work in place and AbstractMatrix doesn't provide that garantee. We would need something like convert(AbstractMatrixButWithSetindexWorkingForAllElements, A::AbstactMatrix) which would have to be defined for all matrix types.

@vtjnash
Copy link
Member

vtjnash commented May 8, 2016

AbstractMatrixButWithSetindexWorkingForAllElements

Is that what "similar" is supposed to be?

@andreasnoack
Copy link
Member

andreasnoack commented May 8, 2016

That is sometimes what it means and we might need to use that more systematically in the LinAlg code anyway for e.g. something like XTriangular.

@timholy
Copy link
Member Author

timholy commented May 8, 2016

Rats, I didn't think to run the doctests.

similar yields an uninitialized array. I think the best option here is to introduce a Mutability trait.

@timholy
Copy link
Member Author

timholy commented May 8, 2016

Wow, I can't even run the doctests locally; I got a python error in linalg.rst, and manual/metaprogramming.rst is hanging for me. It doesn't even get to the test you flagged. Any tips?

tim@cannon:~/src/julia-0.5$ python --version
Python 2.7.6

@timholy
Copy link
Member Author

timholy commented May 8, 2016

@andreasnoack, now I see there are already qrfact methods that copy. Can you be more precise about your reason not to generalize those?

@andreasnoack
Copy link
Member

We need to promote to a mutable array with an element type that is stable under the necessary arithmetic operations of the factorizations. So far the promotion has mainly been focusing on getting the element type right and then do a copy_oftype(A, PromoteType). To make the dense factorizations work for AbstractMatrix, we could do something like

AC = similar(A, PromoteType, returnsize)
copy!(AC, A)

or define a convenience function for that.

@yuyichao
Copy link
Contributor

yuyichao commented May 8, 2016

@timholy You should probably just copy the code and run it separately, there are many failures in the doc test now. I noticed this while rebasing #15753 It also seems that python2 is complaining about the unicode in some of the tests (that's causing the python error you get). I've also seen the hang in metaprogramming doc but haven't got time to look at it more carefully.

For me the interface test is the first (or the second) one though....

@timholy
Copy link
Member Author

timholy commented May 8, 2016

I think that's needed only for qrfact!, but that would also be a misnomer because it wouldn't be in-place. qrfact itself always makes a copy of A. Check out #16262.

@timholy
Copy link
Member Author

timholy commented May 8, 2016

You should probably just copy the code and run it separately, there are many failures in the doc test now.

Given the number of other things on my plate right now, I'm just gonna wait until someone fixes the rest of the problems first 😉.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.