Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use lufact in inv of Symmetric/Hermitian matrices #22686

Merged
merged 1 commit into from
Jul 18, 2017

Conversation

fredrikekre
Copy link
Member

@fredrikekre fredrikekre commented Jul 5, 2017

Seems to be generally faster; with the following

using BenchmarkTools
luf(A) = Symmetric(inv(lufact(A)))
bkf(A) = Symmetric(inv(bkfact(A)))
for N in (10, 100, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000)
    A = rand(N, N); S = Symmetric(A);
    print("inv bkfact $N:"); @btime bkf($S)
    print("inv lufact $N:"); @btime luf($S)
end

I get:

Matrix size inv(bkfact(A)) inv(lufact(A)) ratio
10 5.467 μs (28% in bkfact) 2.219 μs (41% in lufact) 0.406
100 358.552 μs (25% in bkfact) 317.677 μs (43% in lufact) 0.886
500 8.246 ms (47% in bkfact) 8.669 ms (41% in lufact) 1.051
1000 48.386 ms (38% in bkfact) 52.142 ms (36% in lufact) 1.078
1500 200.098 ms (24% in bkfact) 156.982 ms (33% in lufact) 0.785
2000 502.729 ms (20% in bkfact) 370.605 ms (32% in lufact) 0.737
2500 1.016 s (18% in bkfact) 648.959 ms (32% in lufact) 0.639
3000 1.795 s (15% in bkfact) 1.165 s (31% in lufact) 0.649
3500 2.869 s (15% in bkfact) 1.795 s (23% in lufact) 0.626
4000 6.517 s (13% in bkfact) 2.818 s (29% in lufact) 0.432

Allocations are the same for both methods.

Would be nice if someone could confirm this on some other computer.

@tkelman
Copy link
Contributor

tkelman commented Jul 5, 2017

how much is spent in factorize vs backsolve with this?

@fredrikekre
Copy link
Member Author

Updated the table with percentage in factorization. It is definitely the backsolve that is slower for the Bunch Kaufman approach. Probably related to #22561 (comment)?

@tkelman
Copy link
Contributor

tkelman commented Jul 5, 2017

I also wonder if there might be any bias here in deviations from symmetry when lufact is used, especially for ill-conditioned inputs. Maybe not enough to worry about since you shouldn't usually use inv anyway.

@fredrikekre
Copy link
Member Author

fredrikekre commented Jul 5, 2017

Updated to use the inplace inv! for LU, this cuts the allocations in half as well. (for the first commit, the allocations are the same for the two approaches)

@fredrikekre
Copy link
Member Author

Tested on my computer at home, which has better prestanda, here are the ratios:

Matrix size ratio
10 0.423
100 0.798
500 0.895
1000 1.140
1500 0.549
2000 0.385
2500 0.286
3000 0.254
3500 0.241
4000 0.236

@ararslan ararslan added the linear algebra Linear algebra label Jul 5, 2017
Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! :)

@StefanKarpinski
Copy link
Member

Shall we merge this then? The performance improvements look consistent and substantial.

@fredrikekre
Copy link
Member Author

Yes, please! 😄

@andreasnoack
Copy link
Member

andreasnoack commented Jul 11, 2017

There is a funny little story about numerical analysis hidden here. I happen to have spent some time on this because of a small post to NA Digest in 2015. Suddenly Matlab's inv gave more inaccurate results than previous versions. After many experiments, I realized what was going on. If you use the LU to compute the inverse of an exactly symmetric matrix then the inverse is no longer symmetric.

julia> H = [1/(i + j - 1) for i in 1:8, j in 1:8];

julia> issymmetric(H)
true

julia> issymmetric(inv(lufact(H)))
false

This was the case in older versions of Matlab. It is pretty annoying that the inverse of a symmetric matrix is not symmetric and that was probably what made them change the behavior around R2013a. Suddenly, the matrix was symmetric. The way they did it was to use LU and then copy the elements from one triangle to another. That would effectively be the same as this PR. The funny thing is that

julia> norm(inv(H)*(H*ones(8)) - 1)
8.704438947487126e-7

is fine but

julia> norm(Symmetric(inv(lufact(H)))*(H*ones(8)) - 1)
0.2623294451513258

has a huge error. So the rounding errors in the LU based inverse somehow cancel accross the lower and upper triangle when solving. Also, this doesn't happen for Bunch-Kaufman that only works on a single triangle

julia> norm(inv(bkfact(H))*(H*ones(8)) - 1)
7.277205904268713e-7

Matlab changed the behavior again in R2016a but without mentioning anything in their release notes. A weird thing is that from and after 2016b I get different behavior on my Mac and on a Linux system. The Linux version is symmetric but the Mac inverse isn't. They both have small error now.

Bottomline is that we shouldn't merge this as it is.

@andreasnoack
Copy link
Member

andreasnoack commented Jul 11, 2017

We could continue to use Bunch-Kaufman but since LU is so much faster it is probably better to figure out if symmetrizing with the average (A+A')/2 (in place) is still faster. I would think so. The averaged version doesn't have the flaw

julia> inv(H) |> t -> Symmetric(t + t')/2*(H*ones(8)) - 1 |> norm
1.5769221751231655e-6

@fredrikekre
Copy link
Member Author

Oh, thanks for catching that. This only applies to inv though, right? Things like lufact(A)\b when A is symmetric is okay I hope?

@andreasnoack
Copy link
Member

Yes. That is fine. The problem is only when copying elements from one triangle to the other in the unsymmetric LU based inverse. We might actually want to add the Hilbert matrix example as a test case to make sure we don't accidentally make the mistake in the future.

@fredrikekre
Copy link
Member Author

Yes. That is fine. The problem is only when copying elements from one triangle to the other in the unsymmetric LU based inverse.

Okay.

We might actually want to add the Hilbert matrix example as a test case to make sure we don't accidentally make the mistake in the future.

Yea, planned to do that. For the in-place symmetrization, there is no such thing function implemented, right?

@andreasnoack
Copy link
Member

Yea, planned to do that. For the in-place symmetrization, there is no such thing function implemented, right?

Not to my knowledge

@StefanKarpinski
Copy link
Member

As an aside, I would like to point out how lovely it is that you can so easily test out the behaviors of different factorization approaches with code as simple and clear as this:

norm(inv(H)*(H*ones(8)) - 1)
norm(Symmetric(inv(lufact(H)))*(H*ones(8)) - 1)
norm(inv(bkfact(H))*(H*ones(8)) - 1)

@fredrikekre
Copy link
Member Author

Okay, I have an updated version now. Benchmarks:

Matrix size master (using bkfact) PR (using lufact) ratio (PR/master)
10 5.236 μs 1.747 μs 0.334
100 203.389 μs 165.728 μs 0.815
500 4.635 ms 4.175 ms 0.901
1000 23.640 ms 27.766 ms 1.175
1500 165.135 ms 89.396 ms 0.541
2000 515.413 ms 214.738 ms 0.417
2500 1.208 s 365.067 ms 0.302
3000 2.312 s 627.485 ms 0.271
3500 3.867 s 998.406 ms 0.258
4000 5.969 s 1.548 s 0.259

and the allocations are still half for this PR.

@fredrikekre fredrikekre force-pushed the fe/inv-lufact branch 2 times, most recently from 86f16d4 to 8376cbb Compare July 11, 2017 23:37
# symmetrize
if A.uplo == 'U' # add to upper triangle
@inbounds for i = 1:(n-1), j = (i+1):n
B[i,j] = conjugate ? (B[i,j] + conj(B[j,i])) / 2 : (B[i,j] + B[j,i]) / 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a branch here might be expensive so I think it might be worthwhile to split the method. Maybe as an @evaled loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels LLVM should be able to do something like that by itself by noticing that conjugate is constant in the loop. Probably worth to benchmark before changing the code to a more hard to read version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test it, I just copied the structure from copytri!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarking (and some amateuristic inspection of generated code) says this is not a problem.

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo the splitting @andreasnoack suggested! :)

inv(A::Symmetric{T,S}) where {T<:BlasFloat,S<:StridedMatrix} = Symmetric{T,S}(inv(bkfact(A)), A.uplo)
function _inv(A::HermOrSym)
n = checksquare(A)
B = inv!(lufact(A))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do all lufact return types always support in-place inv! ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will only be called for StridedMatrix and BlasFloat, so presumably this should be OK?

conjugate = isa(A, Hermitian)
# symmetrize
if A.uplo == 'U' # add to upper triangle
@inbounds for i = 1:(n-1), j = (i+1):n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would inverting the LU factorization ever make a zero imaginary component non-zero? If so, should probably include the diagonal for Hermitian

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well always include the diagonal, its not much more work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would inverting the LU factorization ever make a zero imaginary component non-zero?

Seems like it actually (sometimes):

julia> complex.(rand(2,2), rand(2,2)) |> t -> (t + t') |> t -> Hermitian(inv(lufact(t)))
ERROR: ArgumentError: Cannot construct Hermitian from matrix with nonreal diagonals
Stacktrace:
[...]

Will include the diagonal and add a test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

end
B
end
inv(A::Hermitian{T,S}) where {T<:BlasFloat,S<:StridedMatrix} = Hermitian{T,S}(_inv(A), A.uplo)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was here before this PR, but the construction of the Hermitian here elides the check for complex diagonal elements. Should that check be moved to an inner constructor? Or, alternatively, just call Hermitian here without type arguments, and rely on the outer constructor. Currently:

julia> H
2×2 Hermitian{Complex{Float64},Array{Complex{Float64},2}}:
 0.650488-0.0im       0.826686+0.667447im
 0.826686-0.667447im   1.81707-0.0im     

julia> iH = inv(H)
2×2 Hermitian{Complex{Float64},Array{Complex{Float64},2}}:
  34.2282+2.13163e-14im  -15.5723-12.5727im    
 -15.5723+12.5727im       12.2532+7.10543e-15im

julia> isreal(iH[1,1])
false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hasn't the imaginary part of the diagonal been zeroed by the symmetrization? Eventually, the check should just go away completely, see #17367

Copy link
Member Author

@fredrikekre fredrikekre Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example was for the original version, which did not loop to the diagonal, like in copytri!. See #22686 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got the chronology of the messages wrong. Then I guess this line is fine, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@fredrikekre
Copy link
Member Author

AV x86 failed with:

    FAILURE
Error in testset resolve:
Error During Test
  Test threw an exception of type ReadOnlyMemoryError
  Expression: sanity_tst(deps_data, problematic_data)
  ReadOnlyMemoryError()
  Stacktrace:
   [1] IOContext(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Pair{Symbol,Bool}) at .\show.jl:90 (repeats 2 times)
   [2] print(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Base.PipeServer) at .\strings\io.jl:29
   [3] print(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Base.PipeServer) at .\strings\io.jl:31

presumably unrelated?

@fredrikekre
Copy link
Member Author

Noticed in #22827 that this enables inv for Symmetric{BigFloat} as well, since lufact works in that case, but there is no BigFloat implementation for bkfact! 🎉

@andreasnoack andreasnoack merged commit 22ed8b5 into JuliaLang:master Jul 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants