Use A'A instead of [0 A;A' 0] in svds. #21701

Merged · 1 commit · May 7, 2017
Conversation

@andreasnoack (Member) commented May 4, 2017

The former can be much faster. The latter is only relevant when computing the small values, and we don't support that. I've kept the SVDOperator but renamed it SVDAugmented since that name was used in PRIMME.

This also fixes JuliaLang/LinearAlgebra.jl#335 and this PR supersedes #20694.

Update: To quantify this, the time to process IMDB on my laptop with this PR is

julia> @time svds(A, nsv = 10);
  7.760984 seconds (833 allocations: 394.811 MiB, 2.73% gc time)

where it used to be

julia> @time svds(A, nsv = 10);
 45.004135 seconds (2.31 k allocations: 891.884 MB, 1.31% gc time)

@@ -1,80 +1,85 @@
-# This file is a part of Julia. License is MIT: https://julialang.org/license
+# This file is a part of Julia. License is MIT: http://julialang.org/license
Contributor: wrong conflict resolution

Member (Author): Fixed

Commit message: "… The latter is only relevant when computing the small values and we don't support that. This also fixes #16608."
@tkelman (Contributor) commented May 4, 2017

Where is most of the time difference spent? Doesn't this make any ill-conditioning worse?

@andreasnoack (Member, Author) commented May 4, 2017

The Lanczos vectors are twice as long in the augmented case, so all operations related to managing the subspace are more expensive. It makes the conditioning worse, but that doesn't really matter for the relative accuracy of the largest singular values. It is only a concern if you try to compute the smallest values, and we don't support that.
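
For illustration, a minimal sketch of the two matrix-vector products being compared (the function names here are hypothetical, not the PR's actual operator types):

using LinearAlgebra

# Augmented operator B = [0 A; A' 0]: one application costs a product with
# both A and A', and the Lanczos vectors have length m + n.
function augmented_mul(A, x)
    m, n = size(A)
    return [A * x[m+1:end]; A' * x[1:m]]
end

# Normal-equations operator A'A: the same two products with A and A',
# but the Lanczos vectors only have length n.
ata_mul(A, x) = A' * (A * x)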

@tkelman (Contributor) commented May 4, 2017

Can you post profiles? That's a factor of 6, so something else is going on.

@andreasnoack (Member, Author):

You are right that the work on the Lanczos vectors can't justify the difference. I just looked at the other output from the computations and it appears that the "augmented" version simply requires more iterations before it converges. I also checked the profile output and it doesn't look like anything is going wrong there. E.g. there is no dispatch to slow fallbacks.

My theory is the following: the augmented model duplicates all the singular values with flipped sign. Right now we run in :LM mode, so we need twice the number of converged values compared to the AtA method. #20694 uses :LR mode, which is more efficient since it only requires the largest values to converge. However, even though the implicit restart filters the large negative components out of the subspace, their magnitude will probably cause them to reenter again and again. Finally, I'm wondering if the larger spectral gap caused by the "squaring" gives a faster convergence rate. Some of the readers of this thread will probably be able to clarify this.
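
A small dense example makes the duplication visible (illustrative only; dense eigvals stands in for the ARPACK iteration):

using LinearAlgebra

A = randn(5, 3)
σ = svdvals(A)                       # 3 singular values, descending

# Augmented operator: spectrum is {±σᵢ} plus |m - n| zeros, so in :LM mode
# twice as many values must converge to recover the top singular values.
B = Symmetric([zeros(5, 5) A; A' zeros(3, 3)])
eigvals(B)                           # ≈ [-σ₁, -σ₂, -σ₃, 0, 0, σ₃, σ₂, σ₁]

# A'A: spectrum is {σᵢ²}, nonnegative and with no sign-flipped duplicates.
eigvals(Symmetric(A' * A))           # ≈ [σ₃², σ₂², σ₁²]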

@kshyatt added the "linear algebra" label May 5, 2017
@andreasnoack (Member, Author):

Just compared the augmented version to the A'A version in https://github.com/andreasnoack/Primme.jl and I see a similar time difference, so I think AtA is just a faster method.

@tkelman (Contributor) commented May 5, 2017

Don't many other libraries default to the augmented system, though?

@andreasnoack (Member, Author):

Scipy uses A'A, Octave uses the augmented version, and Matlab doesn't use ARPACK anymore. The current version was chosen in #9425, but we never added support for the small values, and you don't win any accuracy for the large values by using the augmented system.

@@ -800,6 +800,8 @@ condition number, or componentwise relative condition number.
 """
 condskeel(A::AbstractMatrix, x::AbstractVector, p::Real=Inf) = norm(abs.(inv(A))*(abs.(A)*abs.(x)), p)
 
+issymmetric(A::AbstractMatrix{<:Real}) = ishermitian(A)
Contributor: this seems backwards as a fallback

Member (Author):

I think this is the right way to define it, since it is more likely that you'd define something Hermitian (or self-adjoint). E.g. AtA_or_AAt and SVDAugmented are self-adjoint by construction, so we define ishermitian(A::AtA_or_AAt) = true. If we removed the (correct) issymmetric fallback, it would be necessary to define issymmetric for each such operator.
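
A sketch of that pattern, with a simplified operator type standing in for AtA_or_AAt (illustrative, not the actual definition in the PR):

using LinearAlgebra
import LinearAlgebra: ishermitian

# A self-adjoint-by-construction operator wrapping A'A.
struct AtAOperator{T,M<:AbstractMatrix{T}} <: AbstractMatrix{T}
    A::M
end
Base.size(op::AtAOperator) = (size(op.A, 2), size(op.A, 2))
Base.:*(op::AtAOperator, x::AbstractVector) = op.A' * (op.A * x)

# Only ishermitian needs a method; for real element types the fallback
# issymmetric(A::AbstractMatrix{<:Real}) = ishermitian(A) then gives
# issymmetric for free.
ishermitian(::AtAOperator) = true

issymmetric(AtAOperator(randn(4, 3)))  # true, via the fallback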

Contributor:
If the element type is real, I think defining issymmetric would be more common for user-defined array types.

@tkelman (Contributor) commented May 7, 2017

There isn't any great reason we haven't implemented smallest value support, and wouldn't this need to be reverted if we do?

@andreasnoack (Member, Author):

It is harder to find the smallest values. If we try to get that working, we should use the augmented version for the small values and A'A for the large values. I kept the SVDAugmented type around for this reason.

@StefanKarpinski (Member):

In general, we should tend towards defaults that work and are highly reliable at the cost of some performance, and provide options for those who want to use faster, less reliable methods that happen to work for their cases. That seems to only apply partially here, but I mention it because there seems to be a tendency in the linalg algorithm design community to favor faster, cleverer methods that have more brittle edge-cases over the dumber, slower, more reliable methods. I have found that typical day-to-day users of linalg routines would generally much rather have the reliable versions even if they're up to 2x slower, as long as they work without further fuss.

@andreasnoack (Member, Author):

Apparently I haven't been able to get my point across, so I'll try again: there is no downside to using A'A for the large values, which is the only mode we support. The A'A method also seems to handle higher multiplicities better, which was the problem in JuliaLang/LinearAlgebra.jl#335.

@StefanKarpinski (Member) commented May 8, 2017

That was the impression I got, but I thought I'd put it out there just as a general reminder 😺

@andreasnoack (Member, Author):

I agree with the general policy.

@Jutho (Contributor) commented May 8, 2018

I would think that with the augmented version, small singular values are nearly impossible to compute (without inverting the matrix), as they are in the middle of the spectrum, i.e. close to zero, whereas the spectrum is symmetric around zero.

With the squaring, the spectrum is strictly positive, and small and large singular values appear at the two ends of the spectrum, so both should be reachable by a direct (i.e. no inversion) Krylov method, just using :LR and :SR. This is, however, modulo the fact that you lose precision by taking the square root of small values.

> Finally, I'm wondering if the larger spectral gap caused by the "squaring" gives a faster convergence rate. Some of the readers of this thread will probably be able to clarify this.

I would think that the larger spectral gap caused by the "squaring" will be disadvantageous for convergence; i.e., the identity matrix is the one that converges most quickly to an invariant subspace. But I am no expert on Krylov methods.

@Jutho (Contributor) commented May 8, 2018

A different question (I can open a new issue for this) that came up from a user of my LinearMaps package: is it really necessary to require A<:AbstractArray for svds? This is not required for eigs itself, as long as the object behaves correctly (i.e. supports multiplication with a vector, etc.). Put differently, it should be possible to compute singular values of LinearMap objects which have a function defined for both A*x and A'*x, no?

@stevengj (Member) commented May 8, 2018

The problem, as I understand it (see e.g. Trefethen & Bau, chapter 31), is that for any singular value ≪ ‖A‖ = max σ, the A'*A method loses a factor of cond(A) in accuracy because of the squared condition number. In particular, the relative error is O(ε‖A‖²/σ²) instead of O(ε‖A‖/σ) for the augmented method.

So I'm concerned that the new method will be much less accurate than the old one for matrices where the first singular value is much larger than the subsequent ones, or in general when the nsv singular values computed by svds include values ≪ ‖A‖.

@andreasnoack, have you checked a case like that?

@andreasnoack (Member, Author):

> Is it really necessary to require A<:AbstractArray for svds?

I don't think being a subtype of AbstractMatrix is an issue. You'd have to communicate the size and element type of the problem somehow, and AbstractMatrix will do that for you. Alternatively, it could be communicated through the initial vector and *, but then you wouldn't have a way to pick a default initial vector.

@stevengj Currently we only support finding the larger singular values, so the loss of relative accuracy for the small singular values shouldn't be a concern. If we begin to support finding the smaller values, it makes sense to reintroduce the other operator alongside the fast one.
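
As a sketch of the size/eltype point, a hypothetical matrix-free map that subtypes AbstractMatrix purely to communicate its dimensions and element type, with * overloaded and no stored elements:

using LinearAlgebra

# Hypothetical matrix-free operator: applies λI plus a cyclic shift.
struct ShiftPlusScale{T} <: AbstractMatrix{T}
    λ::T
    n::Int
end
Base.size(S::ShiftPlusScale) = (S.n, S.n)
Base.:*(S::ShiftPlusScale, x::AbstractVector) = S.λ .* x .+ circshift(x, 1)

S = ShiftPlusScale(2.0, 1000)
v0 = randn(eltype(S), size(S, 2))   # a default starting vector needs only size/eltype
y = S * v0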

@stevengj (Member) commented Jun 5, 2018

> Currently we only support finding the larger singular values, so the loss of relative accuracy for the small singular values shouldn't be a concern.

@andreasnoack, the problem arises even for the largest singular values, if the largest values include some values much larger than the others. For example, suppose that you have a matrix where the largest singular value is 1 and the second-largest singular value is 1e-6. My understanding is that, with the A'A method, the second (1e-6) singular value will lose an additional 6 significant digits compared to the augmented method.
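
A rough numerical check of this scenario (illustrative only; dense eigvals stands in for the Lanczos process):

using LinearAlgebra

σ = [1.0, 1e-6]
U = Matrix(qr(randn(4, 2)).Q)        # random orthonormal columns
V = Matrix(qr(randn(2, 2)).Q)
A = U * Diagonal(σ) * V'

# Via A'A: the small value comes from an eigenvalue of order 1e-12 computed
# with absolute error of order eps()*‖A'A‖ ≈ 2e-16, i.e. up to roughly 1e-4
# relative error after the square root.
σ_ata = sqrt.(reverse(eigvals(Symmetric(A' * A))))

σ_svd = svdvals(A)                   # backward-stable reference

abs.(σ_ata .- σ) ./ σ                # second entry can be around 1e-4
abs.(σ_svd .- σ) ./ σ                # both entries near eps()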

@Jutho (Contributor) commented Jun 5, 2018

> I don't think being a subtype of AbstractMatrix is an issue. You'd have to communicate the size and element type of the problem somehow, and AbstractMatrix will do that for you. Alternatively, it could be communicated through the initial vector and *, but then you wouldn't have a way to pick a default initial vector.

But being a subtype of AbstractMatrix is not required for eigs, only for svds, because of the wrapper. LinearMap objects from my LinearMaps.jl package do communicate size, yet LinearMap is not a subtype of AbstractMatrix. If I interpret the Julia manual correctly, the contract for something being an AbstractArray is to have at least an efficient getindex. It is in LinearMaps.jl that the issue of svds not being supported was reported (JuliaLinearAlgebra/LinearMaps.jl#20).

@andreasnoack (Member, Author):

I do remember the old discussion about AbstractMatrix and fast getindex, and I'm not sure we reached the right conclusion. I think AbstractMatrix is the right supertype to use here, and it doesn't restrict what you can do, since you can overload * to do whatever you'd like without storing the elements. As an example of the issue with avoiding AbstractMatrix, we have that Adjoint <: AbstractMatrix.

@andreasnoack (Member, Author):

@stevengj Yes, small singular values lose their relative accuracy, but I don't think this is a big concern in practice. Maybe this matters in physics problems, but it wouldn't in data problems; there you'd be fine with getting the large values with high accuracy. I should mention that I'm not against supporting the non-squaring operator but, as we have seen, it will require some work to get right and I don't think it is a high priority.

@Jutho (Contributor) commented Jun 5, 2018

> I think AbstractMatrix is the right supertype to use here

This goes against everything I learned when I first started using Julia (v0.3) years ago, when I also assumed that AbstractMatrix was the de facto supertype for linear operators, but learned from people like @stevengj that there is duck typing and that AbstractArray is in essence the top of a type hierarchy for container-like structures. Why else is there no restriction A::AbstractMatrix in eigs, or anywhere in IterativeSolvers.jl?

But maybe I should open a new issue since I don't want to take the current issue off-topic.

@andreasnoack (Member, Author):

Sometimes ideas end up not being as good as they seemed. Having a fast getindex as a requirement for AbstractArray sounded right, but in practice we need AbstractMatrix for a lot of linear algebra operations, or at least it seems unreasonably complicated to avoid it. The fast-getindex requirement is also problematic in other cases, such as distributed arrays and compact matrix representations like the ones we use for the QR factorization. I think it is better to allow all of these to be AbstractArrays.

@stevengj (Member) commented Jun 5, 2018

I don't see the point in restricting ourselves to AbstractMatrix here. Whether or not you think that this should be the supertype used for all linear-operator-like objects, the restriction doesn't actually buy us anything for a function like svds — we aren't dispatching to anything else for a non-AbstractMatrix type.

@andreasnoack (Member, Author):

I've looked at the details of this again to refresh my memory, and I'm actually only using AbstractMatrix to promote the element type of the input array to ensure that BLAS is called if the input is e.g. an integer matrix. However, this promotion is not uncontroversial, as the user might want to avoid a copy of the input array. I'll prepare a PR with this change. (In the current state of affairs, I still don't see why it would be a problem for anybody to make their custom linear map a subtype of AbstractMatrix, but that is a discussion for another day.)
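
The promotion in question is roughly the following (a sketch; promote_to_float is a hypothetical name, not the PR's code):

# Widen integer element types so the matvecs inside the iteration hit BLAS
# rather than generic fallbacks; note that this copies the input array.
promote_to_float(A::AbstractMatrix{<:Integer}) = float(A)
promote_to_float(A::AbstractMatrix) = A

A = rand(1:10, 100, 100)
eltype(promote_to_float(A))   # Float64, at the cost of copying A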

@StefanKarpinski (Member):

> I still don't see why it would be a problem for anybody to make their custom linear map a subtype of AbstractMatrix, but that is a discussion for another day.

A matrix ought, at the minimum, to have reasonably efficient A[i,j] indexing, which, as I understand it, some linear operators might not support.

@andreasnoack (Member, Author):

> A matrix ought, at the minimum, to have reasonably efficient A[i,j] indexing

Then the QR Qs and DArrays cannot be matrices. AbstractMatrix is important for much of the *, \, adjoint, and transpose code, and it has nothing to do with indexing.

@StefanKarpinski (Member):

AbstractMatrix is a kind of AbstractArray, and the most basic premise of an array is that it supports indexing to get elements. It certainly does seem that the LinearAlgebra stdlib abuses the type, but I'm not sure how much harm it causes.

@andreasnoack (Member, Author):

> and the most basic premise of an array is that it supports indexing to get elements

The question is whether fast indexing is a requirement for the most basic premise of an array.

@StefanKarpinski (Member):

Fast is relative. Some kind of indexing seems like it is a requirement. IIRC, several of these types don't support indexing at all.

@andreasnoack (Member, Author):

I think you'd always be able to do what we do for the QR Qs, i.e. Q[i,j] = dot(e_i, Q*e_j).
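
That trick looks roughly like this (a sketch; getindex_via_mul is an illustrative helper, not the actual method):

using LinearAlgebra

# Index an operator that only supports multiplication: Q[i,j] = dot(e_i, Q*e_j).
# One matvec per element, so slow, but it satisfies the indexing contract.
function getindex_via_mul(Q, i::Integer, j::Integer)
    e_j = zeros(eltype(Q), size(Q, 2))
    e_j[j] = 1
    return (Q * e_j)[i]
end

F = qr(randn(4, 4))
getindex_via_mul(F.Q, 2, 3) ≈ Matrix(F.Q)[2, 3]   # true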

@StefanKarpinski (Member):

Sure, if that's possible then it seems fine to consider these 2-d arrays.

Labels: linear algebra

Successfully merging this pull request may close these issues:

svds returns non-orthogonal singular vectors when there are degenerate singular values