Use A'A instead of [0 A;A' 0] in svds. #21701

Merged · 1 commit · May 7, 2017
Conversation

@andreasnoack (Member) commented May 4, 2017

The former can be much faster. The latter is only relevant when computing the small values, and we don't support that. I've kept the SVDOperator but renamed it SVDAugmented since that name was used in PRIMME.

This also fixes JuliaLang/LinearAlgebra.jl#335 and this PR supersedes #20694.

Update: To quantify this, the time to process IMDB on my laptop with this PR is

julia> @time svds(A, nsv = 10);
  7.760984 seconds (833 allocations: 394.811 MiB, 2.73% gc time)

where it used to be

julia> @time svds(A, nsv = 10);
 45.004135 seconds (2.31 k allocations: 891.884 MB, 1.31% gc time)

@@ -1,80 +1,85 @@
-# This file is a part of Julia. License is MIT: https://julialang.org/license
+# This file is a part of Julia. License is MIT: http://julialang.org/license
Contributor: wrong conflict resolution

Member (Author): Fixed

Commit message: "… The latter is only relevant when computing the small values and we don't support that. This also fixes #16608."
@tkelman (Contributor) commented May 4, 2017

Where is most of the time difference spent? Doesn't this make any ill-conditioning worse?

@andreasnoack (Member, Author) commented May 4, 2017

The Lanczos vectors are twice as long in the augmented case, so all operations related to managing the subspace are more expensive. It makes the conditioning worse, but that doesn't really matter for the relative accuracy of the largest singular values. It is only a concern if you try to compute the smallest values, and we don't support that.
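
For illustration, a minimal sketch of the two matrix-vector products being compared (the function names here are hypothetical, not the PR's actual operator types):

using LinearAlgebra

# Augmented operator B = [0 A; A' 0]: one application costs a product with
# both A and A', and the Lanczos vectors have length m + n.
function augmented_mul(A, x)
    m, n = size(A)
    return [A * x[m+1:end]; A' * x[1:m]]
end

# Normal-equations operator A'A: the same two products with A and A',
# but the Lanczos vectors only have length n.
ata_mul(A, x) = A' * (A * x)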

@tkelman (Contributor) commented May 4, 2017

Can you post profiles? That's a factor of 6, so something else is going on.

@andreasnoack (Member, Author):

You are right that the work on the Lanczos vectors can't justify the difference. I just looked at the other output from the computations and it appears that the "augmented" version simply requires more iterations before it converges. I also checked the profile output and it doesn't look like anything is going wrong there. E.g. there is no dispatch to slow fallbacks.

My theory is the following: the augmented model duplicates all the singular values with flipped sign. Right now we run in :LM mode, so we need twice the number of converged values compared to the AtA method. #20694 uses :LR mode, which is more efficient since it only requires the largest values to converge. However, even though the implicit restart filters the large negative components out of the subspace, their magnitude will probably cause them to reenter again and again. Finally, I'm wondering if the larger spectral gap caused by the "squaring" gives a faster convergence rate. Some of the readers of this thread will probably be able to clarify this.
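
A small dense example makes the duplication visible (illustrative only; dense eigvals stands in for the ARPACK iteration):

using LinearAlgebra

A = randn(5, 3)
σ = svdvals(A)                       # 3 singular values, descending

# Augmented operator: spectrum is {±σᵢ} plus |m - n| zeros, so in :LM mode
# twice as many values must converge to recover the top singular values.
B = Symmetric([zeros(5, 5) A; A' zeros(3, 3)])
eigvals(B)                           # ≈ [-σ₁, -σ₂, -σ₃, 0, 0, σ₃, σ₂, σ₁]

# A'A: spectrum is {σᵢ²}, nonnegative and with no sign-flipped duplicates.
eigvals(Symmetric(A' * A))           # ≈ [σ₃², σ₂², σ₁²]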

@kshyatt added the "linear algebra" label May 5, 2017
@andreasnoack (Member, Author):

Just compared the augmented version to the A'A version in https://github.com/andreasnoack/Primme.jl and I see a similar time difference, so I think AtA is just a faster method.

@tkelman (Contributor) commented May 5, 2017

Don't many other libraries default to the augmented system, though?

@andreasnoack (Member, Author):

Scipy uses A'A, Octave uses the augmented version, and Matlab doesn't use ARPACK anymore. The current version was chosen in #9425, but we never added support for the small values, and you don't win any accuracy for the large values by using the augmented system.

@@ -800,6 +800,8 @@ condition number, or componentwise relative condition number.
 """
 condskeel(A::AbstractMatrix, x::AbstractVector, p::Real=Inf) = norm(abs.(inv(A))*(abs.(A)*abs.(x)), p)
 
+issymmetric(A::AbstractMatrix{<:Real}) = ishermitian(A)
Contributor: this seems backwards as a fallback

Member (Author):

I think this is the right way to define it, since it is more likely that you'd define something Hermitian (or self-adjoint). E.g. AtA_or_AAt and SVDAugmented are self-adjoint by construction, so we define ishermitian(A::AtA_or_AAt) = true. If we removed the (correct) issymmetric fallback, it would be necessary to define issymmetric for each such operator.
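
A sketch of that pattern, with a simplified operator type standing in for AtA_or_AAt (illustrative, not the actual definition in the PR):

using LinearAlgebra
import LinearAlgebra: ishermitian

# A self-adjoint-by-construction operator wrapping A'A.
struct AtAOperator{T,M<:AbstractMatrix{T}} <: AbstractMatrix{T}
    A::M
end
Base.size(op::AtAOperator) = (size(op.A, 2), size(op.A, 2))
Base.:*(op::AtAOperator, x::AbstractVector) = op.A' * (op.A * x)

# Only ishermitian needs a method; for real element types the fallback
# issymmetric(A::AbstractMatrix{<:Real}) = ishermitian(A) then gives
# issymmetric for free.
ishermitian(::AtAOperator) = true

issymmetric(AtAOperator(randn(4, 3)))  # true, via the fallback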

Contributor:
If the element type is real, I think defining issymmetric would be more common for user-defined array types.

@tkelman (Contributor) commented May 7, 2017

There isn't any great reason we haven't implemented smallest value support, and wouldn't this need to be reverted if we do?

@andreasnoack (Member, Author):

It is harder to find the smallest values. If we try to get that working, we should use the augmented version for the small values and A'A for the large values. I kept the SVDAugmented type around for this reason.

@StefanKarpinski (Member):

In general, we should tend towards defaults that work and are highly reliable at the cost of some performance, and provide options for those who want to use faster, less reliable methods that happen to work for their cases. That seems to only apply partially here, but I mention it because there seems to be a tendency in the linalg algorithm design community to favor faster, cleverer methods that have more brittle edge-cases over the dumber, slower, more reliable methods. I have found that typical day-to-day users of linalg routines would generally much rather have the reliable versions even if they're up to 2x slower, as long as they work without further fuss.

@andreasnoack (Member, Author):

Apparently I haven't been able to get my point across, so I'll try again: there is no downside to using A'A for the large values, which is the only mode we support. The A'A method also seems to handle higher multiplicities better, which was the problem in JuliaLang/LinearAlgebra.jl#335.

@StefanKarpinski (Member) commented May 8, 2017

That was the impression I got, but I thought I'd put it out there just as a general reminder 😺

@andreasnoack (Member, Author):

I agree with the general policy.

@Jutho (Contributor) commented May 8, 2018

I would think that with the augmented version, small singular values are nearly impossible to compute (without inverting the matrix), as they are in the middle of the spectrum, i.e. close to zero, whereas the spectrum is symmetric around zero.

With the squaring, the spectrum is strictly positive, and small and large singular values appear at the two ends of the spectrum, so both should be reachable by a direct (i.e. no inversion) Krylov method, just using :LR and :SR. This is, however, modulo the fact that you lose precision by taking the square root of small values.

> Finally, I'm wondering if the larger spectral gap caused by the "squaring" gives a faster convergence rate. Some of the readers of this thread will probably be able to clarify this.

I would think that the larger spectral gap caused by the "squaring" will be disadvantageous for convergence; i.e., the identity matrix is the one that converges most quickly to an invariant subspace. But I am no expert on Krylov methods.

@Jutho (Contributor) commented May 8, 2018

A different question (I can open a new issue for this) that came up from a user of my LinearMaps package: is it really necessary to require A<:AbstractArray for svds? This is not required for eigs itself, as long as the object behaves correctly (i.e. supports multiplication with a vector, etc.). Put differently, it should be possible to compute singular values of LinearMap objects which have a function defined for both A*x and A'*x, no?

@stevengj (Member) commented May 8, 2018

The problem, as I understand it (see e.g. Trefethen & Bau, chapter 31), is that for any singular value ≪ ‖A‖ = max σ, the A'*A method loses a factor of cond(A) in accuracy because of the squared condition number. In particular, the relative error is O(ε‖A‖²/σ²) instead of O(ε‖A‖/σ) for the augmented method.

So I'm concerned that the new method will be much less accurate than the old one for matrices where the first singular value is much larger than the subsequent ones, or in general when the nsv singular values computed by svds include values ≪ ‖A‖.

@andreasnoack, have you checked a case like that?

@andreasnoack (Member, Author):

> Is it really necessary to require A<:AbstractArray for svds?

I don't think being a subtype of AbstractMatrix is an issue. You'd have to communicate the size and element type of the problem somehow, and AbstractMatrix will do that for you. Alternatively, it could be communicated through the initial vector and *, but then you wouldn't have a way to pick a default initial vector.

@stevengj Currently we only support finding the larger singular values, so the loss of relative accuracy for the small singular values shouldn't be a concern. If we begin to support finding the smaller values, it makes sense to reintroduce the other operator alongside the fast one.
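
As a sketch of the size/eltype point, a hypothetical matrix-free map that subtypes AbstractMatrix purely to communicate its dimensions and element type, with * overloaded and no stored elements:

using LinearAlgebra

# Hypothetical matrix-free operator: applies λI plus a cyclic shift.
struct ShiftPlusScale{T} <: AbstractMatrix{T}
    λ::T
    n::Int
end
Base.size(S::ShiftPlusScale) = (S.n, S.n)
Base.:*(S::ShiftPlusScale, x::AbstractVector) = S.λ .* x .+ circshift(x, 1)

S = ShiftPlusScale(2.0, 1000)
v0 = randn(eltype(S), size(S, 2))   # a default starting vector needs only size/eltype
y = S * v0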

@stevengj (Member) commented Jun 5, 2018

> Currently we only support finding the larger singular values, so the loss of relative accuracy for the small singular values shouldn't be a concern.

@andreasnoack, the problem arises even for the largest singular values, if the largest values include some values much larger than the others. For example, suppose that you have a matrix where the largest singular value is 1 and the second-largest singular value is 1e-6. My understanding is that, with the A'A method, the second (1e-6) singular value will lose an additional 6 significant digits compared to the augmented method.
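
A rough numerical check of this scenario (illustrative only; dense eigvals stands in for the Lanczos process):

using LinearAlgebra

σ = [1.0, 1e-6]
U = Matrix(qr(randn(4, 2)).Q)        # random orthonormal columns
V = Matrix(qr(randn(2, 2)).Q)
A = U * Diagonal(σ) * V'

# Via A'A: the small value comes from an eigenvalue of order 1e-12 computed
# with absolute error of order eps()*‖A'A‖ ≈ 2e-16, i.e. up to roughly 1e-4
# relative error after the square root.
σ_ata = sqrt.(reverse(eigvals(Symmetric(A' * A))))

σ_svd = svdvals(A)                   # backward-stable reference

abs.(σ_ata .- σ) ./ σ                # second entry can be around 1e-4
abs.(σ_svd .- σ) ./ σ                # both entries near eps()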

@Jutho (Contributor) commented Jun 5, 2018

> I don't think being a subtype of AbstractMatrix is an issue. You'd have to communicate the size and element type of the problem somehow, and AbstractMatrix will do that for you. Alternatively, it could be communicated through the initial vector and *, but then you wouldn't have a way to pick a default initial vector.

But being a subtype of AbstractMatrix is not required for eigs, only for svds, because of the wrapper. LinearMap objects from my LinearMaps.jl package do communicate size, yet LinearMap is not a subtype of AbstractMatrix. If I interpret the Julia manual correctly, the contract for something being an AbstractArray is to have at least an efficient getindex. It is in LinearMaps.jl that the issue of svds not being supported was reported (JuliaLinearAlgebra/LinearMaps.jl#20).

@andreasnoack (Member, Author):

I do remember the old discussion about AbstractMatrix and fast getindex, and I'm not sure we reached the right conclusion. I think AbstractMatrix is the right supertype to use here, and it doesn't restrict what you can do, since you can overload * to do whatever you'd like without storing the elements. As an example of the issue with avoiding AbstractMatrix, we have that Adjoint <: AbstractMatrix.

@andreasnoack (Member, Author):

@stevengj Yes, small singular values lose their relative accuracy, but I don't think this is a big concern in practice. Maybe this matters in physics problems, but it wouldn't in data problems; there you'd be fine with getting the large values with high accuracy. I should mention that I'm not against supporting the non-squaring operator but, as we have seen, it will require some work to get right and I don't think it is a high priority.

@Jutho (Contributor) commented Jun 5, 2018

> I think AbstractMatrix is the right supertype to use here

This goes against everything I learned when I first started using Julia (v0.3) years ago, when I also assumed that AbstractMatrix was the de facto supertype for linear operators, but learned from people like @stevengj that there is duck typing and that AbstractArray is in essence the top of a type hierarchy for container-like structures. Why else is there no restriction A::AbstractMatrix in eigs, or anywhere in IterativeSolvers.jl?

But maybe I should open a new issue since I don't want to take the current issue off-topic.

@andreasnoack (Member, Author):

Sometimes ideas end up not being as good as they seemed. Having a fast getindex as a requirement for AbstractArray sounded right, but in practice we need AbstractMatrix for a lot of linear algebra operations, or at least it seems unreasonably complicated to avoid it. The fast-getindex requirement is also problematic in other cases, such as distributed arrays and compact matrix representations like the ones we use for the QR factorization. I think it is better to allow all of these to be AbstractArrays.

@stevengj (Member) commented Jun 5, 2018

I don't see the point in restricting ourselves to AbstractMatrix here. Whether or not you think that this should be the supertype used for all linear-operator-like objects, the restriction doesn't actually buy us anything for a function like svds — we aren't dispatching to anything else for a non-AbstractMatrix type.

@andreasnoack (Member, Author):

I've looked at the details of this again to refresh my memory, and I'm actually only using AbstractMatrix to promote the element type of the input array to ensure that BLAS is called if the input is e.g. an integer matrix. However, this promotion is not uncontroversial, as the user might want to avoid a copy of the input array. I'll prepare a PR with this change. (In the current state of affairs, I still don't see why it would be a problem for anybody to make their custom linear map a subtype of AbstractMatrix, but that is a discussion for another day.)
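
The promotion in question is roughly the following (a sketch; promote_to_float is a hypothetical name, not the PR's code):

# Widen integer element types so the matvecs inside the iteration hit BLAS
# rather than generic fallbacks; note that this copies the input array.
promote_to_float(A::AbstractMatrix{<:Integer}) = float(A)
promote_to_float(A::AbstractMatrix) = A

A = rand(1:10, 100, 100)
eltype(promote_to_float(A))   # Float64, at the cost of copying A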

@StefanKarpinski (Member):

> I still don't see why it would be a problem for anybody to make their custom linear map a subtype of AbstractMatrix, but that is a discussion for another day.

A matrix ought, at the minimum, to have reasonably efficient A[i,j] indexing, which, as I understand it, some linear operators might not support.

@andreasnoack (Member, Author):

> A matrix ought, at the minimum, to have reasonably efficient A[i,j] indexing

Then the QR Qs and DArrays cannot be matrices. AbstractMatrix is important for much of the *, \, adjoint, and transpose code, and it has nothing to do with indexing.

@StefanKarpinski (Member):

AbstractMatrix is a kind of AbstractArray, and the most basic premise of an array is that it supports indexing to get elements. It certainly does seem that the LinearAlgebra stdlib abuses the type, but I'm not sure how much harm it causes.

@andreasnoack (Member, Author):

> and the most basic premise of an array is that it supports indexing to get elements

The question is whether fast indexing is a requirement for the most basic premise of an array.

@StefanKarpinski (Member):

Fast is relative. Some kind of indexing seems like it is a requirement. IIRC, several of these types don't support indexing at all.

@andreasnoack (Member, Author):

I think you'd always be able to do what we do for the QR Qs, i.e. Q[i,j] = dot(e_i, Q*e_j).
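
That trick looks roughly like this (a sketch; getindex_via_mul is an illustrative helper, not the actual method):

using LinearAlgebra

# Index an operator that only supports multiplication: Q[i,j] = dot(e_i, Q*e_j).
# One matvec per element, so slow, but it satisfies the indexing contract.
function getindex_via_mul(Q, i::Integer, j::Integer)
    e_j = zeros(eltype(Q), size(Q, 2))
    e_j[j] = 1
    return (Q * e_j)[i]
end

F = qr(randn(4, 4))
getindex_via_mul(F.Q, 2, 3) ≈ Matrix(F.Q)[2, 3]   # true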

@StefanKarpinski (Member):

Sure, if that's possible then it seems fine to consider these 2-d arrays.

Labels: linear algebra

Successfully merging this pull request may close these issues:

svds returns non-orthogonal singular vectors when there are degenerate singular values