Use A'A instead of [0 A;A' 0] in svds. #21701
Conversation
test/linalg/arnoldi.jl (Outdated)
@@ -1,80 +1,85 @@
- # This file is a part of Julia. License is MIT: https://julialang.org/license
+ # This file is a part of Julia. License is MIT: http://julialang.org/license
wrong conflict resolution
Fixed
… The latter is only relevant when computing the small values and we don't support that. This also fixes #16608.
Where is most of the time difference spent? Doesn't this make any ill-conditioning worse?
The Lanczos vectors are roughly twice as long for the augmented (square) operator, so all operations related to managing the subspace are more expensive. It makes the conditioning worse, but that doesn't really matter for the relative accuracy of the largest singular values. It is only a concern if you try to compute the smallest values, and we don't support that.
Can you post profiles? That's a factor of 6, so something else is going on.
You are right that the work on the Lanczos vectors can't justify the difference. I just looked at the other output from the computations and it appears that the "augmented" version simply requires more iterations before it converges. I also checked the profile output and it doesn't look like anything is going wrong there, e.g. there is no dispatch to slow fallbacks. My theory is the following: the augmented model duplicates all the singular values with flipped sign. Right now, we run in …
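To make the duplication point concrete, here is a small self-contained Julia sketch (the matrix and sizes are made up purely for illustration): the augmented operator [0 A; A' 0] carries every singular value twice, as ±σ, plus |m − n| zeros, and works with vectors of length m + n, while A'A carries each value once, as σ², and works with vectors of length n.

```julia
using LinearAlgebra

A = randn(6, 4)
σ = svdvals(A)

# Augmented operator: eigenvalues are ±σᵢ plus |m - n| zeros, and its Lanczos
# vectors have length m + n, so chasing the k largest magnitudes effectively
# tracks 2k Ritz values.
Aug = [zeros(6, 6) A; A' zeros(4, 4)]
eig_aug = eigvals(Symmetric(Aug))

# Normal operator A'A: eigenvalues are σᵢ², each singular value appears once,
# and the Lanczos vectors only have length n.
eig_sq = eigvals(Symmetric(A' * A))

sort(abs.(eig_aug); rev = true)[1:2:8] ≈ σ   # each σᵢ shows up twice in the augmented spectrum
sqrt.(sort(eig_sq; rev = true)) ≈ σ          # the squared spectrum recovers σ directly
```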
Just compared that augmented version to the …
don't many other libraries default to the augmented system though?
SciPy uses …
@@ -800,6 +800,8 @@ condition number, or componentwise relative condition number.
"""
condskeel(A::AbstractMatrix, x::AbstractVector, p::Real=Inf) = norm(abs.(inv(A))*(abs.(A)*abs.(x)), p)
+
+issymmetric(A::AbstractMatrix{<:Real}) = ishermitian(A)
this seems backwards as a fallback
I think this is the right way to define it, since it is more likely that you'd define something Hermitian (or self-adjoint). E.g. AtA_or_AAt and SVDAugmented are self-adjoint by construction, so we define ishermitian(A::AtA_or_AAt) = true. If we remove the (correct) issymmetric fallback then it would be necessary to define issymmetric for each of such operators.
If the element type is real, I think defining issymmetric would be more common for user-defined array types.
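As a minimal sketch of the dispatch argument (the NormalOp type below is hypothetical, a stand-in for operators like AtA_or_AAt that are self-adjoint by construction): the operator only declares ishermitian, and the issymmetric(A::AbstractMatrix{<:Real}) = ishermitian(A) fallback from the diff above then answers issymmetric correctly for real element types with no extra method.

```julia
using LinearAlgebra

# Hypothetical operator that is self-adjoint by construction, e.g. one that
# conceptually applies x -> A'(A*x); getindex is only provided for completeness.
struct NormalOp{T,S<:AbstractMatrix{T}} <: AbstractMatrix{T}
    A::S
end
Base.size(M::NormalOp) = (size(M.A, 2), size(M.A, 2))
Base.getindex(M::NormalOp, i::Int, j::Int) = dot(view(M.A, :, i), view(M.A, :, j))  # (A'A)[i,j]

# Self-adjoint by construction, so this is the only trait the type declares:
LinearAlgebra.ishermitian(::NormalOp) = true

M = NormalOp(randn(5, 3))
ishermitian(M)   # true, from the method above
issymmetric(M)   # true as well, via the real-eltype fallback; no extra definition needed
```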
There isn't any great reason we haven't implemented smallest value support, and wouldn't this need to be reverted if we do?
It is harder to find the smallest values. If we try to get that working, we should use the augmented version for small values and …
In general, we should tend towards defaults that work and are highly reliable at the cost of some performance, and provide options for those who want to use faster, less reliable methods that happen to work for their cases. That seems to only apply partially here, but I mention it because there seems to be a tendency in the linalg algorithm design community to favor faster, cleverer methods that have more brittle edge-cases over the dumber, slower, more reliable methods. I have found that typical day-to-day users of linalg routines would generally much rather have the reliable versions even if they're up to 2x slower, as long as they work without further fuss.
Apparently, I haven't been able to get my point across so I'll try again: There is no downside to using …
That was the impression I got, but I thought I'd put it out there just as a general reminder 😺
I agree with the general policy.
I would think that with the augmented version, small singular values are nearly impossible (without inverting the matrix), as they are in the middle of the spectrum, i.e. close to zero whereas the spectrum is symmetric around zero. With the squaring, the spectrum is strictly positive, and small and large singular values appear at both ends of the spectrum, and should both be reachable by a direct (i.e. no inversion) Krylov method, just using …
I would think that the larger spectral gap caused by the "squaring" will be disadvantageous for convergence, i.e. the identity matrix is the one which converges most quickly to an invariant subspace. But I am no expert on Krylov methods.
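A hedged sketch of that reachability argument, using the Arpack.jl package (in current Julia eigs lives there rather than in Base; the :LM/:SR selectors are my guess at the truncated "just using …" above, with :SR meaning the smallest end of a real symmetric spectrum):

```julia
using LinearAlgebra, Arpack

A   = randn(100, 60)
AtA = Symmetric(A' * A)          # spectrum is σ.^2 ≥ 0: small and large σ sit at opposite ends

# Largest singular values: extreme eigenvalues of A'A, easy for a direct Krylov method.
λmax, _ = eigs(AtA; nev = 3, which = :LM)
sqrt.(λmax)

# Smallest singular values: still at an end of the A'A spectrum, so still reachable directly.
λmin, _ = eigs(AtA; nev = 3, which = :SR)
sqrt.(λmin)

# For [0 A; A' 0] the small singular values are interior eigenvalues (near 0 of a
# spectrum symmetric about 0), which a direct method without shift-invert struggles with.
```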
A different question (I can open a new issue for this) that came up from a user of my …
The problem, as I understand it (see e.g. Trefethen & Bau, chapter 31), is that for any singular value ≪ ‖A‖ = max σ, the … So I'm concerned that the new method will be much less accurate than the old one for matrices where the first singular value is much larger than the subsequent ones, or in general when the … @andreasnoack, have you checked a case like that?
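A small experiment in the spirit of this concern (the matrix is made up for illustration): with σ₁ = 1 and σ₂ = 10⁻⁹, σ₂² = 10⁻¹⁸ falls below eps(σ₁²), so any route through the eigenvalues of A'A has no correct digits left for σ₂, while the standard SVD recovers it with absolute error on the order of eps·‖A‖.

```julia
using LinearAlgebra

σ = [1.0, 1e-9]                     # σ₂ ≪ ‖A‖ = σ₁
U = Matrix(qr(randn(2, 2)).Q)
V = Matrix(qr(randn(2, 2)).Q)
A = U * Diagonal(σ) * V'

svdvals(A)                          # ≈ [1.0, 1.0e-9]: σ₂ good to ~7 digits (abs. error ~ eps⋅‖A‖)

λ = eigvals(Symmetric(A' * A))      # true values are [1e-18, 1.0], but 1e-18 < eps(1.0),
                                    # so the small eigenvalue is rounding noise of size ~1e-16
sqrt.(abs.(λ))                      # the "σ₂" recovered this way can be off by orders of magnitude
```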
I don't think being a subtype of …
@stevengj Currently we only support finding the larger singular values so the loss of relative accuracy for small singular values shouldn't be a concern. If we begin to support finding the smaller values then it makes sense to reintroduce the other operator alongside the fast operator.
@andreasnoack, the problem arises even for the largest singular values, if the largest values include some values much larger than the others. For example, suppose that you have a matrix where the largest singular value is …
But being a subtype of …
I do remember the old discussion about …
@stevengj Yes, small singular values lose their relative accuracy but I don't think this is a big concern in practice. Maybe this matters in physics problems but it wouldn't in data problems. There you'd be fine with getting the large values with high accuracy. I should mention that I'm not against supporting the non-squaring operator but, as we have seen, it will require some work to get right and I don't think it is a high priority.
This goes against everything I learned when I first started using Julia 0.3 years ago, when I also assumed that … But maybe I should open a new issue since I don't want to take the current issue off-topic.
Sometimes ideas end up not being as good as they seemed. Having a fast …
I don't see the point in restricting ourselves to …
I've looked at the details of this again to refresh my memory and I'm actually only using …
A matrix ought to at the minimum have reasonably efficient …
Then QR …
The question is if fast indexing is a requirement for the most basic premise of an array.
Fast is relative. Some kind of indexing seems like it is a requirement. IIRC, several of these types don't support indexing at all.
I think you'd always be able to do what we do for the QR …
Sure, if that's possible then it seems fine to consider these 2-d arrays.
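A rough sketch of what "do what we do for the QR …" could mean for these operator types (the wrapper below is hypothetical, not how the actual types are implemented): anything that can do a matrix-vector product can expose getindex by applying itself to a basis vector, so indexing is always available, just slow.

```julia
# Hypothetical wrapper around anything that can compute y = M*x.
struct MatVecOp{T} <: AbstractMatrix{T}
    apply::Function   # x -> M*x
    m::Int
    n::Int
end
Base.size(M::MatVecOp) = (M.m, M.n)
function Base.getindex(M::MatVecOp{T}, i::Int, j::Int) where {T}
    ej = zeros(T, M.n)
    ej[j] = one(T)                   # j-th basis vector
    return (M.apply(ej))[i]          # one full mat-vec per entry: correct, but slow
end

B = randn(4, 3)
M = MatVecOp{Float64}(x -> B * x, 4, 3)
M[2, 3] == B[2, 3]                   # indexing falls out of the mat-vec product
```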
The former can be much faster. The latter is only relevant when computing the small values and we don't support that. I've kept the SVDOperator but renamed it SVDAugmented since that name was used in PRIMME. This also fixes JuliaLang/LinearAlgebra.jl#335 and this PR supersedes #20694.
Update: To quantify this, the time to process IMDB on my laptop with this change is … where it used to be …
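The original timing output did not survive here, but the kind of measurement described is roughly the following (sizes and density are placeholders rather than the IMDB data, and svds is assumed to be the Arpack.jl function, which is where it lives in current Julia rather than in Base):

```julia
using Arpack, SparseArrays

A = sprandn(100_000, 20_000, 1e-4)   # stand-in for a large sparse term-document matrix
@time F, = svds(A; nsv = 100)        # first element of the returned tuple is an SVD-like object
F.S[1:5]                             # leading singular values
```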