Regarding an additional incremental SVD feature on top of the available svd_compressed method #11527
@dask/gpu
I don't necessarily think this is a GPU issue. It sounds like there is a problem with some intermediate step of the SVD taking up more memory than is available. GPUs typically have less memory than the host, so you're likely to run into this limit more quickly. It sounds like what we want here is a more memory-efficient implementation of SVD generally.
I think you are right, @jacobtomlinson; this is not a GPU problem. There must be some issue with Dask's svd_compressed method, since it cannot compute the SVD of a matrix beyond a certain size on a single GPU or even on GPU clusters. I think developing this is important for testing distributed, streaming SVD computation on larger datasets that are available in storage but cannot be fully loaded onto the GPU. Kindly guide us on how this can be resolved; once it is solved, Dask will have a very sharp edge in AI model building as it gains popularity.
The implementation is here: dask/dask/array/linalg.py, lines 748 to 834 in 966bdb5. Do you have any interest in making improvements?
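For reference, that implementation is exposed as dask.array.linalg.svd_compressed. A minimal CPU-backed usage sketch might look like the following (the array shape, chunking, and target rank k are placeholders, not recommendations):

```python
import dask.array as da

# A large random matrix standing in for the real data, chunked along the rows.
x = da.random.random((1_000_000, 2_000), chunks=(10_000, 2_000))

# Randomized, truncated SVD of rank k; returns lazy Dask arrays.
u, s, v = da.linalg.svd_compressed(x, k=100)

# Trigger the actual computation.
u, s, v = da.compute(u, s, v)
```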
@jacobtomlinson Thank you for sharing this. I have checked this file before, but it lacks a feature. Suppose we have a matrix of shape A×B that is already very large, say it takes a month to compute its SVD, and we have already computed that SVD. Later we receive a new matrix of shape (A+C)×(B+D) that contains the same data as the original A×B matrix plus new rows and columns. Since we already have the SVD of the A×B matrix, I want to reuse that result to compute the SVD of the new (A+C)×(B+D) data instead of recomputing it over the entire dataset. That is why this is called incremental SVD. Dask seems to have this already implemented internally, but the developers have not exposed an API for it. Kindly assist us in bringing in such a feature.
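For what it's worth, the row-append update usually meant by "incremental SVD" can be sketched in plain NumPy as below. This is not an existing Dask API; the function name and signature are hypothetical, and a real contribution would need a blocked/lazy version that fits Dask's task graph model:

```python
# A minimal sketch of a row-append incremental SVD update (Brand-style).
# Hypothetical helper, for illustration only -- not part of Dask.
import numpy as np

def incremental_svd_append_rows(U, s, Vt, B, k=None):
    """Given a thin SVD A ~= U @ diag(s) @ Vt and new rows B,
    return a (optionally truncated) thin SVD of the stacked matrix [A; B]."""
    V = Vt.T                              # (n, k0)
    k0 = s.shape[0]
    c = B.shape[0]

    # Coefficients of the new rows in the existing right singular basis.
    BV = B @ V                            # (c, k0)
    # Component of the new rows orthogonal to span(V).
    resid = B.T - V @ BV.T                # (n, c)
    Qb, Rb = np.linalg.qr(resid)          # Qb: (n, c), Rb: (c, c)

    # Small core matrix K of shape (k0 + c, k0 + c).
    K = np.zeros((k0 + c, k0 + c))
    K[:k0, :k0] = np.diag(s)
    K[k0:, :k0] = BV
    K[k0:, k0:] = Rb.T

    Uk, sk, Vkt = np.linalg.svd(K, full_matrices=False)

    # Rotate the old factors by the small SVD and optionally truncate to rank k.
    U_new = np.block([[U, np.zeros((U.shape[0], c))],
                      [np.zeros((c, k0)), np.eye(c)]]) @ Uk
    V_new = np.hstack([V, Qb]) @ Vkt.T
    if k is not None:
        U_new, sk, V_new = U_new[:, :k], sk[:k], V_new[:, :k]
    return U_new, sk, V_new.T
```

The key point is that only the small (k0+c)×(k0+c) core matrix gets a dense SVD, so the update cost is independent of how many rows have already been processed.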
Hello, I would like to try contributing a fix for this issue! 🤠
I tried looking around the relevant parts of the codebase and could only find the svd and svd_compressed functions. I wasn't able to find any implementation that could directly be exposed through the API, so I think this will still require some work to implement the incremental SVD, right? (Let me know in case I missed something.) In that case, incremental support should probably be added for both the compressed and the regular SVD, right?
Hi @hendrikmakait and other team members,
I am working on implementing deep learning models using SVD, specifically the svd_compressed method available in Dask (dask.array.linalg.svd_compressed), together with CuPy to compute the SVD of large matrices on GPUs. I am facing a problem while implementing this. When computing the SVD of matrices that are too large to be loaded fully into GPU memory at once, we need something called incremental SVD in addition to svd_compressed. I believe this is possible and not too hard to implement, given that svd_compressed already runs on the GPU. What I actually need is distributed, streaming SVD computation on a large dataset that is available in storage but cannot be fully loaded onto the GPU.
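To make the setup concrete, this is roughly how svd_compressed can be driven with CuPy-backed chunks (the shapes, chunk sizes, and rank k below are placeholders; in practice the array would be loaded lazily from storage rather than generated randomly):

```python
import cupy
import dask.array as da

# CuPy-backed random Dask array standing in for the real data;
# each chunk lives on the GPU, but the full array never does.
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.standard_normal((2_000_000, 10_000), chunks=(20_000, 10_000))

# Randomized, truncated SVD of rank k, computed chunk by chunk on the GPU.
u, s, v = da.linalg.svd_compressed(x, k=200)
u, s, v = da.compute(u, s, v)
```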