ENH: optimize.approx_fprime: avoid quadratic memory usage #20645
Conversation
Currently, the code generates all possible h vectors at once. When optimizing N dimensions, this uses a quadratic amount of memory, which is wasteful. Reduce this by computing those vectors one at a time.
I like the changes in here. If there's one nit I have, it's that there are several copies. I wonder if it's better to have `x1 = x0.copy()` and `x2 = x0.copy()` before the loop, and at the end of the loop have `x1[i] = x2[i] = x0[i]` to reset the value. For large values of `h.size`, this will prevent `h.size - 2` copies. Then do `h_vecs *= 0`.
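A minimal sketch of the pattern being suggested (the names x1 and x2 follow the comment above; the central-difference loop body is an illustrative assumption, not SciPy's actual code):

```python
import numpy as np

def central_diff_gradient(f, x0, h):
    # Copy once before the loop instead of once per dimension.
    x1 = x0.copy()
    x2 = x0.copy()
    grad = np.empty_like(x0)
    for i in range(x0.size):
        x1[i] += h                  # forward step in dimension i
        x2[i] -= h                  # backward step in dimension i
        grad[i] = (f(x1) - f(x2)) / (2 * h)
        x1[i] = x2[i] = x0[i]       # reset the perturbed coordinate
    return grad
```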
I wrote two versions of this idea. The first version avoids copying the vectors in the loop by setting up three vectors (x1, x2, and a complex version of x1) at the beginning, and repeatedly re-using them in each loop. This is branch njo-numdiff-sep-vars. I also tried a version which moves the creation of each step vector into the loop, computing them one at a time. I benchmarked the original approach and both changes for varying branches, methods, and numbers of dimensions. I ran this between 5 and 50,000 times, depending on the dimensions, and measured the average times. I re-ran this benchmark 20 times, to capture the variation between runs.

Plots
(plots of the timing results omitted)

Conclusion
Based on these plots, I find that there's a tradeoff here between algorithm complexity and speed. Which approach would you prefer?

Raw results
The benchmark scripts and a CSV of the raw timings are available here, if you want to do your own analysis.
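For reference, a minimal sketch of the kind of timing harness described above (the actual benchmark scripts are the ones linked; the objective function, repetition counts, and dimension sweep here are illustrative assumptions):

```python
import timeit
import numpy as np
from scipy.optimize import approx_fprime

def f(x):
    # Trivial objective, cheap to evaluate, so the overhead of
    # approx_fprime itself dominates the measurement.
    return np.sum(x ** 2)

for n in (10, 100, 1000, 10_000):
    x0 = np.ones(n)
    reps = max(5, 50_000 // n)      # more repetitions for small n
    t = timeit.timeit(lambda: approx_fprime(x0, f, 1e-8), number=reps)
    print(f"n={n}: {t / reps * 1e6:.1f} us per call")
```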
I don't understand this part of the feedback. I removed this variable. Can you explain?
From those graphs it looks like any of the three options is performant enough. This means the best solution is the one that is the easiest to read and maintain. For me, that's njo-numdiff-sep-vars. Sorry about the h_vecs comment; that's a bit of cruft from a draft that I didn't delete, so just ignore it.
This looks good to me. Thank you very much for the PR; is it your first? Congratulations. I lurk on Stack Overflow and notice that you answer a lot of questions. Thank you for doing that; it's helpful for the user community, and helpful for the project.
It's my first in SciPy, but I've made PRs for other open source projects before.
Glad to help! :)
Currently, the code generates all of the vectors used to perturb x0 at once. When optimizing a function in many dimensions, this uses a quadratic amount of memory, which is wasteful. Reduce this by computing those vectors one at a time.
This is my first change to SciPy; I appreciate any feedback.
Reference issue
There is no previously filed issue for this problem.
What does this implement/fix?
Here are two examples that I've seen on StackOverflow of users running into this problem:
https://stackoverflow.com/questions/77867437/scipy-minimize-returns-an-array-too-big
https://stackoverflow.com/questions/78399838/when-using-scipy-minimize-a-large-array-appears-which-was-not-originally-there
To summarize, both of these users are trying to minimize a function of ~90,000 variables, and SciPy then tries to allocate an array of N ** 2 * 8 bytes to compute the derivative. This seems strange, since the derivative can be computed in a linear amount of memory. There is also a solver, L-BFGS-B, which is capable of minimizing a function this large, so doing this in a linear amount of memory is not just of academic interest. (A sketch contrasting the two strategies follows below.)
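To make the difference concrete, here is a simplified sketch of the two strategies using forward differences (illustrative only; this is not SciPy's actual implementation):

```python
import numpy as np

def fprime_quadratic(f, x0, h):
    # Materialize all N step vectors at once as an N x N matrix,
    # i.e. N ** 2 * 8 bytes for float64.
    n = x0.size
    h_vecs = np.diag(np.full(n, h))
    f0 = f(x0)
    return np.array([(f(x0 + h_vecs[i]) - f0) / h for i in range(n)])

def fprime_linear(f, x0, h):
    # Perturb one coordinate at a time, reusing a single working
    # vector: O(N) memory instead of O(N ** 2).
    n = x0.size
    grad = np.empty(n)
    f0 = f(x0)
    x1 = x0.copy()
    for i in range(n):
        x1[i] += h
        grad[i] = (f(x1) - f0) / h
        x1[i] = x0[i]               # reset before the next dimension
    return grad
```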
Memory usage benchmarks
I have two goals for this change: reduce memory usage, and avoid slowing down the computation.
Here is a benchmark showing the difference in memory usage:
(Warning: memory usage will be wrong on OS X.)
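The original benchmark script is not reproduced here; a minimal sketch of this kind of measurement might look like the following. It assumes Linux, where ru_maxrss is reported in kilobytes (on OS X it is reported in bytes, which is why the warning above applies), and N = 10000 is an illustrative choice:

```python
import resource
import numpy as np
from scipy.optimize import approx_fprime

N = 10_000

def f(x):
    return np.sum(x ** 2)

approx_fprime(np.ones(N), f, 1e-8)

# Peak resident set size of this process, in KB on Linux.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kb / 1024:.1f} MiB")
```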
Result, running on this branch:
(output omitted)

Result, running on current main (4a537b7):
(output omitted)
(Interestingly, this is less memory than predicted by the formula N ** 2 * 8 bytes. I believe this is because Linux lazily allocates memory. Each row in the matrix is roughly twenty 4K pages long, but only one of those pages must be allocated, since there is only one non-zero value per row.)
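This lazy-allocation effect is easy to demonstrate on its own (a standalone, Linux-specific sketch, not part of the PR):

```python
import resource
import numpy as np

n = 10_000
m = np.zeros((n, n))         # nominally n * n * 8 bytes, about 760 MiB
np.fill_diagonal(m, 1e-8)    # writes one element, hence one page, per row

# On Linux, peak RSS stays far below the nominal 760 MiB, because
# np.zeros receives lazily-mapped zero pages and only pages that are
# actually written to get committed.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_kb / 1024:.1f} MiB")
```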
CPU benchmarks
I also attempted to benchmark this with the built-in benchmarks, but could not figure out how to use `--compare` (see airspeed-velocity/asv#1401). Benchmarking without `--compare`, it seems that on benchmarks where no derivative is supplied, this method is either faster, or only slower by ~1%.

Optimize benchmarks, my branch:
(output omitted)

Main branch:
(output omitted)