torch.optim.lbfgs - added box constraint and line search methods (backtracking, goldstein, weak_wolfe); torch.tensor.prod - resolved the problem with zero element #938
Conversation
    return grad_output.new(self.input_size).fill_(0)
else:
    grad_input = grad_output.new(self.input_size).fill_(0)
    indexing_tuple = tuple(zero_loc[0].numpy())
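For context, the excerpt above special-cases zero elements in the Prod backward pass. Below is a minimal standalone sketch of that zero-aware gradient logic, assuming a full reduction; the helper name and the use of the modern tensor API are illustrative, not the PR's exact code.

```python
import torch

def prod_backward(input, grad_output):
    # Gradient of y = input.prod() w.r.t. element x_j is prod_{i != j} x_i.
    # The naive formula y / x_j divides by zero when x_j == 0, hence the special cases.
    zero_loc = (input == 0).nonzero()
    if zero_loc.numel() == 0:
        # No zeros: the product-divided-by-element shortcut is safe.
        return grad_output * input.prod() / input
    elif zero_loc.size(0) > 1:
        # Two or more zeros: every cofactor product contains a zero, so the gradient vanishes.
        return torch.zeros_like(input)
    else:
        # Exactly one zero: only that position gets a nonzero gradient,
        # equal to the product of all the other elements.
        grad_input = torch.zeros_like(input)
        idx = tuple(zero_loc[0].tolist())
        mask = torch.ones_like(input, dtype=torch.bool)
        mask[idx] = False
        grad_input[idx] = grad_output * input[mask].prod()
        return grad_input
```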
I took your prod fixes and refactored them slightly so that they reuse some intermediate results. Can you please remove this change from this PR? I'll try to get someone who knows line search methods to review your LBFGS changes soon. Thanks!
@apaszke Has anyone looked at the line search part of this MR yet?
@pytorchbot retest this please
Is there anything one can do to push this along?
@hassec we are blocked on finding competent reviewers for this stuff, who know the relevant literature and can verify the implemented code
I'll review this in the coming days.
What is the state of this PR?
Wondering what the status of this pull request is. Would be great to have constraint options.
@matthieuheitz @ianwilliamson would either of you be willing to review the PR for correctness against the papers? That's what it's blocked on at the moment.
Is anyone looking at this?
No, not at the moment. We need someone to review the PR for correctness (math), and it also needs some rebasing.
@vincentqb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
    assert offset == self._numel()
    return deriv

def _max_alpha(self, d):
Reference for bound-constrained optimization: equation 16.71 in Numerical Optimization, 2nd Edition, by Nocedal and Wright.
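For readers without the book at hand, the role of a `_max_alpha`-style helper is to find how far one can move along the search direction before any coordinate leaves its box. A minimal sketch under that assumption; the names `x`, `d`, `l_bnd`, `u_bnd` are illustrative, not the PR's internals.

```python
import torch

def max_feasible_step(x, d, l_bnd=None, u_bnd=None):
    """Largest alpha such that x + alpha * d stays inside [l_bnd, u_bnd] in every coordinate."""
    max_alpha = torch.tensor(float("inf"))
    if u_bnd is not None:
        # Coordinates moving up (d > 0) are limited by the upper bound.
        hits_upper = ((u_bnd - x) / d)[d > 0]
        if hits_upper.numel() > 0:
            max_alpha = torch.min(max_alpha, hits_upper.min())
    if l_bnd is not None:
        # Coordinates moving down (d < 0) are limited by the lower bound.
        hits_lower = ((l_bnd - x) / d)[d < 0]
        if hits_lower.numel() > 0:
            max_alpha = torch.min(max_alpha, hits_lower.min())
    return max_alpha
```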
original_param_data_list = self._copy_param()
phi_0 = closure().data[0]
phi_0_prime = self._directional_derivative(d)
alpha_k = 1.0
Unlike `_goldstein` and `_weak_wolfe`, `_backtracking` does not invoke `_max_alpha` here.
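For comparison, a plain Armijo backtracking loop looks roughly like the sketch below; capping the initial trial step at the value `_max_alpha` computes would only change the starting `alpha`. The `phi_0` and `phi_0_prime` names mirror the excerpt above; everything else is illustrative.

```python
def backtracking_step(phi, phi_0, phi_0_prime, alpha_init=1.0, c1=1e-4, rho=0.5, max_iter=25):
    """Armijo backtracking: shrink alpha until phi(alpha) <= phi(0) + c1 * alpha * phi'(0)."""
    alpha = alpha_init  # starting from min(alpha_init, max_alpha) would also enforce the box bounds
    for _ in range(max_iter):
        if phi(alpha) <= phi_0 + c1 * alpha * phi_0_prime:
            return alpha  # sufficient decrease achieved
        alpha *= rho  # step rejected: shrink and retry
    return alpha
```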
if u_bnd is not None:
    from_u_bnd = ((u_bnd - p.data) / p_grad)[p_grad > 0]
    min_u_bnd = torch.min(from_u_bnd) if from_u_bnd.numel() > 0 else max_alpha
max_alpha = min(max_alpha, min_l_bnd, min_u_bnd)
This line takes the minimal alpha across all directions. This implies that, if the point is near the boundary, the next step is constrained to remain very close to it, and the descent could stall. I would instead expect a different step size in each direction; see equation 16.71 mentioned in the previous comment. Is there another reference for this?
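To make the concern concrete, equation 16.71 defines a separate breakpoint per coordinate, so components already sitting on a bound stop moving while the others keep descending, rather than a single global cap on the step. A rough sketch of that per-coordinate computation (illustrative names, not this PR's code):

```python
import torch

def breakpoints(x, g, l_bnd, u_bnd):
    """Per-coordinate breakpoints t_i along the steepest-descent direction d = -g,
    in the spirit of Nocedal & Wright eq. 16.71: the step at which coordinate i
    reaches its bound (inf if it never does)."""
    t = torch.full_like(x, float("inf"))
    # d_i = -g_i > 0 moves toward the upper bound, d_i < 0 toward the lower bound.
    up = g < 0
    down = g > 0
    t[up] = (x[up] - u_bnd[up]) / g[up]
    t[down] = (x[down] - l_bnd[down]) / g[down]
    return t
```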
The relevant part of scipy is the Fortran lbfgsb library's cauchy function. The gradient is scaled differently for each component, as done in Nocedal equation 16.71.
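For comparison, SciPy exposes that Fortran routine through scipy.optimize.minimize with method='L-BFGS-B', where the bounds are enforced per coordinate. A small usage example (the quadratic objective is only for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Minimize ||x - c||^2 subject to box constraints 0 <= x_i <= 1.
c = np.array([2.0, -1.0, 0.5])
fun = lambda x: np.sum((x - c) ** 2)
jac = lambda x: 2.0 * (x - c)

res = minimize(fun, x0=np.zeros(3), jac=jac, method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * 3)
print(res.x)  # expected: approximately [1.0, 0.0, 0.5], i.e. the unconstrained optimum clipped to the box
```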
I will close this PR since
* pipe through index mode
* replace codegen strings
* cache index mode
* use std limit
* move definitions
* rename INDEX_TYPE
torch.optim.lbfgs - added box constraints and line search methods (backtracking, goldstein, weak_wolfe).
torch.autograd._function.reduce.Prod class - when the input contains a zero element, the gradient becomes inf, because the original implementation is done by [product of all elements] / [an element]. All these cases are taken care of separately. Still, tensor.prod() returns inf if there is only one element in a tensor. This should be fixed.
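To illustrate the zero-element case the description refers to, here is a small check written against the current PyTorch autograd API (the PR itself targeted the older Variable-based API):

```python
import torch

# With exactly one zero, the gradient at the zero position should be the
# product of the remaining elements, and zero everywhere else.
x = torch.tensor([2.0, 0.0, 3.0], requires_grad=True)
y = x.prod()
y.backward()
print(x.grad)  # tensor([0., 6., 0.]) -- the naive y / x formula would give nan/inf here
```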