Scaling issues in l-bfgs for LogisticRegression #15556
Comments
+1 we saw that when preparing the tutorial at EuroSciPy and made an example to highlight the importance of scaling there. We wanted to make a PR adding a mention of scaling to the warning. |
@amueller do you think that changing the warning is sufficient? |
If we can make the optimization more numerically stable I would prefer that. We could add to the warning for this release maybe? |
Yep this seems good. |
hm maybe we can still do it for the final release? do you wanna do it? I'm at a workshop... |
I'm actually not entirely sure how to best solve this. This is an issue for nearly all of the solvers, I think. |
I guess the standard line of sklearn would be to do exactly what the user told us to, so we should use a diagonal preconditioner.
+1: diagonal preconditioner, as we try to solve the canonical problem.
Good thinking @amueller, this will be useful!
|
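A minimal sketch of the diagonal-preconditioning idea discussed above, for illustration only (the helper `fit_logreg_preconditioned` and its signature are hypothetical, not scikit-learn code): optimize over column-rescaled variables so L-BFGS sees a better-conditioned problem, while the penalty and the returned coefficients still correspond to the original ("canonical") problem.
```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_logreg_preconditioned(X, y, C=1.0):
    """L2-penalized binary logistic regression (y in {0, 1}) via L-BFGS,
    solved on a column-rescaled (diagonally preconditioned) problem."""
    t = 2.0 * y - 1.0                  # labels in {-1, +1}
    scale = X.std(axis=0)
    scale[scale == 0.0] = 1.0          # leave constant columns alone
    Xs = X / scale                     # preconditioned design matrix

    def loss_grad(w):
        v, b = w[:-1], w[-1]           # v lives on the rescaled features
        z = Xs @ v + b
        # stable log(1 + exp(-t * z)); the penalty is applied to the
        # back-transformed coefficients v / scale so the optimum is still
        # that of the canonical (unscaled) problem
        loss = np.logaddexp(0.0, -t * z).sum() + 0.5 / C * np.sum((v / scale) ** 2)
        p = expit(-t * z)
        grad_v = Xs.T @ (-t * p) + v / scale ** 2 / C
        grad_b = np.sum(-t * p)
        return loss, np.append(grad_v, grad_b)

    w0 = np.zeros(X.shape[1] + 1)
    opt = minimize(loss_grad, w0, jac=True, method="L-BFGS-B")
    return opt.x[:-1] / scale, opt.x[-1]   # coefficients on the original scale
```
The key point of the sketch is that only the conditioning seen by the solver changes; the penalized objective (and therefore the solution) is the same as for the unscaled problem.
|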
Labeling this as a regression as we used liblinear in 0.21, which doesn't have this issue (because it uses a preconditioner). |
The fun never ends. Here's a toy example from my book where liblinear is worse and gives qualitatively different results?!
```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model._logistic import _logistic_loss


def make_forge():
    # a carefully hand-designed dataset lol
    X, y = make_blobs(centers=2, random_state=4, n_samples=30)
    y[np.array([7, 27])] = 0
    mask = np.ones(len(X), dtype=bool)
    mask[np.array([0, 1, 5, 26])] = 0
    X, y = X[mask], y[mask]
    return X, y


X, y = make_forge()
lr1 = LogisticRegression(solver='liblinear').fit(X, y)
lr2 = LogisticRegression(solver='lbfgs').fit(X, y)

print(_logistic_loss(np.hstack([lr1.coef_.ravel(), lr1.intercept_]), X, 2 * y - 1, 1))
print(_logistic_loss(np.hstack([lr2.coef_.ravel(), lr2.intercept_]), X, 2 * y - 1, 1))
print(lr1.coef_)
print(lr2.coef_)
```
|
just do:
```python
lr1 = LogisticRegression(solver='liblinear',
                         intercept_scaling=10000.).fit(X, y)
```
the regularization of the intercept in liblinear is really annoying....
|
@agramfort I'm not sure what you mean by "just do" but if you think that the difference is because of the intercept scaling that's good to know! |
yes it is because of the intercept regularization of liblinear. Using a high intercept_scaling mitigates the issue and fixes the problem here.
|
The problem is fixed with the new "newton-cholesky" solver #24767 (edit: example is binary classification, so the corresponding multiclass solver #28840 is not needed). Without issuing any warning:
```python
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

data = fetch_openml(data_id=1590, as_frame=True)  # adult dataset
cross_val_score(LogisticRegression(solver="newton-cholesky"), pd.get_dummies(data.data), data.target)
```
```
array([0.85218548, 0.85054765, 0.84889435, 0.85288698, 0.85626536])
```
The default solver lbfgs gives quite different solutions:
```python
cross_val_score(LogisticRegression(), pd.get_dummies(data.data), data.target)
```
```
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
...
array([0.7945542 , 0.79752278, 0.78634316, 0.78746929, 0.80149468])
```
I commented on preconditioning [here](#15583 (comment)). Shall we close? |
+1 for closing. And congratulations!
|
So it looks like l-bfgs is very sensitive to scaling of the data, which can lead to convergence issues.
I feel like we might be able to fix this by changing the framing of the optimization?
example:
this gives convergence warnings; after scaling, it doesn't. I have seen this in many places. While people should scale, I think a warning about the number of iterations is not a good thing to show to the user. If we can fix this, I think we should.
Using the bank campaign data I got coefficients that were quite different if I increased the number of iterations (I got convergence warnings with the default of 100). If I scaled the data, that issue went away.
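The bank campaign data referenced above is not bundled with scikit-learn, so here is a small reproduction of the same symptom (an assumption, not the exact example from the issue) using `load_breast_cancer`, whose features also span very different scales; the unscaled fit typically emits the ConvergenceWarning, while the scaled pipeline converges well within the default `max_iter`:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Raw features: lbfgs usually hits the default max_iter=100 and warns.
lr_raw = LogisticRegression().fit(X, y)

# Standardized features: converges without a warning.
lr_scaled = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

print(lr_raw.score(X, y), lr_scaled.score(X, y))
```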