
Scaling issues in l-bfgs for LogisticRegression #15556

Closed
amueller opened this issue Nov 6, 2019 · 14 comments

@amueller
Member

amueller commented Nov 6, 2019

So it looks like l-bfgs is very sensitive to scaling of the data, which can lead to convergence issues.
I feel like we might be able to fix this by changing the framing of the optimization?

example:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

data = fetch_openml(data_id=1590, as_frame=True)  # Adult census dataset
cross_val_score(LogisticRegression(), pd.get_dummies(data.data), data.target)

This gives convergence warnings; after scaling, it doesn't (a sketch of the scaled variant is below). I have seen this in many places. While people should scale their data, I don't think a warning about the number of iterations is a good thing to show the user. If we can fix this, I think we should.

Using the bank campaign data, I got coefficients that were quite different when I increased the number of iterations (I got convergence warnings with the default of 100). If I scaled the data, that issue went away.
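
For reference, a minimal sketch (not part of the original report) of the scaled variant mentioned above, using a Pipeline with a StandardScaler instead of calling scale() on the dummies directly. According to the report, this is the setting in which the convergence warning goes away:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = fetch_openml(data_id=1590, as_frame=True)  # Adult census dataset
# Standardize the dummy-encoded features inside the pipeline so that
# lbfgs works on a well-scaled problem in each CV split.
cross_val_score(
    make_pipeline(StandardScaler(), LogisticRegression()),
    pd.get_dummies(data.data),
    data.target,
)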

@glemaitre
Member

+1, we saw that when preparing the tutorial at EuroSciPy and made an example there to highlight the importance of scaling. We wanted to make a PR mentioning the scaling issue in the warning.

@glemaitre
Member

@amueller do you think that changing the warning is sufficient?

@amueller
Member Author

amueller commented Nov 7, 2019

If we can make the optimization more numerically stable I would prefer that. We could add to the warning for this release maybe?

@glemaitre
Member

> We could add to the warning for this release maybe?

Yep this seems good.

@amueller
Member Author

amueller commented Nov 8, 2019

hm maybe we can still do it for the final release? do you wanna do it? I'm at a workshop...

@amueller
Member Author

amueller commented Nov 9, 2019

I'm actually not entirely sure how to best solve this. This is an issue for nearly all of the solvers, I think.
There are basically two obvious solutions: be magic and scale the data, or do what the user asked and use a diagonal preconditioner (which is, I think, what liblinear does). The preconditioner is basically free and likely to solve the issue for lbfgs, newton-cg, sag and saga.
The main difference between the two approaches is in the relative penalization of the different coefficients.
I guess the standard line of sklearn would be to do exactly what the user told us to, so we should use a diagonal preconditioner (a sketch of the idea is below).
Thoughts from @GaelVaroquaux @agramfort @rth @ogrisel, who have thought a lot about linear models?
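
For illustration, a minimal sketch (not any existing scikit-learn code path) of what a diagonal preconditioner amounts to: rescale each column, solve the well-conditioned problem, then map the coefficients back. The caveat from the comment above applies: the L2 penalty then acts on the rescaled coefficients, which changes the relative penalization.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X[:, 0] *= 1e3  # one badly scaled column

# Diagonal preconditioner: divide each column by its standard deviation
# (assumes no constant columns, otherwise guard against d == 0).
d = X.std(axis=0)
lr = LogisticRegression(solver="lbfgs").fit(X / d, y)

# Map the coefficients back to the original feature scale;
# the intercept is unaffected by the column rescaling.
coef_original_scale = lr.coef_ / d
print(coef_original_scale, lr.intercept_)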

@GaelVaroquaux
Member

GaelVaroquaux commented Nov 9, 2019 via email

@amueller
Member Author

Labeling this as a regression, since we used liblinear in 0.21, which doesn't have this issue (because it uses a preconditioner).

@amueller
Member Author

amueller commented Dec 27, 2019

The fun never ends. Here's a toy example from my book where liblinear reaches a worse objective value and gives qualitatively different coefficients?!

import numpy as np
from sklearn.datasets import make_blobs

def make_forge():
    # a carefully hand-designed dataset lol
    X, y = make_blobs(centers=2, random_state=4, n_samples=30)
    y[np.array([7, 27])] = 0
    mask = np.ones(len(X), dtype=bool)  # np.bool is removed in recent NumPy
    mask[np.array([0, 1, 5, 26])] = 0
    X, y = X[mask], y[mask]
    return X, y

from sklearn.linear_model import LogisticRegression
X, y = make_forge()
lr1 = LogisticRegression(solver='liblinear').fit(X, y)
lr2 = LogisticRegression(solver='lbfgs').fit(X, y)

# _logistic_loss was a private helper in the scikit-learn version of the time
# (it has since been removed).
from sklearn.linear_model._logistic import _logistic_loss

# Penalized log-loss reached by each solver (labels mapped to {-1, +1}):
print(_logistic_loss(np.hstack([lr1.coef_.ravel(), lr1.intercept_]), X, 2 * y - 1, 1))
# 7.283720321745406  (liblinear)
print(_logistic_loss(np.hstack([lr2.coef_.ravel(), lr2.intercept_]), X, 2 * y - 1, 1))
# 5.501204675670945  (lbfgs)
print(lr1.coef_)
# [[-0.36631767  1.25909579]]
print(lr2.coef_)
# [[0.67289534 1.53136443]]

See amueller/introduction_to_ml_with_python#124

@agramfort
Member

agramfort commented Dec 28, 2019 via email

@amueller
Member Author

amueller commented Jan 3, 2020

@agramfort I'm not sure what you mean by "just do", but if you think the difference is because of the intercept scaling, that's good to know! (A quick probe is sketched below.)
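
For context, a hedged sketch (not from the thread) of how one might probe that hypothesis: liblinear regularizes the intercept via a synthetic constant feature, and increasing intercept_scaling weakens that penalty. This reuses make_forge() from the earlier comment:

from sklearn.linear_model import LogisticRegression

X, y = make_forge()  # toy dataset defined in the earlier comment
for scaling in [1, 10, 100]:
    # liblinear penalizes the intercept; a larger intercept_scaling makes
    # that penalty relatively weaker.
    lr = LogisticRegression(solver="liblinear", intercept_scaling=scaling).fit(X, y)
    print(scaling, lr.coef_, lr.intercept_)

If the coefficients move toward the lbfgs solution as intercept_scaling grows, the intercept penalty would indeed explain the gap.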

@agramfort
Member

agramfort commented Jan 4, 2020 via email

@lorentzenchr
Member

lorentzenchr commented Oct 26, 2024

The problem is fixed by the new "newton-cholesky" solver #24767 (edit: the example is binary classification, so the corresponding multiclass solver #28840 is not needed). It runs without issuing any warning:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

data = fetch_openml(data_id=1590, as_frame=True)  # adults dataset
cross_val_score(LogisticRegression(solver="newton-cholesky"), pd.get_dummies(data.data), data.target)
array([0.85218548, 0.85054765, 0.84889435, 0.85288698, 0.85626536])

The default solver lbfgs warns about non-convergence and gives quite different solutions:

cross_val_score(LogisticRegression(), pd.get_dummies(data.data), data.target)
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
...
array([0.7945542 , 0.79752278, 0.78634316, 0.78746929, 0.80149468])

I commented on preconditioning here.

Shall we close?

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 26, 2024 via email
