
Scaling issues in l-bfgs for LogisticRegression #15556

Closed
amueller opened this issue Nov 6, 2019 · 14 comments

@amueller
Member

amueller commented Nov 6, 2019

So it looks like l-bfgs is very sensitive to scaling of the data, which can lead to convergence issues.
I feel like we might be able to fix this by changing the framing of the optimization?

example:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

data = fetch_openml(data_id=1590, as_frame=True)  # Adult census dataset
cross_val_score(LogisticRegression(), pd.get_dummies(data.data), data.target)

This gives convergence warnings; after scaling, it doesn't (a sketch of the scaled variant is below). I have seen this in many places. While people should scale their data, I don't think a warning about the number of iterations is a good thing to show the user. If we can fix this, I think we should.

Using the bank campaign data, I got coefficients that were quite different when I increased the number of iterations (I got convergence warnings with the default of 100). If I scaled the data, that issue went away.
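
For reference, a minimal sketch (not part of the original report) of the scaled variant mentioned above, using a Pipeline with a StandardScaler instead of calling scale() on the dummies directly. According to the report, this is the setting in which the convergence warning goes away:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = fetch_openml(data_id=1590, as_frame=True)  # Adult census dataset
# Standardize the dummy-encoded features inside the pipeline so that
# lbfgs works on a well-scaled problem in each CV split.
cross_val_score(
    make_pipeline(StandardScaler(), LogisticRegression()),
    pd.get_dummies(data.data),
    data.target,
)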

@glemaitre
Member

+1, we saw that when preparing the tutorial at EuroSciPy and made an example there to highlight the importance of scaling. We wanted to make a PR mentioning the scaling issue in the warning.

@glemaitre
Member

@amueller do you think that changing the warning is sufficient?

@amueller
Member Author

amueller commented Nov 7, 2019

If we can make the optimization more numerically stable I would prefer that. We could add to the warning for this release maybe?

@glemaitre
Member

> We could add to the warning for this release maybe?

Yep this seems good.

@amueller
Member Author

amueller commented Nov 8, 2019

hm maybe we can still do it for the final release? do you wanna do it? I'm at a workshop...

@amueller
Member Author

amueller commented Nov 9, 2019

I'm actually not entirely sure how to best solve this. This is an issue for nearly all of the solvers, I think.
There are basically two obvious solutions: be magic and scale the data, or do what the user asked and use a diagonal preconditioner (which is, I think, what liblinear does). The preconditioner is basically free and likely to solve the issue for lbfgs, newton-cg, sag and saga.
The main difference between the two approaches is in the relative penalization of the different coefficients.
I guess the standard line of sklearn would be to do exactly what the user told us to, so we should use a diagonal preconditioner (a sketch of the idea is below).
Thoughts from @GaelVaroquaux @agramfort @rth @ogrisel, who have thought a lot about linear models?
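
For illustration, a minimal sketch (not any existing scikit-learn code path) of what a diagonal preconditioner amounts to: rescale each column, solve the well-conditioned problem, then map the coefficients back. The caveat from the comment above applies: the L2 penalty then acts on the rescaled coefficients, which changes the relative penalization.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X[:, 0] *= 1e3  # one badly scaled column

# Diagonal preconditioner: divide each column by its standard deviation
# (assumes no constant columns, otherwise guard against d == 0).
d = X.std(axis=0)
lr = LogisticRegression(solver="lbfgs").fit(X / d, y)

# Map the coefficients back to the original feature scale;
# the intercept is unaffected by the column rescaling.
coef_original_scale = lr.coef_ / d
print(coef_original_scale, lr.intercept_)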

@GaelVaroquaux
Member

GaelVaroquaux commented Nov 9, 2019 via email

@amueller
Member Author

Labeling this as a regression, since we used liblinear in 0.21, which doesn't have this issue (because it uses a preconditioner).

@amueller
Member Author

amueller commented Dec 27, 2019

The fun never ends. Here's a toy example from my book where liblinear reaches a worse objective value and gives qualitatively different coefficients?!

import numpy as np
from sklearn.datasets import make_blobs

def make_forge():
    # a carefully hand-designed dataset lol
    X, y = make_blobs(centers=2, random_state=4, n_samples=30)
    y[np.array([7, 27])] = 0
    mask = np.ones(len(X), dtype=bool)  # np.bool is removed in recent NumPy
    mask[np.array([0, 1, 5, 26])] = 0
    X, y = X[mask], y[mask]
    return X, y

from sklearn.linear_model import LogisticRegression
X, y = make_forge()
lr1 = LogisticRegression(solver='liblinear').fit(X, y)
lr2 = LogisticRegression(solver='lbfgs').fit(X, y)

# _logistic_loss was a private helper in the scikit-learn version of the time
# (it has since been removed).
from sklearn.linear_model._logistic import _logistic_loss

# Penalized log-loss reached by each solver (labels mapped to {-1, +1}):
print(_logistic_loss(np.hstack([lr1.coef_.ravel(), lr1.intercept_]), X, 2 * y - 1, 1))
# 7.283720321745406  (liblinear)
print(_logistic_loss(np.hstack([lr2.coef_.ravel(), lr2.intercept_]), X, 2 * y - 1, 1))
# 5.501204675670945  (lbfgs)
print(lr1.coef_)
# [[-0.36631767  1.25909579]]
print(lr2.coef_)
# [[0.67289534 1.53136443]]

See amueller/introduction_to_ml_with_python#124

@agramfort
Member

agramfort commented Dec 28, 2019 via email

@amueller
Member Author

amueller commented Jan 3, 2020

@agramfort I'm not sure what you mean by "just do", but if you think the difference is because of the intercept scaling, that's good to know! (A quick probe is sketched below.)
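
For context, a hedged sketch (not from the thread) of how one might probe that hypothesis: liblinear regularizes the intercept via a synthetic constant feature, and increasing intercept_scaling weakens that penalty. This reuses make_forge() from the earlier comment:

from sklearn.linear_model import LogisticRegression

X, y = make_forge()  # toy dataset defined in the earlier comment
for scaling in [1, 10, 100]:
    # liblinear penalizes the intercept; a larger intercept_scaling makes
    # that penalty relatively weaker.
    lr = LogisticRegression(solver="liblinear", intercept_scaling=scaling).fit(X, y)
    print(scaling, lr.coef_, lr.intercept_)

If the coefficients move toward the lbfgs solution as intercept_scaling grows, the intercept penalty would indeed explain the gap.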

@agramfort
Member

agramfort commented Jan 4, 2020 via email

@lorentzenchr
Member

lorentzenchr commented Oct 26, 2024

The problem is fixed by the new "newton-cholesky" solver #24767 (edit: the example is binary classification, so the corresponding multiclass solver #28840 is not needed). It runs without issuing any warning:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

data = fetch_openml(data_id=1590, as_frame=True)  # adults dataset
cross_val_score(LogisticRegression(solver="newton-cholesky"), pd.get_dummies(data.data), data.target)
array([0.85218548, 0.85054765, 0.84889435, 0.85288698, 0.85626536])

The default solver lbfgs warns about non-convergence and gives quite different solutions:

cross_val_score(LogisticRegression(), pd.get_dummies(data.data), data.target)
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
...
array([0.7945542 , 0.79752278, 0.78634316, 0.78746929, 0.80149468])

I commented on preconditioning here.

Shall we close?

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 26, 2024 via email
