[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division (scikit-learn#9492)

* Ensures that partial_fit uses float division

* Switches to using future division for float division

* Adds non-regression test for issue scikit-learn#9489

* Updates test to remove dependence on a "known answer"

* Updates doc/whats_new.rst with entry for this PR

* Specifies bug fix is for Python 2 versions in doc/whats_new.rst
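
To make the effect of the fix concrete, here is a minimal sketch of how floor division and true division diverge for the batch sizes used in this commit's test (5 and 7 samples). The variable names and the form of the expression are assumptions chosen for illustration, not a copy of the library's code. `//` stands in for what `/` does on two ints in Python 2 when `from __future__ import division` is absent.

```python
import math

# Batch sizes borrowed from the test in this commit (A has 5 rows, B has 7).
n_seen, n_new = 5, 7
n_total = n_seen + n_new

# Python 2 without the future import: `/` on two ints floors the result.
floor_ratio = (n_seen * n_new) // n_total

# With `from __future__ import division` (or on Python 3): true division.
true_ratio = (n_seen * n_new) / n_total

# Any scale factor derived from the ratio inherits the truncation error.
floor_scale = math.sqrt(floor_ratio)
true_scale = math.sqrt(true_ratio)

print(floor_ratio, true_ratio)
print(floor_scale, true_scale)
```

The truncated ratio drops the fractional part entirely, so any quantity scaled by it is silently too small whenever the product of the batch sizes is not a multiple of the total.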
jrbourbeau authored and NelleV committed Aug 14, 2017
1 parent 27ae048 commit 86d8f18
Showing 3 changed files with 48 additions and 1 deletion.
24 changes: 23 additions & 1 deletion doc/whats_new.rst
@@ -11,6 +11,18 @@ Version 0.20 (under development)
Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- :class:`decomposition.IncrementalPCA` in Python 2 (bug fix)

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we
cannot assure that this list is complete.)

Changelog
---------

@@ -24,6 +36,16 @@ Classifiers and regressors
via ``n_iter_no_change``, ``validation_fraction`` and ``tol``. :issue:`7071`
by `Raghav RV`_

Bug fixes
.........

Decomposition, manifold learning and clustering

- Fixed a bug where the ``partial_fit`` method of
  :class:`decomposition.IncrementalPCA` used integer division instead of float
  division on Python 2 versions. :issue:`9492` by
  :user:`James Bourbeau <jrbourbeau>`.


Version 0.19
============
@@ -160,7 +182,7 @@ Model selection and evaluation
:issue:`8120` by `Neeraj Gangwar`_.

- Added a scorer based on :class:`metrics.explained_variance_score`.
:issue:`9259` by `Hanmin Qin <https://github.com/qinhanmin2014>`_.
:issue:`9259` by `Hanmin Qin <https://github.com/qinhanmin2014>`_.

Miscellaneous

1 change: 1 addition & 0 deletions sklearn/decomposition/incremental_pca.py
@@ -4,6 +4,7 @@
# Giorgio Patrini
# License: BSD 3 clause

from __future__ import division
import numpy as np
from scipy import linalg

24 changes: 24 additions & 0 deletions sklearn/decomposition/tests/test_incremental_pca.py
@@ -273,3 +273,27 @@ def test_whitening():
assert_almost_equal(X, Xinv_ipca, decimal=prec)
assert_almost_equal(X, Xinv_pca, decimal=prec)
assert_almost_equal(Xinv_pca, Xinv_ipca, decimal=prec)


def test_incremental_pca_partial_fit_float_division():
    # Test to ensure float division is used in all versions of Python
    # (non-regression test for issue #9489)

    rng = np.random.RandomState(0)
    A = rng.randn(5, 3) + 2
    B = rng.randn(7, 3) + 5

    pca = IncrementalPCA(n_components=2)
    pca.partial_fit(A)
    # Set n_samples_seen_ to be a floating point number instead of an int
    pca.n_samples_seen_ = float(pca.n_samples_seen_)
    pca.partial_fit(B)
    singular_vals_float_samples_seen = pca.singular_values_

    pca2 = IncrementalPCA(n_components=2)
    pca2.partial_fit(A)
    pca2.partial_fit(B)
    singular_vals_int_samples_seen = pca2.singular_values_

    np.testing.assert_allclose(singular_vals_float_samples_seen,
                               singular_vals_int_samples_seen)
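
The test relies on the fact that Python 2's `/` only floors when both operands are integers, so casting `n_samples_seen_` to `float` yields a reference result computed with true division. The following is a rough emulation of that behaviour in plain Python. `py2_div` is a hypothetical helper written for this illustration and is not part of scikit-learn or the standard library.

```python
def py2_div(a, b):
    """Emulate Python 2's `/` without `from __future__ import division`."""
    if isinstance(a, int) and isinstance(b, int):
        return a // b  # two ints: the result is floored
    return a / b       # any float operand: true division


n_seen, n_new = 5, 7  # sizes of A and B in the test above
n_total = n_seen + n_new

# Integer sample count: the quotient is truncated.
int_path = py2_div(n_seen * n_new, n_total)

# Float sample count: the same expression keeps its fractional part.
float_path = py2_div(float(n_seen) * n_new, n_total)

print(int_path, float_path)
```

With the future import in place, the integer and float paths agree, which is what the test checks indirectly by comparing the resulting singular values with `assert_allclose`.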
