Feature/two way scaling #104

Open
wants to merge 34 commits into
base: develop
Commits
9e50115
Added basic functions to test two-way center and scaling
mnarayan Oct 9, 2017
2f74e1b
Added basic twoway standardization algorithm. Relevant to issue #93
mnarayan Oct 9, 2017
f2a1b20
Cleaned up TwoWayStandardScaler API. partial_fit not supported
mnarayan Oct 9, 2017
a30bc8e
Reset internal row,col attributes
mnarayan Oct 9, 2017
ed6fd34
Added basic structure for partial_fit
mnarayan Oct 9, 2017
47c87cb
partial_fit now calculates row, col statistics
mnarayan Oct 11, 2017
2dcde0a
Added convergence checks. Algorithm completed
mnarayan Oct 11, 2017
dd92d80
Transform now calls twoway_standardize
mnarayan Oct 11, 2017
a754f44
Updated algorithm. Test passes
mnarayan Oct 11, 2017
4a3c038
Fixed bug in transform()
mnarayan Oct 11, 2017
cc1d8d3
Return original dimensions
mnarayan Oct 11, 2017
16b1116
inverse_transform completed, raises not implemented error
mnarayan Oct 11, 2017
a58383a
Delinting
mnarayan Oct 11, 2017
5904f60
More delinting
mnarayan Oct 11, 2017
a17a530
Fixed import error
mnarayan Oct 12, 2017
e5395bd
Added clean.py
mnarayan Oct 12, 2017
87ee167
Fix merge conflicts
jasonlaska Sep 9, 2018
a2940df
Rename files from `clean` to `two_way_standard_scaler`
jasonlaska Sep 9, 2018
238f393
Add estimator check
jasonlaska Sep 9, 2018
34dc936
Rename commont_test to sklearn_test as is more descriptive of this test.
jasonlaska Sep 9, 2018
7869659
Address initial comments and some cleanup.
jasonlaska Sep 9, 2018
9cbb212
Black formatting and more simplification and cleanup.
jasonlaska Sep 9, 2018
a8e980f
Black formatting and more simplification and cleanup.
jasonlaska Sep 9, 2018
e864e72
Ensure interface can be validated.
jasonlaska Sep 9, 2018
7f86bb3
More simplification.
jasonlaska Sep 9, 2018
748fe33
Autoformat.
jasonlaska Sep 9, 2018
d2800fc
Bring back partial_fit capability, add tests, ask questions.
jasonlaska Sep 9, 2018
7c37030
Minor cleanup.
jasonlaska Sep 9, 2018
1757216
Raise on inverse transform, remove code.
jasonlaska Sep 10, 2018
17806e8
Remove unneeded check.
jasonlaska Sep 10, 2018
f1f682e
Remove redundant raise.
jasonlaska Sep 10, 2018
eb8c54b
Remove unneeded comments.
jasonlaska Sep 10, 2018
070c017
Merge branch 'develop' of github.com:skggm/skggm into feature/two-way…
jasonlaska Sep 10, 2018
4f8267e
Merge branch 'develop' of github.com:skggm/skggm into feature/two-way…
jasonlaska Sep 12, 2018
Cleaned up TwoWayStandardScaler API. partial_fit not supported
mnarayan committed Oct 9, 2017
commit f2a1b2059d4cb4dd2069d7a754dbf4be785515fe
45 changes: 16 additions & 29 deletions inverse_covariance/clean.py
@@ -88,8 +88,8 @@ def twoway_standardize(X, axis=0, with_mean=True, with_std=True, copy=True, max_

class TwoWayStandardScaler(BaseEstimator, TransformerMixin):
"""Standardize features by removing the mean and scaling to unit variance
in both row and column dimensions.
This is modeled after StandardScaler in scikit-learn.
in both row and column dimensions.
This class is modeled after StandardScaler in scikit-learn.
Read more in the :ref:`User Guide <preprocessing_scaler>`.
Parameters
----------
@@ -123,24 +123,22 @@ class TwoWayStandardScaler(BaseEstimator, TransformerMixin):
new calls to fit, but increments across ``partial_fit`` calls.
Examples
--------
>>> from sklearn.preprocessing import StandardScaler
>>> from inverse_covariance.clean import TwoWayStandardScaler
>>>
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> data = [[1, 0], [1, 0], [2, 1], [2, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler(copy=True, with_mean=True, with_std=True)
>>> print(scaler.mean_)
[ 0.5 0.5]
[ 3.0 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
>>> print(scaler.transform([[2, 2]]))
[[ 3. 3.]]
See also
--------
scale: Equivalent function without the estimator API.
twoway_standardize: Equivalent function without the estimator API.
:class:`sklearn.preprocessing.StandardScaler`
:class:`sklearn.decomposition.PCA`
Further removes the linear correlation across features with 'whiten=True'.
@@ -151,42 +149,31 @@ class TwoWayStandardScaler(BaseEstimator, TransformerMixin):
""" # noqa
Member

noqa shouldn't be needed for the docstring; please format to 80 chars if possible.
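
As a quick sanity check on the docstring example above, the following sketch recomputes plain column-wise standardization by hand with NumPy for the data `[[1, 0], [1, 0], [2, 1], [2, 1]]`. It assumes that this is the data line the example is meant to keep, and that the population standard deviation (`ddof=0`) is used, as in scikit-learn's `StandardScaler`. It says nothing about what `TwoWayStandardScaler` itself would return.

```python
import numpy as np

# Data line from the docstring example (assumed to be the intended one).
data = np.array([[1, 0], [1, 0], [2, 1], [2, 1]], dtype=float)

# Column-wise statistics, using the population std (ddof=0).
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Standardize the training data and one new point with the same statistics.
scaled = (data - col_mean) / col_std
new_point = (np.array([[2.0, 2.0]]) - col_mean) / col_std

print(col_mean)
print(scaled)
print(new_point)
```

Under these assumptions the column means come out as 1.5 and 0.5, so the `mean_` output shown in the docstring may need updating together with the data line.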


def __init__(self, copy=True, with_mean=True, with_std=True):
self.with_mean = with_mean
"""Unlike StandardScaler, with_mean is always set to True, to ensure
that two-way standardization is always performed with centering. The
argument `with_mean` is retained for the sake of model API compatibility.
"""
self.with_mean = True
Member

self.with_mean = with_mean
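
This suggestion is in line with the scikit-learn convention that `__init__` stores each argument unmodified, because `get_params` reads the attributes back by name. A minimal sketch of the difference, using two hypothetical classes (`KeepsParam` and `OverridesParam`, neither of which is part of this PR):

```python
from sklearn.base import BaseEstimator


class KeepsParam(BaseEstimator):
    """Hypothetical estimator that stores with_mean exactly as passed."""

    def __init__(self, copy=True, with_mean=True, with_std=True):
        self.with_mean = with_mean
        self.with_std = with_std
        self.copy = copy


class OverridesParam(BaseEstimator):
    """Hypothetical estimator that hard-codes with_mean in __init__."""

    def __init__(self, copy=True, with_mean=True, with_std=True):
        self.with_mean = True  # the caller's value is silently dropped
        self.with_std = with_std
        self.copy = copy


# get_params reports the stored attribute, not the argument that was passed.
kept = KeepsParam(with_mean=False).get_params()["with_mean"]
overridden = OverridesParam(with_mean=False).get_params()["with_mean"]
print(kept, overridden)
```

If centering must always happen, one option would be to keep the parameter as passed and enforce the constraint in `fit`, for example by raising on `with_mean=False`. That is a design suggestion only, not something stated in this PR.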

self.with_std = with_std
self.copy = copy

def _reset(self):
"""Reset internal data-dependent state of the scaler, if necessary.
__init__ parameters are not touched.
"""

# Checking one attribute is enough, because they are all set together
# in partial_fit
if hasattr(self, 'scale_'):
del self.scale_
del self.n_samples_seen_
del self.mean_
del self.var_

def fit(self, X, y=None):
"""Compute the mean and std to be used for later scaling.
"""Compute the mean and std for both row and column dimensions.
Parameters
----------
X : {array-like, sparse matrix}, shape [n_samples, n_features]
X : {array-like}, shape [n_rows, n_cols]
The data used to compute the mean and standard deviation
used for later scaling along the features axis.
y : Passthrough for ``Pipeline`` compatibility.
along both row and column axes
y : Passthrough for ``Pipeline`` compatibility. Input is ignored.
"""

# Reset internal state before fitting
self._reset()
return self.partial_fit(X, y)
Member

Any reason we can't just take the guts of partial_fit and put them in fit? This function is just an interface wrapper over the other, with no changes.

Member

addressing this by removing partial_fit
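
With `partial_fit` removed, the row and column statistics could be computed directly from the full matrix in `fit`. The sketch below illustrates one way two-way standardization can work: alternate column and row standardization until the columns stop drifting. The function name `two_way_standardize_sketch`, its `max_iter` and `tol` parameters, and the stopping rule are assumptions made for illustration; this is not the PR's `twoway_standardize` implementation.

```python
import numpy as np


def two_way_standardize_sketch(X, max_iter=100, tol=1e-8):
    """Alternately standardize columns and rows (hypothetical helper)."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        # Standardize each column (axis=0), then each row (axis=1),
        # using the population std (ddof=0) in both directions.
        X = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
        X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
        # Rows are standardized by construction after the last step,
        # so only the drift in the column statistics is checked.
        col_mean_err = np.abs(X.mean(axis=0)).max()
        col_std_err = np.abs(X.std(axis=0) - 1.0).max()
        if max(col_mean_err, col_std_err) < tol:
            break
    return X


rng = np.random.default_rng(0)
Z = two_way_standardize_sketch(rng.normal(size=(8, 6)))
print(np.abs(Z.mean(axis=1)).max(), np.abs(Z.mean(axis=0)).max())
```

In this sketch the rows are standardized up to floating-point error after the final step, while the columns are only standardized up to `tol`. An input with a constant row or column would divide by zero and would need separate handling.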


def transform(self, X, y='deprecated', copy=None):
"""Perform standardization by centering and scaling
Parameters
----------
X : array-like, shape [n_samples, n_features]
X : array-like, shape [n_rows, n_cols]
The data used to scale along the features axis.
y : (ignored)
.. deprecated:: 0.19