-
-
Notifications
You must be signed in to change notification settings - Fork 25.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Deprecate fetch_mldata #11466
Conversation
I'm fine with the deprecation, but do we have ways to make the examples more friendly? (e.g., make use of scikit-learn/examples-data) I'm a bit uncomfortable about things like |
all the fetchers can be removed when we add the openml loader, right? If not, we should upload those datasets to openml. |
@jnothman Hi, do you need help on this right now ? |
No, thanks @FollowKenny. The plan is to get |
This isn't perfect, but it's mergeable after #11419 |
We should also look into uploading Mauna Loa to openml |
I think if this goes green, I'll be bold and merge #11419 and create a long series of follow-up issues. |
This is now ready for review. |
|
||
Downloading datasets from the mldata.org repository | ||
--------------------------------------------------- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's better to remove this section altogether rather than say put a deprecation note at the begining?
@@ -257,7 +257,6 @@ Loaders | |||
datasets.fetch_kddcup99 | |||
datasets.fetch_lfw_pairs | |||
datasets.fetch_lfw_people | |||
datasets.fetch_mldata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, maybe still keep the doc even if it's deprecated? Who knows maybe the website will work again in september, and there is still code using this. Those users will no longer be able to find any documentation about this function in the stable docs... A doc with a large deprecation warning would avoid us some opened issues from confused users.
data = fetch_mldata('mauna-loa-atmospheric-co2').data | ||
X = data[:, [1]] | ||
y = data[:, 0] | ||
def load_mauna_loa_atmospheric_c02(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should open an issue that this should be uploaded to openml, as you suggested?
fetch_mldata, 'not_existing_name') | ||
finally: | ||
datasets.mldata.urlopen = _urlopen_ref | ||
|
||
|
||
def test_fetch_one_column(tmpdata): | ||
def test_fetch_one_column(tmpdata, recwarn): | ||
warnings.simplefilter('ignore', DeprecationWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rather use,
@pytest.mark.filterwarnings('ignore::DeprecationWarning'):
def test_fetch_one_column(tmpdata):
...
if I got the syntax right. I'm never sure about the side effects explicitly setting warning filters could have in a pytest session..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought recwarn grabbed a filter context... But you're right that syntax is a little better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, whether this isolates the filter context (and will continue doing so in the future) is another story. That's why using standard pytest mechanisms is probably safer (besides syntax considerations).
@@ -82,7 +87,8 @@ def test_fetch_one_column(tmpdata): | |||
datasets.mldata.urlopen = _urlopen_ref | |||
|
|||
|
|||
def test_fetch_multiple_column(tmpdata): | |||
def test_fetch_multiple_column(tmpdata, recwarn): | |||
warnings.simplefilter('ignore', DeprecationWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above
@@ -1513,6 +1512,7 @@ To be removed in 0.22 | |||
:template: deprecated_function.rst | |||
|
|||
covariance.graph_lasso | |||
datasets.fetch_mldata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will generate the docs with the large deprecation warning.
fetch_mldata, 'not_existing_name') | ||
finally: | ||
datasets.mldata.urlopen = _urlopen_ref | ||
|
||
|
||
def test_fetch_one_column(tmpdata): | ||
def test_fetch_one_column(tmpdata, recwarn): | ||
warnings.simplefilter('ignore', DeprecationWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought recwarn grabbed a filter context... But you're right that syntax is a little better.
are these test failures from master? |
@@ -3,6 +3,7 @@ | |||
import os | |||
import scipy as sp | |||
import shutil | |||
import warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used (that's the flake8 error)
is that the failure that's fixed in #11831? |
mldata doctest is failing as far as I can see. We might want to skip it in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, +1 to skip the test (maybe # doctest: +SKIP
?)
Maybe we should include it in the rc
doc/whats_new/v0.20.rst
Outdated
@@ -209,6 +209,10 @@ Support for Python 3.3 has been officially dropped. | |||
data points could be generated. :issue:`10045` by :user:`Christian Braune | |||
<christianbraune79>`. | |||
|
|||
- |API| Deprecated :func:`sklearn.datasets.fetch_mldata` to be removed in | |||
version 0.22. MLData.org is no longer operational. Until removal it will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mldata.org?
"""Load MNIST, select two classes, shuffle and return only n_samples.""" | ||
mnist = fetch_mldata('MNIST original') | ||
mnist = fetch_openml('mnist_784', version=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(alternative) Maybe we can add a link to openml in the examples (e.g., https://www.openml.org/d/554) so that users can get more information about the dataset.
Definitely, I'm spending too much time writing pytest fixtures .. |
I just deleted the example. If someone wants to remind me the right way to disable it... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'm fine with removing the doctests. We should encourage users to turn to openml.
Appveyor failure irrelevant.
Merging, thanks @jnothman ! |
Fixes #11317
I have changed the use of
fetch_mldata
in examples, but not yet in benchmarks under the assumption that those will be replaced withfetch_openml
soon enough (even if after 0.20 release).Waiting on the OpenML fetcher (#11419) to be completed.