
Problem with Boston Housing Data #163

Open
sdempwolf opened this issue Sep 30, 2022 · 3 comments

Comments

@sdempwolf

Hello,
On Sep 28, 2022, I was working through the module 02 supervised-learning exercises with the Boston Housing data. We got a warning that the Boston Housing dataset has an ethical problem and that scikit-learn recommends switching to the California Housing dataset, with links provided.
I ended up modifying the mglearn/datasets.py file, adding an import line and a function, load_extended_california(). With this change, the rest of the notebook code works as written with the California Housing data.

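from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures


def load_extended_california():
    """California housing data, min-max scaled, with degree-2 polynomial features."""
    housing = fetch_california_housing()
    # Scale every feature to [0, 1] before expanding
    X = MinMaxScaler().fit_transform(housing.data)
    # Add all squares and pairwise products (no bias column)
    X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    return X, housing.target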
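Because fetch_california_housing() downloads the data on first use, here is a minimal, network-free sketch of the same scaling and polynomial expansion applied to a random stand-in array. The helper name extend_features and the random data are illustrative assumptions, not part of mglearn. It highlights one practical difference from the Boston version: California has 8 input features, so the degree-2 expansion yields 44 columns instead of the 104 produced from Boston's 13.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures


def extend_features(data):
    """Hypothetical helper: same transformation as load_extended_california, no download."""
    X = MinMaxScaler().fit_transform(data)
    return PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)


# Stand-in for housing.data: 20 random rows with California's 8 columns
rng = np.random.default_rng(0)
X_ext = extend_features(rng.normal(size=(20, 8)))
print(X_ext.shape)  # (20, 44): 8 originals + 8 squares + 28 pairwise products
```

Any notebook text or output that assumes Boston's 13 input features (104 after expansion) will therefore show different shapes with the California data.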
@amueller
Owner

Hi!
Yes, I was part of the discussion about making that change in sklearn. Since the book uses this dataset, the repo will continue to use it. If I end up revising the book (somewhat unlikely at this point), I will replace the dataset.

@rsrenner

Hi Andreas,
I love using your book and notebooks in my classes. However, I don't want to have to revert to sklearn <1.2. I tried simply replacing the references to the Boston Housing dataset with the California Housing data, but was unsuccessful. Can you please point me to the files where this change needs to be made? I must be missing one somehow. Or will this approach just not work?

@amueller
Owner

amueller commented Jun 1, 2023

Please update the mglearn library; that should solve the issue.
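A minimal sketch for checking which versions are installed before and after upgrading, assuming Python 3.8+ for importlib.metadata. The helper installed_version is hypothetical, and the sketch does not assume any particular mglearn release number contains the fix.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Hypothetical helper: installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# load_boston was removed from scikit-learn in 1.2, so both versions matter here
for pkg in ("scikit-learn", "mglearn"):
    print(pkg, installed_version(pkg) or "not installed")
```

The upgrade itself is typically done with pip install --upgrade mglearn, after which the check can be rerun.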
