correct 'MDEV' to 'MEDV'
Jared Weed committed Jul 28, 2016
1 parent 2b6a7fc commit ba7fa44
Showing 2 changed files with 9 additions and 9 deletions.
16 changes: 8 additions & 8 deletions projects/boston_housing/boston_housing.ipynb
@@ -23,10 +23,10 @@
"In this project, you will evaluate the performance and predictive power of a model that has been trained and tested on data collected from homes in suburbs of Boston, Massachusetts. A model trained on this data that is seen as a *good fit* could then be used to make certain predictions about a home — in particular, its monetary value. This model would prove to be invaluable for someone like a real estate agent who could make use of such information on a daily basis.\n",
"\n",
"The dataset for this project originates from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Housing). The Boston housing data was collected in 1978 and each of the 506 entries represent aggregated data about 14 features for homes from various suburbs in Boston, Massachusetts. For the purposes of this project, the following preprocessing steps have been made to the dataset:\n",
- "- 16 data points have an `'MDEV'` value of 50.0. These data points likely contain **missing or censored values** and have been removed.\n",
+ "- 16 data points have an `'MEDV'` value of 50.0. These data points likely contain **missing or censored values** and have been removed.\n",
"- 1 data point has an `'RM'` value of 8.78. This data point can be considered an **outlier** and has been removed.\n",
- "- The features `'RM'`, `'LSTAT'`, `'PTRATIO'`, and `'MDEV'` are essential. The remaining **non-relevant features** have been excluded.\n",
- "- The feature `'MDEV'` has been **multiplicatively scaled** to account for 35 years of market inflation.\n",
+ "- The features `'RM'`, `'LSTAT'`, `'PTRATIO'`, and `'MEDV'` are essential. The remaining **non-relevant features** have been excluded.\n",
+ "- The feature `'MEDV'` has been **multiplicatively scaled** to account for 35 years of market inflation.\n",
"\n",
"Run the code cell below to load the Boston housing dataset, along with a few of the necessary Python libraries required for this project. You will know the dataset loaded successfully if the size of the dataset is reported."
]
@@ -50,8 +50,8 @@
"\n",
"# Load the Boston housing dataset\n",
"data = pd.read_csv('housing.csv')\n",
- "prices = data['MDEV']\n",
- "features = data.drop('MDEV', axis = 1)\n",
+ "prices = data['MEDV']\n",
+ "features = data.drop('MEDV', axis = 1)\n",
" \n",
"# Success\n",
"print \"Boston housing dataset has {} data points with {} variables each.\".format(*data.shape)"
@@ -64,7 +64,7 @@
"## Data Exploration\n",
"In this first section of this project, you will make a cursory investigation about the Boston housing data and provide your observations. Familiarizing yourself with the data through an explorative process is a fundamental practice to help you better understand and justify your results.\n",
"\n",
- "Since the main goal of this project is to construct a working model which has the capability of predicting the value of houses, we will need to separate the dataset into **features** and the **target variable**. The **features**, `'RM'`, `'LSTAT'`, and `'PTRATIO'`, give us quantitative information about each data point. The **target variable**, `'MDEV'`, will be the variable we seek to predict. These are stored in `features` and `prices`, respectively."
+ "Since the main goal of this project is to construct a working model which has the capability of predicting the value of houses, we will need to separate the dataset into **features** and the **target variable**. The **features**, `'RM'`, `'LSTAT'`, and `'PTRATIO'`, give us quantitative information about each data point. The **target variable**, `'MEDV'`, will be the variable we seek to predict. These are stored in `features` and `prices`, respectively."
]
},
{
@@ -75,7 +75,7 @@
"For your very first coding implementation, you will calculate descriptive statistics about the Boston housing prices. Since `numpy` has already been imported for you, use this library to perform the necessary calculations. These statistics will be extremely important later on to analyze various prediction results from the constructed model.\n",
"\n",
"In the code cell below, you will need to implement the following:\n",
- "- Calculate the minimum, maximum, mean, median, and standard deviation of `'MDEV'`, which is stored in `prices`.\n",
+ "- Calculate the minimum, maximum, mean, median, and standard deviation of `'MEDV'`, which is stored in `prices`.\n",
" - Store each calculation in their respective variable."
]
},
@@ -121,7 +121,7 @@
"- `'LSTAT'` is the percentage of homeowners in the neighborhood considered \"lower class\" (working poor).\n",
"- `'PTRATIO'` is the ratio of students to teachers in primary and secondary schools in the neighborhood.\n",
"\n",
- "_Using your intuition, for each of the three features above, do you think that an increase in the value of that feature would lead to an **increase** in the value of `'MDEV'` or a **decrease** in the value of `'MDEV'`? Justify your answer for each._ \n",
+ "_Using your intuition, for each of the three features above, do you think that an increase in the value of that feature would lead to an **increase** in the value of `'MEDV'` or a **decrease** in the value of `'MEDV'`? Justify your answer for each._ \n",
"**Hint:** Would you expect a home that has an `'RM'` value of 6 be worth more or less than a home that has an `'RM'` value of 7?"
]
},
2 changes: 1 addition & 1 deletion projects/boston_housing/housing.csv
@@ -1,4 +1,4 @@
- RM,LSTAT,PTRATIO,MDEV
+ RM,LSTAT,PTRATIO,MEDV
6.575,4.98,15.3,504000.0
6.421,9.14,17.8,453600.0
7.185,4.03,17.8,728700.0
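
A quick way to sanity-check the rename is to load the data and confirm that only the corrected column name is present. The following is a minimal sketch rather than part of the commit. It assumes Python 3 with `pandas` and `numpy` installed, and it inlines only the three CSV rows visible in this diff (the `SAMPLE_CSV` and `stats` names are illustrative), so the statistics describe that sample and not the full dataset.

```python
import io

import numpy as np
import pandas as pd

# The post-rename header plus the three data rows visible in this diff.
# The real housing.csv contains many more rows; this is only a sample.
SAMPLE_CSV = """RM,LSTAT,PTRATIO,MEDV
6.575,4.98,15.3,504000.0
6.421,9.14,17.8,453600.0
7.185,4.03,17.8,728700.0
"""

data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# The corrected column must exist and the misspelled one must be gone.
assert 'MEDV' in data.columns
assert 'MDEV' not in data.columns

# Same split as the notebook cell: target variable vs. features.
prices = data['MEDV']
features = data.drop('MEDV', axis=1)

# Descriptive statistics on the target, computed with numpy on the raw array.
values = prices.to_numpy()
stats = {
    'minimum': np.min(values),
    'maximum': np.max(values),
    'mean': np.mean(values),
    'median': np.median(values),
    'std': np.std(values),  # population standard deviation (ddof=0)
}

print("Sample has {} data points with {} variables each.".format(*data.shape))
for name, value in stats.items():
    print("{}: {:,.2f}".format(name, value))
```

To run the same check against the repository file, replace `io.StringIO(SAMPLE_CSV)` with the path `'housing.csv'`, as the notebook does.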
