project2-push

sammcveety · Mar 28, 2016 · 4ed9f45
1 parent 118528d
Showing 1 changed file with 32 additions and 27 deletions.
diff --git a/labs/project2/project2.ipynb b/labs/project2/project2.ipynb
@@ -16,7 +16,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 3,
    "metadata": {
     "collapsed": false
    },
@@ -57,7 +57,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 4,
    "metadata": {
     "collapsed": false
    },
@@ -79,7 +79,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 5,
    "metadata": {
     "collapsed": false
    },
@@ -99,7 +99,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 6,
    "metadata": {
     "collapsed": false
    },
@@ -111,7 +111,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 7,
    "metadata": {
     "collapsed": false
    },
@@ -129,7 +129,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 8,
    "metadata": {
     "collapsed": false
    },
@@ -165,7 +165,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 9,
    "metadata": {
     "collapsed": false,
     "scrolled": true
@@ -188,7 +188,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 10,
    "metadata": {
     "collapsed": false
    },
@@ -202,12 +202,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Question 1.1.2:** Assign `stemmed_message` to the stemmed version of the word \"message\"?"
+    "**Question 1.1.2:** Assign `stemmed_message` to the stemmed version of the word \"message\"."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 11,
    "metadata": {
     "collapsed": false
    },
@@ -221,7 +221,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 12,
    "metadata": {
     "collapsed": false
    },
@@ -239,7 +239,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 13,
    "metadata": {
     "collapsed": false
    },
@@ -253,7 +253,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 14,
    "metadata": {
     "collapsed": false
    },
@@ -271,20 +271,24 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 61,
+   "execution_count": 15,
    "metadata": {
     "collapsed": false
    },
    "outputs": [],
    "source": [
+    "# In our solution, we found it useful to first make an array\n",
+    "# called shortened containing the number of words that was\n",
+    "# chopped off of each word in vocab_table, but you don't have\n",
+    "# to do that.\n",
     "shortened = ...\n",
     "most_shortened = ...\n",
     "vocab_table.where('Word', most_shortened)"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 16,
    "metadata": {
     "collapsed": false
    },
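Here is a minimal sketch of the `shortened` approach mentioned in the cell comment. It assumes the word and stem columns of `vocab_table` are available as NumPy arrays. The four words and stems below are made-up stand-ins, not the notebook's data, and `shortened` counts the characters that stemming removed.

```python
import numpy as np

# Made-up stand-ins for vocab_table's word and stem columns (hypothetical data).
words = np.array(["running", "messages", "cats", "happily"])
stems = np.array(["run", "messag", "cat", "happili"])

# Number of characters chopped off each word by stemming.
shortened = np.char.str_len(words) - np.char.str_len(stems)

# The word that lost the most characters (argmax picks the first one on ties).
most_shortened = words[np.argmax(shortened)]
print(most_shortened)
```

In the notebook, `most_shortened` is then passed to `vocab_table.where('Word', most_shortened)` to display that word's row.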
@@ -307,16 +311,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 17,
    "metadata": {
     "collapsed": false
    },
    "outputs": [],
    "source": [
     "# Here we have defined the proportion of our data\n",
     "# that we want to designate for training as 11/16ths\n",
-    "# of our total dataset, and the amount reserved for\n",
-    "# validation is 2/16ths. \n",
+    "# of our total dataset.  2/16ths of the data is\n",
+    "# reserved for validation.  The remaining 3/16ths\n",
+    "# will be used for testing.\n",
     "\n",
     "training_proportion = 11/16\n",
     "validation_proportion = 2/16\n",
@@ -344,7 +349,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 62,
+   "execution_count": 18,
    "metadata": {
     "collapsed": false
    },
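The training-proportion comment describes a split of 11/16 training, 2/16 validation, and the remaining 3/16 test. One way to sketch that split is shown below, assuming the rows are shuffled before they are cut. `split_indices`, the seed, and the 160-row size are all hypothetical, and the notebook may partition its table differently.

```python
import numpy as np

training_proportion = 11/16
validation_proportion = 2/16
# Whatever remains (3/16) becomes the test set.

def split_indices(num_rows, seed=0):
    """Shuffle row indices and cut them into training/validation/test parts."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_rows)
    num_train = int(num_rows * training_proportion)
    num_valid = int(num_rows * validation_proportion)
    train = shuffled[:num_train]
    valid = shuffled[num_train:num_train + num_valid]
    test = shuffled[num_train + num_valid:]
    return train, valid, test

train, valid, test = split_indices(160)
print(len(train), len(valid), len(test))
```

The test set takes whatever is left over, so any rows lost when the other two sizes are rounded down end up in the test set.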
@@ -376,13 +381,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": 19,
    "metadata": {
     "collapsed": true
    },
    "outputs": [],
    "source": [
-    "# Just run this cell\n",
+    "# Just run this cell to define genre_color.\n",
     "\n",
     "def genre_color(genre):\n",
     "    \"\"\"Assign a color to each genre.\"\"\"\n",
@@ -415,7 +420,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 20,
    "metadata": {
     "collapsed": false
    },
@@ -497,7 +502,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Question 2.1.2.** Complete the function `distance` that computes the Euclidean distance between any two songs, using two features. The last two lines call the `distance` function  to show that *Lookin' for Love* is closer to *In Your Eyes* than *Insane In The Brain*. "
+    "**Question 2.1.2.** Complete the function `distance_two_features` that computes the Euclidean distance between any two songs, using two features. The last two lines call the `distance_two_features` function  to show that *Lookin' for Love* is closer to *In Your Eyes* than *Insane In The Brain*. "
    ]
   },
   {
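Below is a sketch of `distance_two_features`. It assumes the function takes two song titles and two feature names. For simplicity, songs are looked up in a plain dictionary rather than in the notebook's table, and the titles and feature values are made up.

```python
import math

# Made-up songs and feature values (hypothetical; the notebook uses a table).
songs = {
    "Song A": {"like": 0.0, "love": 0.0},
    "Song B": {"like": 0.3, "love": 0.4},
    "Song C": {"like": 0.6, "love": 0.8},
}

def distance_two_features(title0, title1, x_feature, y_feature):
    """Euclidean distance between two songs using only two features."""
    row0, row1 = songs[title0], songs[title1]
    dx = row0[x_feature] - row1[x_feature]
    dy = row0[y_feature] - row1[y_feature]
    return math.sqrt(dx ** 2 + dy ** 2)

print(distance_two_features("Song A", "Song B", "like", "love"))
print(distance_two_features("Song A", "Song C", "like", "love"))
```

The two printed values are about 0.5 and 1.0, so Song B is closer to Song A than Song C is. This is the same kind of comparison the question's last two lines make.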
@@ -543,7 +548,7 @@
    "source": [
     "**Question 2.1.3.** Define the higher-order function `distance_from` that takes a single song title and two features. It returns a function `for_song` that takes a second song title and computes the distance between the first and second songs.\n",
     "\n",
-    "*Hint: Call `distance` in your solution rather than re-implementing its computation.*"
+    "*Hint: Call `distance_two_features` in your solution rather than re-implementing its computation.*"
    ]
   },
   {
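`distance_from` returns a closure that remembers the first title and the two features. The minimal sketch below reuses the hypothetical dictionary-based `distance_two_features` and made-up songs so that it runs on its own.

```python
import math

# Made-up songs and feature values (hypothetical).
songs = {
    "Song A": {"like": 0.0, "love": 0.0},
    "Song B": {"like": 0.3, "love": 0.4},
    "Song C": {"like": 0.6, "love": 0.8},
}

def distance_two_features(title0, title1, x_feature, y_feature):
    """Euclidean distance between two songs using only two features."""
    row0, row1 = songs[title0], songs[title1]
    dx = row0[x_feature] - row1[x_feature]
    dy = row0[y_feature] - row1[y_feature]
    return math.sqrt(dx ** 2 + dy ** 2)

def distance_from(title0, x_feature, y_feature):
    """Return a one-argument function that measures distance from title0."""
    def for_song(title1):
        # Delegate to distance_two_features instead of re-implementing it.
        return distance_two_features(title0, title1, x_feature, y_feature)
    return for_song

from_a = distance_from("Song A", "like", "love")
print(from_a("Song B"))
```

Because `for_song` takes a single title, it can be applied to each title in turn, for example with `list(map(from_a, ["Song B", "Song C"]))`.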
@@ -665,7 +670,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Question 3.1.** Write a function to compute the Euclidean distance between two *arrays* of features of *arbitrary* (but equal) length.  Use it to compute the distance between the first song in the training set and the first song in the test set, *using all of the features*."
+    "**Question 3.1.** Write a function to compute the Euclidean distance between two *arrays* of features of *arbitrary* (but equal) length.  Use it to compute the distance between the first song in the training set and the first song in the test set, *using all of the features*.  (Remember that the title, artist, and genre of the songs are not features.)"
    ]
   },
   {
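Here is a sketch of an arbitrary-length Euclidean distance using NumPy. The column names and values are hypothetical. The point of the example is that the non-feature columns (title, artist, genre) are dropped before the distance is computed.

```python
import numpy as np

def distance(features1, features2):
    """Euclidean distance between two equal-length arrays of features."""
    a = np.asarray(features1, dtype=float)
    b = np.asarray(features2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("feature arrays must have the same length")
    return np.sqrt(np.sum((a - b) ** 2))

# Hypothetical rows; Title, Artist, and Genre are labels, not features.
train_row = {"Title": "Song A", "Artist": "X", "Genre": "Country",
             "like": 1.0, "love": 2.0, "baby": 2.0}
test_row = {"Title": "Song B", "Artist": "Y", "Genre": "Hip-hop",
            "like": 0.0, "love": 0.0, "baby": 0.0}

non_features = {"Title", "Artist", "Genre"}
feature_names = [name for name in train_row if name not in non_features]

print(distance([train_row[f] for f in feature_names],
               [test_row[f] for f in feature_names]))
```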
@@ -851,7 +856,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Question 3.1.4.**  Now compute the 5-nearest neighbors classification of the first song in the test set.  That is, decide on its genre by finding the most common genre among its 5 nearest neighbors, according to the distances you've calculated.  Then check whether your classifier chose the right genre.  "
+    "**Question 3.1.4.**  Now compute the 5-nearest neighbors classification of the first song in the test set.  That is, decide on its genre by finding the most common genre among its 5 nearest neighbors, according to the distances you've calculated.  Then check whether your classifier chose the right genre.  (Depending on the features you chose, your classifier might not get this song right, and that's okay.)"
    ]
   },
   {
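Below is a sketch of k-nearest-neighbors classification by majority vote. It assumes the training features are rows of a NumPy array and that the genres are stored in a parallel list. `knn_classify` and the toy one-feature data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_classify(test_features, train_features, train_genres, k=5):
    """Return the most common genre among the k closest training songs."""
    train_features = np.asarray(train_features, dtype=float)
    test_features = np.asarray(test_features, dtype=float)
    # Euclidean distance from the test song to every training song.
    distances = np.sqrt(np.sum((train_features - test_features) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]
    # Majority vote; on a tie, the genre first seen among the neighbors wins.
    votes = Counter(train_genres[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data with a single feature (made up for illustration).
train_features = [[0], [1], [2], [10], [11], [12], [13]]
train_genres = ["Country"] * 3 + ["Hip-hop"] * 4

predicted = knn_classify([1.5], train_features, train_genres)
actual = "Country"
print(predicted, predicted == actual)
```

Comparing the prediction with the song's actual genre is the check the question asks for. As the question notes, a miss on a single song does not necessarily mean the classifier is wrong.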
@@ -1214,7 +1219,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "An *ablation* study involves attempting to determine which features matter most for classification accuracy by removing each of them individually.\n",
+    "An *ablation* study involves attempting to determine which features matter most for classification accuracy by removing (\"ablating\") each of them individually.\n",
     "\n",
     "**Question 4.1.3.** Create a two-column table `ablation_accuracies` that shows the accuracy on the validation set of a 5-NN classifier that has all `staff_features` except one. Include a row for every feature in `staff_features` that you leave out. (*Hint*: Lists have a `.remove` method that takes the element to be removed.)"
    ]
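The ablation loop can be sketched as follows, assuming there is a function that computes validation accuracy for a given feature list. Everything in this example is hypothetical. `accuracy`, `ablation_accuracies`, and the three made-up features `a`, `b`, `c` stand in for the notebook's table, classifier, and `staff_features`, and the result is a list of pairs rather than a two-column table. Note that the list is copied before `.remove` is called; without the copy, the loop would delete features from `staff_features` itself.

```python
import numpy as np
from collections import Counter

def knn_classify(test_features, train_features, train_genres, k=5):
    """Most common genre among the k nearest training rows."""
    distances = np.sqrt(np.sum((train_features - test_features) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]
    return Counter(train_genres[i] for i in nearest).most_common(1)[0][0]

def accuracy(features, train, train_genres, valid, valid_genres, k=5):
    """Fraction of validation songs classified correctly using `features`."""
    train_matrix = np.column_stack([train[f] for f in features])
    valid_matrix = np.column_stack([valid[f] for f in features])
    correct = sum(knn_classify(row, train_matrix, train_genres, k) == genre
                  for row, genre in zip(valid_matrix, valid_genres))
    return correct / len(valid_genres)

def ablation_accuracies(staff_features, train, train_genres, valid, valid_genres):
    """(left-out feature, accuracy without it) for each feature."""
    rows = []
    for left_out in staff_features:
        kept = list(staff_features)  # copy: .remove mutates the list in place
        kept.remove(left_out)
        rows.append((left_out, accuracy(kept, train, train_genres,
                                        valid, valid_genres)))
    return rows

# Made-up columns: feature "a" separates the genres, "b" and "c" are noise.
train = {"a": np.array([0, 0, 0, 10, 10, 10], dtype=float),
         "b": np.array([1, 2, 3, 1, 2, 3], dtype=float),
         "c": np.array([3, 2, 1, 3, 2, 1], dtype=float)}
train_genres = ["Country"] * 3 + ["Hip-hop"] * 3
valid = {"a": np.array([0, 10], dtype=float),
         "b": np.array([2, 2], dtype=float),
         "c": np.array([2, 2], dtype=float)}
valid_genres = ["Country", "Hip-hop"]

staff_features = ["a", "b", "c"]
for feature, acc in ablation_accuracies(staff_features, train, train_genres,
                                        valid, valid_genres):
    print(feature, acc)
```

Leaving out a noise feature (`b` or `c`) keeps validation accuracy at 1.0 on this toy data. Leaving out `a` removes the only informative feature, so its accuracy is no longer guaranteed.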