Gain intuition for different notions of similarity and practice finding similar documents.
Explore the tradeoffs between representing documents with raw word counts and with TF-IDF.
Explore the behavior of different distance metrics by looking at the Wikipedia pages most similar to President Obama’s page.
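To make the word count versus TF-IDF tradeoff concrete, here is a minimal sketch on three made-up sentences. It assumes the weighting `count * log(N / df)`, which is one common TF-IDF variant; the notebook's exact formula may differ, and the helper names `word_counts` and `tf_idf` are illustrative only.

```python
import math
from collections import Counter


def word_counts(text):
    """Raw bag-of-words counts for one document (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    Assumed weighting: count * log(N / df), where N is the number of
    documents and df is the number of documents containing the word.
    """
    counts = [word_counts(d) for d in docs]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct word once, so this counts documents.
    doc_freq = Counter(word for c in counts for word in c)
    return [
        {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in c.items()}
        for c in counts
    ]


# Toy corpus (made up for illustration, not taken from people_wiki.csv).
docs = [
    "the president signed the bill",
    "the senator opposed the bill",
    "the striker scored the goal",
]
weights = tf_idf(docs)
```

With raw counts, "the" is the largest entry in every vector. Under this weighting it drops to zero because it appears in every document, while a word unique to one document, such as "president", keeps a positive weight.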
The .zip file is the data file.
people_wiki.csv.zip (unzip to get people_wiki.csv) consists of 59,071 pages and 3 features: URL, name, text.
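The CSV can also be read straight from the archive without unzipping it first. The sketch below uses only the standard library. It assumes the archive member is named people_wiki.csv, that the file is UTF-8 encoded, and that the first row is a header; `read_people_wiki` is a hypothetical helper name, not something defined in this repository.

```python
import csv
import io
import zipfile


def read_people_wiki(zip_path, member="people_wiki.csv"):
    """Read the CSV inside the zip archive and return its rows as dicts.

    Assumptions: the member name, UTF-8 encoding, and a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Usage, assuming the archive sits in the working directory:
# rows = read_people_wiki("people_wiki.csv.zip")
# print(len(rows), list(rows[0].keys()))
```

Each row is a dict keyed by the header names, so a page's text is reachable as `rows[i]["text"]` if the header matches the feature names listed above.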
.json file: people_wiki_map_index_to_word.json
.npz files: people_wiki_tf_idf.npz, people_wiki_word_count.npz
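A possible way to load the .npz files is sketched below. It assumes each archive stores the components of a SciPy CSR matrix under the keys `data`, `indices`, `indptr`, and `shape`. That layout is an assumption and is not confirmed by this README. If the files were instead written with `scipy.sparse.save_npz`, then `scipy.sparse.load_npz` would be the simpler route. The name `load_sparse_csr` is illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix


def load_sparse_csr(path):
    """Rebuild a CSR matrix from an .npz archive.

    Assumed keys (not confirmed by the README): data, indices, indptr, shape.
    """
    with np.load(path) as loader:
        shape = tuple(int(n) for n in loader["shape"])
        return csr_matrix(
            (loader["data"], loader["indices"], loader["indptr"]),
            shape=shape,
        )


# Usage, assuming the file is in the working directory:
# word_count = load_sparse_csr("people_wiki_word_count.npz")
# print(word_count.shape)
```

Keeping the matrices sparse matters here because a documents-by-vocabulary matrix is mostly zeros.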
Description files
The .ipynb file is the solution to Week 2 programming assignment 1: nearest-neighbors-features-and-metrics_blank.ipynb
The .html file is the HTML version of the .ipynb file: nearest-neighbors-features-and-metrics_blank.html
The .py file: nearest-neighbors-features-and-metrics_blank.py
file: nearest-neighbors-features-and-metrics_blank
Recommended: open the md file inside the nearest-neighbors-features-and-metrics_blank file, or open the .html file in a browser for a quick look.
K-NN with word counts and TF-IDF
How the Euclidean and cosine metrics differ
Extract word count vectors
Find nearest neighbors using word count vectors
Interpreting the nearest neighbors
Extract the TF-IDF vectors
Find nearest neighbors using TF-IDF vectors
Choosing metrics
Problem with cosine distances: tweets vs. long articles
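The difference between the two metrics, and one plausible reading of the "tweets vs. long articles" item, can be sketched with NumPy. The vectors below are made-up word count vectors over a four-word vocabulary, not data from the notebook, and the function names are illustrative.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance; grows with differences in document length."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the direction of the vectors."""
    return 1.0 - float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))


# Hypothetical word count vectors.
article = np.array([40.0, 30.0, 10.0, 0.0])       # long document
tweet = article / 10.0                            # same word proportions, much shorter
unrelated_short = np.array([0.0, 0.0, 1.0, 4.0])  # short, different vocabulary

print("euclidean tweet-article  :", euclidean_distance(tweet, article))
print("euclidean tweet-unrelated:", euclidean_distance(tweet, unrelated_short))
print("cosine    tweet-article  :", cosine_distance(tweet, article))
print("cosine    tweet-unrelated:", cosine_distance(tweet, unrelated_short))
```

Under Euclidean distance the tweet is closer to the unrelated short text than to the article it mirrors, because length dominates. Under cosine distance the tweet and the article are essentially identical, because length is ignored entirely. That is useful for topic matching, but it also means a very short text can rank as the nearest neighbor of a long article, which may not be what a reader wants.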