SIGMOD PROJECT

Team members:

Andrea Veneziano
Georgios Fotiadis
Gerald Sula

Code information

You can find all of our code in the src/ folder. Following are the explanation of what each notebook does.

description_exploration.ipynb: See the 20 attributes that appear the most often per store, to select which ones we keep for cleaning.
cleaning_george/: Folder with notebooks that perform data cleaning for a subset of the stores and save the results to new csv files.
cleaning_gerald/: Folder with notebooks that perform data cleaning for a subset of the stores and save the results to new csv files.
data_cleaning_andrea: Notebook that performs data cleaning for a subset of the stores and saves the results to new csv files.
clustering_with_details.ipynb: This is our main notebook. Reads the cleaned data, performs some additional cleaning and standarization and calculates the matches. Saves the result to a csv.
clustering.ipynb: First naive approach (and one with highest score), we group by brand, keep only the title and clean it, and find similarities in each cluster (might not work now because data has changed/cleaned).
add_info_to_labeled.ipynb: Add all the information we had to the given labeled dataset for training.
ML_approach.ipynb: Our very unsuccessful machine learning approach.
alibaba_ebay_clustering.ipynb: Tried clustering per store but gave up because it was taking too long to run (>12 hours).
add_store_cluster_to_brands.ipynb: Merge the clusters per brand and clusters per store together and save the results to a csv.
one_last_ride.ipynb: Our very last effort to try a different approach and increase our score. We put all the cleaned attributes together in a list, group by all the subsets of the list of size two and mark as matches the elements that belong to the same cluster.

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
datasets		datasets
src		src
.gitignore		.gitignore
README.md		README.md
cleaning_guide.txt		cleaning_guide.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SIGMOD PROJECT

Team members:

Code information

About

Releases

Packages

Contributors 3

Languages

geofot96/sigmod2020

Folders and files

Latest commit

History

Repository files navigation

SIGMOD PROJECT

Team members:

Code information

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages