The following packages and data are required to reproduce the visualisations and results shown in this project:
- Redefining Cancer Treatment - Kaggle Competition Datasets
- Python 3
- os
- tqdm
- string
- pandas
- numpy
- NLTK
  - WordNetLemmatizer
  - word_tokenize
- re
- random
- collections
- TensorFlow
- Keras
  - KerasClassifier
  - Sequential
  - Layers
    - Dense, Dropout, LSTM, Embedding, Input, RepeatVector
  - Utils
  - Preprocessing
    - Text
    - Sequence
- SKLearn
  - LabelEncoder
  - TruncatedSVD
  - TfidfVectorizer
- Gensim
  - Utils
  - Doc2Vec
  - LabeledSentence
- Seaborn
- Matplotlib
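
Before running the notebooks, it may help to confirm that the packages listed above are installed. The following is a minimal sketch, not part of the submission: the import names (for example `sklearn` for SKLearn and `nltk` for NLTK) are assumed from the list above, and `find_missing` is a hypothetical helper introduced only for illustration.

```python
import importlib.util

# Import names assumed from the package list above; adjust if your setup differs.
REQUIRED_MODULES = [
    "os", "string", "re", "random", "collections",  # standard library
    "tqdm", "pandas", "numpy", "nltk", "tensorflow",
    "keras", "sklearn", "gensim", "seaborn", "matplotlib",
]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in modules:
        try:
            # find_spec returns None when a top-level module is not installed.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only locates each module without importing it, so it runs quickly and does not load heavy libraries such as TensorFlow.
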
The relevant files included in the submission are:
- Predicting Cancer Project.ipynb: Data manipulation and model creation
- EDA.ipynb: Exploratory Data Analysis
- Capstone Project Report.pdf: Project Report
Details of the Kaggle competition can be found at: