updating readme.md

ahmedbesbes · Apr 3, 2021 · 7a9a8b0
1 parent 93a5a17
Showing 1 changed file with 96 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1 +1,96 @@
-# playground
+### Playground
+
+Playground is a **streamlit** application that allows you to tinker with machine learning models from your browser.
+
+_This app is inspired by the great TensorFlow [playground](https://playground.tensorflow.org/). The main difference is that it focuses on classical machine learning models._
+
+![](./images/playground.png)
+
+### Demo
+
+Right [here](https://playground-ml.herokuapp.com/)
+
+### How does it work?
+
+1. 🗂️ You pick and configure a **dataset** from a pre-defined list. You can set:
+   - the number of samples
+   - the noise on train and test data
+2. ⚙️ You select a **model** and set its hyper-parameters. You can pick a model from: logistic regression, decision tree, random forest, gradient boosting, neural network, Naive Bayes, KNN and SVM.
+3. 📉 The app automatically displays the following results:
+   - the decision boundary of the model on train and test data
+   - the performance metrics (Accuracy and F1 score) on train and test data
+   - the time it took the model to train
+   - a generated Python script to reproduce the model based on the dataset definition and the model hyper-parameters
+4. For each model, Playground provides a link to the official documentation as well as a list of tips.
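
The loop described in steps 1 to 3 can be sketched with scikit-learn as below. This is a minimal sketch, not the app's code nor the script it generates: it assumes the dataset is scikit-learn's `make_moons`, that separate train/test noise levels are obtained by generating the two sets independently, and that training time is measured around `fit`. The helper `run_experiment` is a hypothetical name.

```python
import time

from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score


def run_experiment(model, n_samples=500, train_noise=0.2, test_noise=0.2, seed=0):
    """Hypothetical helper: build a dataset, train `model`, report metrics and time."""
    # Step 1 (assumption): generate train and test sets separately so each has its own noise
    X_train, y_train = make_moons(n_samples=n_samples, noise=train_noise, random_state=seed)
    X_test, y_test = make_moons(n_samples=n_samples, noise=test_noise, random_state=seed + 1)

    # Step 2: train the selected model and time the fit
    start = time.perf_counter()
    model.fit(X_train, y_train)
    duration = time.perf_counter() - start

    # Step 3: accuracy and F1 score on both splits
    metrics = {}
    for split, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        y_pred = model.predict(X)
        metrics[split] = {"accuracy": accuracy_score(y, y_pred), "f1": f1_score(y, y_pred)}
    return metrics, duration


metrics, duration = run_experiment(LogisticRegression(), test_noise=0.3)
print(metrics, f"trained in {duration:.3f}s")
```

Swapping `LogisticRegression()` for any other scikit-learn classifier reproduces step 2 for a different model.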
+
+**Bonus point**: the app can also perform feature engineering by adding polynomial features. This helps linear models such as logistic regression on non-linear problems.
+
+### What can you learn from playground?
+
+If you're new to machine learning, playing with this app will probably (and hopefully :)) make you familiar with basic notions and help you build your first intuitions. It won't replace textbooks: it's only meant to complement your knowledge. Take it as it is.
+
+#### 1. Decision boundaries will (partially) tell you how models behave
+
+You'll get a better sense of how each model works by inspecting its decision boundary. For educational purposes, playground only uses datasets with 2 features (but the same approach can be applied to multi-dimensional datasets after dimensionality reduction).
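
A decision boundary plot can be produced by predicting the class of every point on a dense grid covering the 2-feature plane. A minimal sketch, assuming scikit-learn and NumPy; `decision_grid` is a hypothetical helper, and the drawing step (for example matplotlib's `contourf`) is left out.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression


def decision_grid(model, X, step=0.02, margin=0.5):
    """Hypothetical helper: predict a class for each point of a grid covering X."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
    # Flatten the grid into an (n_points, 2) array, predict, then restore the grid shape
    grid = np.c_[xx.ravel(), yy.ravel()]
    zz = model.predict(grid).reshape(xx.shape)
    return xx, yy, zz


X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
model = LogisticRegression().fit(X, y)
xx, yy, zz = decision_grid(model, X)
# zz can then be drawn, e.g. plt.contourf(xx, yy, zz, alpha=0.3)
```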
+
+You'll see, for example, that a logistic regression separates the data with a line (or a hyperplane in the general case):
+
+![](./images/logistic_regression.png)
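
Why a line? A fitted logistic regression predicts class 1 when w1·x1 + w2·x2 + b > 0, so its boundary is the set of points where that expression equals zero. The sketch below assumes scikit-learn's `LogisticRegression`, whose `coef_` and `intercept_` hold w and b; it recovers the line and checks that points on it get a probability of exactly 0.5.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
model = LogisticRegression().fit(X, y)

(w1, w2), b = model.coef_[0], model.intercept_[0]

# Boundary: w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1*x1 + b) / w2
x1 = np.linspace(-1.5, 2.5, 5)
x2 = -(w1 * x1 + b) / w2
on_line = np.column_stack([x1, x2])

# On the line the model is undecided between the two classes
proba = model.predict_proba(on_line)[:, 1]
print(np.round(proba, 6))
```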
+
+whereas a decision tree, which classifies the data through successive conditions on the values of individual features, has a decision boundary composed of horizontal and vertical lines.
+
+![](./images/decision_tree.png)
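
The axis-aligned shape comes from the fact that every split tests a single feature against a threshold. Printing a shallow tree with scikit-learn's `export_text` (assuming a scikit-learn `DecisionTreeClassifier`, as in the app's model list) makes this visible: each rule has the form `feature <= threshold`, which is a vertical or horizontal cut in the plane.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each rule tests ONE feature, hence horizontal/vertical boundaries
rules = export_text(tree, feature_names=["x1", "x2"])
print(rules)
```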
+
+Interestingly, a random forest, which is an ensemble of decision trees trained with bagging, has a decision boundary that looks similar to a single decision tree's, only **smoother**: this is a result of the voting mechanism a random forest uses.
+
+![](./images/random_forest.png)
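
One way to quantify "smoother" is to count how many distinct probability levels each model outputs over a grid. A fully grown tree has pure leaves, so it only outputs 0 or 1, while a forest combines its trees and produces many intermediate levels. Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted probabilities rather than counting hard votes; the sketch assumes scikit-learn defaults otherwise.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Dense grid over the feature plane
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

# Number of distinct class-1 probabilities each model produces on the grid
tree_levels = np.unique(tree.predict_proba(grid)[:, 1]).size
forest_levels = np.unique(forest.predict_proba(grid)[:, 1]).size
print(f"tree: {tree_levels} levels, forest: {forest_levels} levels")
```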
+
+#### 2. You'll get a sense of the speed of each model
+
+Given the same dataset, you can compare the speed of each model and get a feel for which one is faster. In the previous plots, the logistic regression and the decision tree took **0.004** and **0.001** seconds to train respectively, whereas the random forest took **0.154 seconds**.
+
+Try a neural network with 4 stacked layers of 100 neurons each: it takes **0.253 seconds**.
+
+![](./images/neural_network.png)
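
Training times like these can be measured by wrapping each `fit` call with a timer. A sketch assuming scikit-learn estimators and `time.perf_counter`; absolute numbers depend on your machine, the dataset size and the library version, so no expected values are given. The `max_iter=300` setting for the network is an arbitrary choice.

```python
import time

from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # 4 stacked hidden layers of 100 neurons each
    "neural network": MLPClassifier(hidden_layer_sizes=(100, 100, 100, 100), max_iter=300, random_state=0),
}

timings = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    timings[name] = time.perf_counter() - start

# Print from fastest to slowest
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:>20}: {seconds:.3f} s")
```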
+
+#### 3. Feature engineering can help
+
+Using a logistic regression on the moon dataset won't get you a good score given its non-linear nature.
+
+However, increasing the dimensionality by adding polynomial features can help: try increasing the polynomial degree to 3 when using a logistic regression and notice how the decision boundary radically changes.
+
+![](./images/neural_network.png)
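
The effect can be reproduced outside the app with a scikit-learn pipeline that expands the two features to degree 3 before the logistic regression. A sketch under assumptions: `make_moons` stands in for the moon dataset, and a `StandardScaler` (not part of the description above) is added because cubic terms have very different scales, which can slow down the solver.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_moons(n_samples=500, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Plain logistic regression: a straight-line boundary
plain = LogisticRegression().fit(X_train, y_train)

# Same model on degree-3 polynomial features: a cubic boundary in the original plane
poly = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
).fit(X_train, y_train)

plain_acc = plain.score(X_test, y_test)
poly_acc = poly.score(X_test, y_test)
print(f"plain: {plain_acc:.3f}, degree 3: {poly_acc:.3f}")
```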
+
+#### 4. Some models are more robust than others to noise
+
+You can experiment by setting a higher noise on the test data, thus making it drift from the train distribution. Some models, such as gradient boosting, are more robust to this drift than others.
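
A drift experiment along these lines: train on lightly noised data, then evaluate on test sets with increasing noise. A sketch assuming scikit-learn's `make_moons` and default hyper-parameters; which model degrades least depends on the settings, so no result is claimed here.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = make_moons(n_samples=500, noise=0.1, random_state=0)

models = {
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
for model in models.values():
    model.fit(X_train, y_train)

# The test set drifts further from the training distribution as its noise grows
scores = {}
for test_noise in (0.1, 0.3, 0.5):
    X_test, y_test = make_moons(n_samples=500, noise=test_noise, random_state=1)
    scores[test_noise] = {name: m.score(X_test, y_test) for name, m in models.items()}

for test_noise, row in scores.items():
    print(test_noise, {name: round(acc, 3) for name, acc in row.items()})
```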
+
+#### 5. Try out different combinations of hyper-parameters
+
+A great way to learn and validate your intuitions is to experiment, and that's what this app is for: it lets you tinker with a bunch of hyper-parameters (tree depth, number of estimators, number of layers, etc.) and immediately see their effect on the decision boundaries, the metrics and the execution time.
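
Outside the app, the same kind of exploration can be scripted as a small sweep. The sketch below assumes scikit-learn; the depth values are arbitrary. It varies a decision tree's `max_depth` and records train and test accuracy, so you can watch the gap between the two as the tree gets deeper.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = []
for max_depth in (1, 2, 4, 8, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X_train, y_train)
    results.append((max_depth, tree.score(X_train, y_train), tree.score(X_test, y_test)))

for max_depth, train_acc, test_acc in results:
    print(f"max_depth={max_depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```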
+
+Go give it a try, and I hope you'll learn something from it!
+
+### Run the app locally
+
+Make sure you have Python 3 and pip installed.
+
+- install pipenv
+
+```shell
+pip install pipenv
+```
+
+- go inside the folder and install the dependencies
+
+```shell
+pipenv install
+```
+
+- run the app inside the pipenv virtual environment
+
+```shell
+pipenv run streamlit run app.py
+```
+
+#### Structure of the code