Merge branch 'mage-ai:master' into mcollin_prob

mage-ai · Jun 21, 2022 · 99b5e88
2 parents 49deb77 + f653326
Showing 23 changed files with 375 additions and 189 deletions.
diff --git a/README.md b/README.md
@@ -24,97 +24,54 @@ prepare it for training AI/ML models.
 > Join us on
 > **[<img alt="Slack" height="20" src="https://thepostsportsbar.com/wp-content/uploads/2017/02/Slack-Logo.png" style="position: relative; top: 4px;" /> Slack](https://www.mage.ai/chat)**
 
-### What does this do?
-The current version of Mage includes a data cleaning UI tool that can run locally on your laptop or
-can be hosted in your own cloud environment.
+**Table of contents**
 
-### Why should I use it?
-Using a data cleaning tool enables you to quickly visualize data quality issues,
-easily fix them, and create repeatable data cleaning pipelines that can be used in
-production environments (e.g. online re-training, inference, etc).
+1. [Quick start](#%EF%B8%8F-quick-start)
+1. [Features](#-features)
+1. [Roadmap](#%EF%B8%8F-roadmap)
+1. [Contributing](#%EF%B8%8F-contributing)
+1. [Community](#-community)
 
-# Table of contents
-1. [Quick start](#quick-start)
-1. [Features](#features)
-1. [Roadmap](#roadmap)
-1. [Contributing](#contributing)
-1. [Community](#community)
-
-# Quick start
+# 🏃‍♀️ Quick start
 
 - Try a **[demo of Mage](https://colab.research.google.com/drive/1Pc6dpAolwuSKuoOEpWSWgx6MbNraSMVE?usp=sharing)** in Google Colab.
 - Try a **[hosted version of Mage](http://18.237.55.91:5789/)**
 
 <img alt="Fire mage" height="160" src="media/mage-fire-charging-up.svg" />
 
-### Install library
-Install the most recent released version:
+### 1. Install Mage
 ```bash
 $ pip install mage-ai
 ```
 
-### Launch tool
-Load your data, connect it to Mage, and launch the tool locally.
-
-
-From anywhere you can execute Python code (e.g. terminal, Jupyter notebook, etc.),
-run the following:
-
+### 2. Load and connect data
 ```python
 import mage_ai
 from mage_ai.sample_datasets import load_dataset
 
 
 df = load_dataset('titanic_survival.csv')
 mage_ai.connect_data(df, name='titanic dataset')
-mage_ai.launch()
 ```
 
-Open [http://localhost:5789](http://localhost:5789) in your browser to access the tool locally.
-
-To stop the tool, run this command: `mage_ai.kill()`
-
-#### Custom host and port for tool
-
-If you want to change the default host (`localhost`) and the default port (`5789`)
-that the tool runs on, you can set 2 separate environment variables:
-
-```bash
-$ export HOST=127.0.0.1
-$ export PORT=1337
+### 3. Launch tool
+```python
+mage_ai.launch()
 ```
 
-#### Using tool in Jupyter notebook cell
-
-You can run the tool inside a Jupyter notebook cell iFrame using the method:
-`mage_ai.launch()` within a single cell.
-
-Optionally, you can use the following arguments to change the default host and
-port that the iFrame loads from:
+Open [http://localhost:5789](http://localhost:5789) in your browser to access the tool locally.
 
-```python
-mage_ai.launch(iframe_host='127.0.0.1', iframe_port=1337)
-```
+If you’re launching Mage in a notebook, the tool will render in an iFrame.
 
-### Cleaning data
+### 4. Clean data
 After building a data cleaning pipeline from the UI,
 you can clean your data anywhere you can execute Python code:
 
 ```python
-import mage_ai
-from mage_ai.sample_datasets import load_dataset
-
-
-df = load_dataset('titanic_survival.csv')
-
-# Option 1: Clean with pipeline uuid
-df_cleaned = mage_ai.clean(df, pipeline_uuid='uuid_of_cleaning_pipeline')
-
-# Option 2: Clean with pipeline config directory path
-df_cleaned = mage_ai.clean(df, pipeline_config_path='/path_to_pipeline_config_dir')
+mage_ai.clean(df, pipeline_uuid='pipeline name')
 ```
 
-### Demo video (2 min)
+## Demo video (2 min)
 
 [![Mage quick start demo](media/mage-demo-quick-start-youtube-preview.png)](https://www.youtube.com/watch?v=cRib1zOaqWs "Mage quick start demo")
 
@@ -127,14 +84,14 @@ Here is a [🗺️ step-by-step](docs/tutorials/quick-start.md) guide on how to
 
 Check out the [📚 tutorials](docs/tutorials/README.md) to quickly become a master of magic.
 
-# Features
+# 🔮 Features
 
-1. [Data visualizations](#data-visualizations)
-1. [Reports](#reports)
-1. [Cleaning actions](#cleaning-actions)
-1. [Data cleaning suggestions](#data-cleaning-suggestions)
+1. [Data visualizations](#1-data-visualizations)
+1. [Reports](#2-reports)
+1. [Cleaning actions](#3-cleaning-actions)
+1. [Data cleaning suggestions](#4-data-cleaning-suggestions)
 
-### Data visualizations
+### 1. Data visualizations
 Inspect your data using different charts (e.g. time series, bar chart, box plot, etc.).
 
 Here’s a list of available [charts](docs/charts/README.md).
@@ -146,7 +103,7 @@ Here’s a list of available [charts](docs/charts/README.md).
   />
 </kbd>
 
-### Reports
+### 2. Reports
 Quickly diagnose data quality issues with summary reports.
 
 Here’s a list of available [reports](docs/reports/README.md).
@@ -158,7 +115,7 @@ Here’s a list of available [reports](docs/reports/README.md).
   />
 </kbd>
 
-### Cleaning actions
+### 3. Cleaning actions
 Easily add common cleaning functions to your pipeline with a few clicks.
 Cleaning actions include imputing missing values, reformatting strings, removing duplicates,
 and many more.
@@ -175,7 +132,7 @@ Here’s a list of available [cleaning actions](docs/actions/README.md).
   />
 </kbd>
 
-### Data cleaning suggestions
+### 4. Data cleaning suggestions
 The tool will automatically suggest different ways to clean your data and improve quality metrics.
 
 Here’s a list of available [suggestions](docs/suggestions/README.md).
@@ -187,7 +144,7 @@ Here’s a list of available [suggestions](docs/suggestions/README.md).
   />
 </kbd>
 
-# Roadmap
+# 🗺️ Roadmap
 Big features being worked on or in the design phase.
 
 1. Encoding actions (e.g. one-hot encoding, label hasher, ordinal encoding, embeddings, etc.)
@@ -197,7 +154,7 @@ Big features being worked on or in the design phase.
 Here’s a detailed list of [🪲 features and bugs](https://airtable.com/shrwN5wDuDuPScPut/tblAlH31g7dYRjmoZ)
 that are in progress or upcoming.
 
-# Contributing
+# 🙋‍♀️ Contributing
 We welcome all contributions to Mage;
 from small UI enhancements to brand new cleaning actions.
 We love seeing community members level up and give people power-ups!
@@ -211,7 +168,7 @@ Got questions? Live chat with us in
 
 Anything you contribute, the Mage team and community will maintain. We’re in it together!
 
-# Community
+# 🧙 Community
 We love the community of Magers (`/ˈmājər/`);
 a group of mages who help each other realize their full potential!
 
@@ -225,7 +182,7 @@ For real-time news and fun memes, check out the Mage
 To report bugs or add your awesome code for others to enjoy,
 visit [GitHub](https://github.com/mage-ai/mage-ai).
 
-# License
+# 🪪 License
 See the [LICENSE](LICENSE) file for licensing information.
 
 <br />

diff --git a/docs/contributing/README.md b/docs/contributing/README.md
@@ -2,7 +2,7 @@
 
 ## Setting up development environment
 
-### Using Docker
+### 🏗️ Using Docker
 
 Run the below script to build the Docker image and run all the services:
 
@@ -45,10 +45,26 @@ $ docker attach [container_id]
 
 #### Example notebook
 
-Open the `example.ipynb` notebook for an interactive Python environment and connect your data
+Open the [example.ipynb](../../example.ipynb) notebook for an interactive Python environment and connect your data
 to the app.
 
-### Front-end UI
+##### Using tool in Jupyter notebook cell
+
+You can run the tool inside a Jupyter notebook cell iFrame by calling
+`mage_ai.launch()` within a single cell.
+
+Optionally, you can use the following arguments to change the default host and
+port that the iFrame loads from:
+
+```python
+mage_ai.launch(iframe_host='127.0.0.1', iframe_port=1337)
+```
+
+##### Kill tool
+
+To stop the tool, run this command: `mage_ai.kill()`
+
+### 🖥️ Setting up the front-end UI
 
 #### Install Homebrew (if you haven't already)
 Directions at [brew.sh](https://brew.sh/).
@@ -108,7 +124,7 @@ $ yarn run dev
 
 Now visit [http://localhost:3000/datasets](http://localhost:3000/datasets) to view the tool.
 
-### Backend server
+### 🗄️ Setting up the backend server
 
 #### Install Python packages
 
@@ -162,7 +178,7 @@ sys.path.append('/absolute_path_to_repo/mage-ai')
 import mage_ai
 ```
 
-### Sample data
+### 💾 Sample data
 Load sample datasets to test and play with.
 
 ```python

diff --git a/docs/tutorials/clean.md b/docs/tutorials/clean.md
@@ -0,0 +1,19 @@
+# Clean
+
+### 3. Cleaning data
+After building a data cleaning pipeline from the UI,
+you can clean your data anywhere you can execute Python code:
+
+```python
+import mage_ai
+from mage_ai.sample_datasets import load_dataset
+
+
+df = load_dataset('titanic_survival.csv')
+
+# Option 1: Clean with pipeline uuid
+df_cleaned = mage_ai.clean(df, pipeline_uuid='uuid_of_cleaning_pipeline')
+
+# Option 2: Clean with pipeline config directory path
+df_cleaned = mage_ai.clean(df, pipeline_config_path='/path_to_pipeline_config_dir')
+```
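
Review note on `docs/tutorials/clean.md`: the tutorial shows two ways to reference a pipeline, by `pipeline_uuid` or by `pipeline_config_path`. A minimal sketch of how a caller might choose between them is below. The helper `build_clean_kwargs` is hypothetical and not part of `mage_ai`, and treating the two options as mutually exclusive is an assumption; the diff does not say what `mage_ai.clean` does when both are passed.

```python
def build_clean_kwargs(pipeline_uuid=None, pipeline_config_path=None):
    """Return keyword arguments that select exactly one pipeline reference.

    Hypothetical helper for illustration only; not part of mage_ai.
    """
    # Assumption: the two options are alternatives, so reject both-or-neither.
    if (pipeline_uuid is None) == (pipeline_config_path is None):
        raise ValueError(
            'Pass exactly one of pipeline_uuid or pipeline_config_path.'
        )
    if pipeline_uuid is not None:
        return {'pipeline_uuid': pipeline_uuid}
    return {'pipeline_config_path': pipeline_config_path}


if __name__ == '__main__':
    kwargs = build_clean_kwargs(pipeline_uuid='uuid_of_cleaning_pipeline')
    print(kwargs)
    # With Mage installed and a DataFrame `df` loaded as in the tutorial:
    # df_cleaned = mage_ai.clean(df, **kwargs)
```

Keeping the choice in one place means the call site stays a single `mage_ai.clean(df, **kwargs)` line regardless of which reference is used.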