
Commit

Merge branch 'mage-ai:master' into mcollin_prob
ChadGueli authored Jun 21, 2022
2 parents 49deb77 + f653326 commit 99b5e88
Showing 23 changed files with 375 additions and 189 deletions.
103 changes: 30 additions & 73 deletions README.md
@@ -24,97 +24,54 @@ prepare it for training AI/ML models.
> Join us on
> **[<img alt="Slack" height="20" src="https://thepostsportsbar.com/wp-content/uploads/2017/02/Slack-Logo.png" style="position: relative; top: 4px;" /> Slack](https://www.mage.ai/chat)**
### What does this do?
The current version of Mage includes a data cleaning UI tool that can run locally on your laptop or
can be hosted in your own cloud environment.
**Table of contents**

### Why should I use it?
Using a data cleaning tool enables you to quickly visualize data quality issues,
easily fix them, and create repeatable data cleaning pipelines that can be used in
production environments (e.g. online re-training, inference, etc).
1. [Quick start](#%EF%B8%8F-quick-start)
1. [Features](#-features)
1. [Roadmap](#%EF%B8%8F-roadmap)
1. [Contributing](#%EF%B8%8F-contributing)
1. [Community](#-community)

# Table of contents
1. [Quick start](#quick-start)
1. [Features](#features)
1. [Roadmap](#roadmap)
1. [Contributing](#contributing)
1. [Community](#community)

# Quick start
# 🏃‍♀️ Quick start

- Try a **[demo of Mage](https://colab.research.google.com/drive/1Pc6dpAolwuSKuoOEpWSWgx6MbNraSMVE?usp=sharing)** in Google Colab.
- Try a **[hosted version of Mage](http://18.237.55.91:5789/)**

<img alt="Fire mage" height="160" src="media/mage-fire-charging-up.svg" />

### Install library
Install the most recent released version:
### 1. Install Mage
```bash
$ pip install mage-ai
```

### Launch tool
Load your data, connect it to Mage, and launch the tool locally.


From anywhere you can execute Python code (e.g. terminal, Jupyter notebook, etc.),
run the following:

### 2. Load and connect data
```python
import mage_ai
from mage_ai.sample_datasets import load_dataset


df = load_dataset('titanic_survival.csv')
mage_ai.connect_data(df, name='titanic dataset')
mage_ai.launch()
```

Open [http://localhost:5789](http://localhost:5789) in your browser to access the tool locally.

To stop the tool, run this command: `mage_ai.kill()`

#### Custom host and port for tool

If you want to change the default host (`localhost`) and the default port (`5789`)
that the tool runs on, you can set 2 separate environment variables:

```bash
$ export HOST=127.0.0.1
$ export PORT=1337
### 3. Launch tool
```python
mage_ai.launch()
```
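The `HOST` and `PORT` environment variables mentioned above can also be set from Python before launching. The following is a minimal sketch, assuming Mage reads both variables at launch time as the README describes; `tool_url` is a hypothetical helper for illustration and is not part of Mage.

```python
import os


def tool_url(env=None):
    """Build the local URL for the tool from HOST and PORT.

    Falls back to the defaults named in the README (localhost, 5789).
    """
    env = os.environ if env is None else env
    host = env.get('HOST', 'localhost')
    port = int(env.get('PORT', 5789))
    return f'http://{host}:{port}'


# Set the variables before launching so the tool can pick them up.
os.environ['HOST'] = '127.0.0.1'
os.environ['PORT'] = '1337'

# mage_ai.launch() would be called here, as in step 3.
print(tool_url())
```

Computing the URL from the same variables keeps the address you open in the browser in step with the address the tool was told to use.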

#### Using tool in Jupyter notebook cell

You can run the tool inside a Jupyter notebook cell iFrame using the method:
`mage_ai.launch()` within a single cell.

Optionally, you can use the following arguments to change the default host and
port that the iFrame loads from:
Open [http://localhost:5789](http://localhost:5789) in your browser to access the tool locally.

```python
mage_ai.launch(iframe_host='127.0.0.1', iframe_port=1337)
```
If you’re launching Mage in a notebook, the tool will render in an iFrame.

### Cleaning data
### 4. Clean data
After building a data cleaning pipeline from the UI,
you can clean your data anywhere you can execute Python code:

```python
import mage_ai
from mage_ai.sample_datasets import load_dataset


df = load_dataset('titanic_survival.csv')

# Option 1: Clean with pipeline uuid
df_cleaned = mage_ai.clean(df, pipeline_uuid='uuid_of_cleaning_pipeline')

# Option 2: Clean with pipeline config directory path
df_cleaned = mage_ai.clean(df, pipeline_config_path='/path_to_pipeline_config_dir')
mage_ai.clean(df, pipeline_uuid='pipeline name')
```

### Demo video (2 min)
## Demo video (2 min)

[![Mage quick start demo](media/mage-demo-quick-start-youtube-preview.png)](https://www.youtube.com/watch?v=cRib1zOaqWs "Mage quick start demo")

@@ -127,14 +84,14 @@ Here is a [🗺️ step-by-step](docs/tutorials/quick-start.md) guide on how to

Check out the [📚 tutorials](docs/tutorials/README.md) to quickly become a master of magic.

# Features
# 🔮 Features

1. [Data visualizations](#data-visualizations)
1. [Reports](#reports)
1. [Cleaning actions](#cleaning-actions)
1. [Data cleaning suggestions](#data-cleaning-suggestions)
1. [Data visualizations](#1-data-visualizations)
1. [Reports](#2-reports)
1. [Cleaning actions](#3-cleaning-actions)
1. [Data cleaning suggestions](#4-data-cleaning-suggestions)

### Data visualizations
### 1. Data visualizations
Inspect your data using different charts (e.g. time series, bar chart, box plot, etc.).

Here’s a list of available [charts](docs/charts/README.md).
@@ -146,7 +103,7 @@ Here’s a list of available [charts](docs/charts/README.md).
/>
</kbd>

### Reports
### 2. Reports
Quickly diagnose data quality issues with summary reports.

Here’s a list of available [reports](docs/reports/README.md).
@@ -158,7 +115,7 @@ Here’s a list of available [reports](docs/reports/README.md).
/>
</kbd>

### Cleaning actions
### 3. Cleaning actions
Easily add common cleaning functions to your pipeline with a few clicks.
Cleaning actions include imputing missing values, reformatting strings, removing duplicates,
and many more.
@@ -175,7 +132,7 @@ Here’s a list of available [cleaning actions](docs/actions/README.md).
/>
</kbd>

### Data cleaning suggestions
### 4. Data cleaning suggestions
The tool will automatically suggest different ways to clean your data and improve quality metrics.

Here’s a list of available [suggestions](docs/suggestions/README.md).
@@ -187,7 +144,7 @@ Here’s a list of available [suggestions](docs/suggestions/README.md).
/>
</kbd>

# Roadmap
# 🗺️ Roadmap
Big features being worked on or in the design phase.

1. Encoding actions (e.g. one-hot encoding, label hasher, ordinal encoding, embeddings, etc.)
@@ -197,7 +154,7 @@ Big features being worked on or in the design phase.
Here’s a detailed list of [🪲 features and bugs](https://airtable.com/shrwN5wDuDuPScPut/tblAlH31g7dYRjmoZ)
that are in progress or upcoming.

# Contributing
# 🙋‍♀️ Contributing
We welcome all contributions to Mage;
from small UI enhancements to brand new cleaning actions.
We love seeing community members level up and give people power-ups!
@@ -211,7 +168,7 @@ Got questions? Live chat with us in

Anything you contribute, the Mage team and community will maintain. We’re in it together!

# Community
# 🧙 Community
We love the community of Magers (`/ˈmājər/`);
a group of mages who help each other realize their full potential!

@@ -225,7 +182,7 @@ For real-time news and fun memes, check out the Mage
To report bugs or add your awesome code for others to enjoy,
visit [GitHub](https://github.com/mage-ai/mage-ai).

# License
# 🪪 License
See the [LICENSE](LICENSE) file for licensing information.

<br />
26 changes: 21 additions & 5 deletions docs/contributing/README.md
@@ -2,7 +2,7 @@

## Setting up development environment

### Using Docker
### 🏗️ Using Docker

Run the below script to build the Docker image and run all the services:

@@ -45,10 +45,26 @@ $ docker attach [container_id]

#### Example notebook

Open the `example.ipynb` notebook for an interactive Python environment and connect your data
Open the [example.ipynb](../../example.ipynb) notebook for an interactive Python environment and connect your data
to the app.

### Front-end UI
##### Using tool in Jupyter notebook cell

You can run the tool inside a Jupyter notebook cell iFrame using the method:
`mage_ai.launch()` within a single cell.

Optionally, you can use the following arguments to change the default host and
port that the iFrame loads from:

##### Kill tool

*To stop the tool, run this command*: `mage_ai.kill()`

```python
mage_ai.launch(iframe_host='127.0.0.1', iframe_port=1337)
```

### 🖥️ Setting up the front-end UI

#### Install Homebrew (if you haven't already)
Directions at [brew.sh](https://brew.sh/).
@@ -108,7 +124,7 @@ $ yarn run dev

Now visit [http://localhost:3000/datasets](http://localhost:3000/datasets) to view the tool.

### Backend server
### 🗄️ Setting up the backend server

#### Install Python packages

@@ -162,7 +178,7 @@ sys.path.append('/absolute_path_to_repo/mage-ai')
import mage_ai
```

### Sample data
### 💾 Sample data
Load sample datasets to test and play with.

```python
19 changes: 19 additions & 0 deletions docs/tutorials/clean.md
@@ -0,0 +1,19 @@
# Clean

### 3. Cleaning data
After building a data cleaning pipeline from the UI,
you can clean your data anywhere you can execute Python code:

```python
import mage_ai
from mage_ai.sample_datasets import load_dataset


df = load_dataset('titanic_survival.csv')

# Option 1: Clean with pipeline uuid
df_cleaned = mage_ai.clean(df, pipeline_uuid='uuid_of_cleaning_pipeline')

# Option 2: Clean with pipeline config directory path
df_cleaned = mage_ai.clean(df, pipeline_config_path='/path_to_pipeline_config_dir')
```
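
The two options above can be wrapped in a small helper that insists on exactly one of them. This is a minimal sketch, assuming `mage_ai.clean` accepts the `pipeline_uuid` and `pipeline_config_path` keyword arguments shown in the tutorial; `clean_with_pipeline` and its `clean_fn` parameter are hypothetical names used for illustration and are not part of Mage.

```python
def clean_with_pipeline(df, clean_fn, pipeline_uuid=None, pipeline_config_path=None):
    """Call clean_fn with exactly one of the two pipeline options.

    clean_fn is expected to behave like mage_ai.clean (an assumption);
    passing it in keeps this sketch runnable without Mage installed.
    """
    # Reject the ambiguous cases: both options given, or neither.
    if (pipeline_uuid is None) == (pipeline_config_path is None):
        raise ValueError('Pass exactly one of pipeline_uuid or pipeline_config_path.')
    if pipeline_uuid is not None:
        return clean_fn(df, pipeline_uuid=pipeline_uuid)
    return clean_fn(df, pipeline_config_path=pipeline_config_path)
```

With Mage installed, a call might look like `clean_with_pipeline(df, mage_ai.clean, pipeline_uuid='uuid_of_cleaning_pipeline')`.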