[1044] Added end-to-end tests at the end of Quickstart editorial #1118

Merged: 8 commits, May 14, 2024
[1044] adding assert statements to quickstart tutorial
allincowell committed May 4, 2024
commit b732d44258a94fa5fe848846b29ab8eb65a19c0f
32 changes: 16 additions & 16 deletions docs/source/tutorials/datalab/datalab_quickstart.ipynb
@@ -7,6 +7,22 @@
 "# Datalab: A unified audit to detect all kinds of issues in data and labels"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Cleanlab offers a `Datalab` object that can identify various issues in your machine learning datasets, such as noisy labels, outliers, (near) duplicates, drift, and other types of problems common in real-world data. These data issues may negatively impact models if not addressed. `Datalab` utilizes *any* ML model you have already trained for your data to diagnose these issues, it only requires access to either: (probabilistic) predictions from your model or its learned representations of the data.\n",
+"\n",
+"\n",
+"**Overview of what we'll do in this tutorial:**\n",
+"\n",
+"- Compute out-of-sample predicted probabilities for a sample dataset using cross-validation.\n",
+"- Use `Datalab` to identify issues such as noisy labels, outliers, (near) duplicates, and other types of problems \n",
+"- View the issue summaries and other information about our sample dataset\n",
+"\n",
+"You can easily replace our demo dataset with your own image/text/tabular/audio/etc dataset, and then run the same code to discover what sort of issues lurk within it!"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -32,22 +48,6 @@
 "</div>"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Cleanlab offers a `Datalab` object that can identify various issues in your machine learning datasets, such as noisy labels, outliers, (near) duplicates, drift, and other types of problems common in real-world data. These data issues may negatively impact models if not addressed. `Datalab` utilizes *any* ML model you have already trained for your data to diagnose these issues, it only requires access to either: (probabilistic) predictions from your model or its learned representations of the data.\n",
-"\n",
-"\n",
-"**Overview of what we'll do in this tutorial:**\n",
-"\n",
-"- Compute out-of-sample predicted probabilities for a sample dataset using cross-validation.\n",
-"- Use `Datalab` to identify issues such as noisy labels, outliers, (near) duplicates, and other types of problems \n",
-"- View the issue summaries and other information about our sample dataset\n",
-"\n",
-"You can easily replace our demo dataset with your own image/text/tabular/audio/etc dataset, and then run the same code to discover what sort of issues lurk within it!"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
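For readers of this diff who want a concrete picture of the first overview bullet in the moved cell ("out-of-sample predicted probabilities ... using cross-validation"), here is a minimal, dependency-free sketch. It is not the notebook's code and does not use cleanlab: the helper names (`fit_centroids`, `predict_proba`, `out_of_sample_pred_probs`), the nearest-centroid stand-in classifier, the toy data, and the 0.5 cutoff are all assumptions made for illustration. The point is only that each example's probabilities come from a model that never saw that example during training.

```python
import random
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: mean feature vector} computed from the training rows."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(centroids, row, classes):
    """Normalize inverse distances to each class centroid into probabilities."""
    scores = []
    for c in classes:
        if c in centroids:
            dist = sum((a - b) ** 2 for a, b in zip(row, centroids[c])) ** 0.5
            scores.append(1.0 / (dist + 1e-9))
        else:
            scores.append(0.0)  # class absent from this training fold
    total = sum(scores)
    return [s / total for s in scores]


def out_of_sample_pred_probs(X, y, n_folds=5, seed=0):
    """K-fold cross-validation: predict each row with a model fit on the other folds."""
    classes = sorted(set(y))
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    pred_probs = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            pred_probs[i] = predict_proba(centroids, X[i], classes)
    return pred_probs, classes


# Toy data (hypothetical): two well-separated clusters, one deliberately wrong label.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0], [0.15, 0.1],
     [5.0, 5.1], [5.2, 5.0], [5.1, 5.2], [5.0, 5.0], [5.15, 5.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # index 4 sits in the first cluster but is labeled 1

pred_probs, classes = out_of_sample_pred_probs(X, y, n_folds=5)

# Crude illustration only: flag rows whose given label gets low out-of-sample probability.
suspects = [i for i, p in enumerate(pred_probs) if p[classes.index(y[i])] < 0.5]
print(suspects)
```

On this toy data the only flagged row is the deliberately mislabeled one. The thresholding step is a simplification chosen for this sketch; it is not a description of how `Datalab` scores label issues.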