ZenBytes is a series of short, practical MLOps lessons built around ZenML and its various integrations. It is intended both for people who want to learn about MLOps in general and for ML practitioners who want to get started with ZenML.
The release of ZenML 0.20.0 marks a big breaking change in ZenML history, and requires an equally big update to this repository. The ZenML team is working on this renovation as we speak, and will bring you a brand-new ZenBytes with the latest ZenML soon!
Over the course of the series, you will learn how to:
- Define an MLOps stack tailored to your project requirements.
- Build transparent and reproducible data-centric ML pipelines with automated artifact versioning, tracking, caching, and more.
- Deploy ML pipelines with tooling and infrastructure of your choice (e.g. as a serverless microservice in the cloud).
- Monitor and address production issues like training-serving skew and data drift.
- Use some of the most popular MLOps tools like ZenML, Kubeflow, MLflow, Weights & Biases, Evidently, Seldon, Feast, and many more.
In the end, you will be able to take any of your ML models from experimentation to a customized, fully fleshed-out production-grade MLOps setup in a matter of minutes!
The series is structured into four chapters with several lessons each. Click on any of the links below to open the respective lesson directly in Colab.
🍡 1. ML Pipelines | ♻️ 2. Training / Serving | 📁 3. Data Management | 🚀 4. Advanced Deployment |
---|---|---|---|
1.1 ML Pipelines | 2.1 Experiment Tracking | 3.1 Data Skew | 4.1 Cloud Deployment |
1.2 Artifact Lifecycle | 2.2 Local Deployment | | |
| | 2.3 Inference Pipelines | | |
ZenML is an extensible, open-source MLOps framework for creating production-ready ML pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and provides interfaces and abstractions tailored to ML workflows.
If you enjoy these courses and want to learn more:
- Give the Main ZenML Repo a GitHub Star ⭐ to show your love!
- Join our Slack Community and become part of the ZenML family!
To run the lessons locally, you will need:
- Linux or MacOS
- Python 3.7 or 3.8
- Jupyter notebook and ZenML:
pip install zenml notebook
As you progress through the course, you will need to install additional packages for the various other MLOps tools we will use. You will find corresponding instructions in the respective notebooks, but we recommend you install all integrations ahead of time with the following command:
zenml integration install sklearn dash wandb evidently mlflow kubeflow seldon s3 aws -y
For the advanced deployment lessons in chapter 4, you will also need to have the following additional packages installed on your machine:
package | MacOS installation | Linux installation |
---|---|---|
docker | Docker Desktop for Mac | Docker Engine for Linux |
kubectl | kubectl for mac | kubectl for linux |
k3d | Brew Installation of k3d | k3d installation linux |
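Before starting chapter 4, it can help to confirm that these tools are actually reachable from your shell. The snippet below is a minimal sketch, not part of ZenBytes itself: the helper name `find_missing_tools` is hypothetical, and it only checks that each executable is on your PATH using Python's standard `shutil.which`.

```python
import shutil

# Tools the chapter 4 lessons expect, taken from the table above.
REQUIRED_TOOLS = ["docker", "kubectl", "k3d"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All chapter 4 tools were found on PATH.")
```

Note that finding `docker` on the PATH does not mean the Docker daemon is running, so treat this as a first sanity check only.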
If you haven't done so already, clone ZenBytes to your local machine. Then, use Jupyter Notebook to go through the course lesson by lesson, starting with 1-1_Pipelines.ipynb:
git clone https://github.com/zenml-io/zenbytes
cd zenbytes
jupyter notebook
1. Changes to your ZenML stack (updating or switching it) are sometimes not picked up immediately in Jupyter notebooks.
Solution: First, make sure the correct component is actually installed and registered in your currently active stack by running zenml stack describe. If the component is there, restart the kernel of your Jupyter notebook, which also reloads the stack.
2. (MacOS) When starting the container registry for Kubeflow, I get an error about port 5000 not being available:
OSError: [Errno 48] Address already in use
Solution: For Kubeflow to run, the Docker container registry currently needs to be on port 5000. On MacOS, however, port 5000 is used by the AirPlay Receiver. The guide "Freeing up port 5000" explains how to fix this.
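To see whether port 5000 is still occupied before starting the registry, you can try to bind it yourself. The following is a rough sketch using only Python's standard `socket` module; the function name `port_is_free` is hypothetical, and the check assumes the conflicting service listens on IPv4, so a result of `True` is a hint rather than a guarantee.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind host:port ourselves, False otherwise."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": something else holds the port.
            return False
    return True


if __name__ == "__main__":
    if port_is_free(5000):
        print("Port 5000 looks free.")
    else:
        print("Port 5000 is in use; free it before starting the registry.")
```

The socket is closed again as soon as the check finishes, so running this does not itself block the port.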