ZenBytes is a series of practical lessons about MLOps through ZenML and its various integrations. It is intended for people looking to learn about MLOps generally, and also practitioners specifically looking to learn more about ZenML.
ZenML is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces and abstractions tailored to ML workflows. The ZenML repository and docs have more details.
ZenML is a good tool for learning MLOps for two reasons:
🔹 ZenML focuses on being un-opinionated about underlying tooling and infrastructure across the MLOps stack.
🔹 ZenML presents itself as a pipeline tool, making all development in ZenML data-centric rather than model-centric.
The lessons are structured in Chapters. Each chapter is a notebook that walks through and explains various concepts:
- Chapter 0: Basics
- Chapter 1: Building a ML(Ops) pipeline
- Chapter 2: Transitioning across stacks
- Coming soon: More chapters
Currently, the lessons only run on UNIX systems. To run them, you need some packages installed on your machine. You only need these for some parts, and you might get away with only Python and pip install -r requirements.txt for some parts of the codebase, but we recommend installing all of the following:
package | MacOS installation | Linux installation |
---|---|---|
docker | Docker Desktop for Mac | Docker Engine for Linux |
kubectl | kubectl for mac | kubectl for linux |
k3d | Brew Installation of k3d | k3d installation linux |
You might also need to install Anaconda to get the MLflow deployment to work.
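Before moving on, it can help to confirm that the command-line tools from the table are actually reachable. The following is a minimal sketch, not part of ZenBytes itself: the helper name missing_tools is hypothetical, and it only checks that each executable is on your PATH, not that it is a suitable version or that the Docker daemon is running.

```python
import shutil

# Tool names taken from the table above; the helper itself is illustrative.
REQUIRED_TOOLS = ("docker", "kubectl", "k3d")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```

An empty result only means the executables exist; each tool may still need its own setup before the later chapters work.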
Once you've got the system requirements figured out, let's jump into the Python packages you need. Within the Python environment of your choice, run:
git clone https://github.com/zenml-io/zenbytes
cd zenbytes
pip install -r requirements.txt
If you are running the run.py script, you will also need to install some integrations using zenml:
zenml integration install sklearn -f
zenml integration install dash -f
zenml integration install evidently -f
zenml integration install mlflow -f
zenml integration install kubeflow -f
zenml integration install seldon -f
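If you would rather not type the install commands above one by one, they can be generated in a loop. This is a rough sketch under the assumption that the zenml CLI is already installed in the active environment; the function names build_install_commands and install_integrations and the dry_run parameter are hypothetical and not part of ZenML. By default it only prints the commands.

```python
import subprocess

# Integration names copied from the commands above.
INTEGRATIONS = ["sklearn", "dash", "evidently", "mlflow", "kubeflow", "seldon"]


def build_install_commands(integrations=INTEGRATIONS):
    """Build one `zenml integration install <name> -f` command per integration."""
    return [["zenml", "integration", "install", name, "-f"] for name in integrations]


def install_integrations(integrations=INTEGRATIONS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_install_commands(integrations):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing install.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    install_integrations(dry_run=True)
```

Calling install_integrations(dry_run=False) would run the same six commands in order and stop at the first one that fails.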
We're ready to go now. Start Jupyter and work through the notebooks step by step:
jupyter notebook
Once you are done running all notebooks, you might want to stop all running processes. To do so, run the following commands. (This will tear down your k3d cluster and the local Docker registry.)
zenml stack set aws_kubeflow_stack
zenml stack down -f
zenml stack set local_kubeflow_stack
zenml stack down -f
- macOS: When starting the container registry for Kubeflow, I get an error about port 5000 not being available.
OSError: [Errno 48] Address already in use
Solution: In order for Kubeflow to run, the Docker container registry currently needs to be at port 5000. macOS, however, uses port 5000 for the AirPlay receiver. See the guide "Freeing up port 5000" for how to fix this.
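To see whether port 5000 is taken before starting the registry, you can try to bind a socket to it yourself. The sketch below is an illustration only: the helper port_is_free is hypothetical, and probing 127.0.0.1 is an assumption that may not match the exact address the registry binds to on your machine.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "Address already in use" when another process holds the port.
            return False
        return True


if __name__ == "__main__":
    if port_is_free(5000):
        print("Port 5000 looks free.")
    else:
        print("Port 5000 is already in use.")
```

The probe socket is closed again as soon as the check finishes, so it does not hold on to the port.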