Of course, faster training lets a firm see a return on investment (ROI) from its data and models sooner.
To speed up model development and updates, this repo contains distributed (MapReduce-style) data preprocessing and training with XGBoost and Dask in Python.
In particular, this is a simple repo with a Jupyter notebook that explains how to use Dask, Zarr, and XGBoost together. It also comes with a Docker image running an extremely basic app that uses the model. While the Docker image is available here, the Dockerfile and everything else needed to build the image are in this repo.
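For readers new to the MapReduce pattern behind the preprocessing, here is a minimal sketch in plain Python. It is not code from the notebook: the names `map_chunk`, `combine`, and `mean_and_std` are hypothetical, and ordinary lists stand in for the chunks that a tool like Dask would spread across workers. Each chunk is summarized independently (map), and the small summaries are then merged (reduce) to get the mean and standard deviation you might use to scale a feature.

```python
from functools import reduce


def map_chunk(chunk):
    """Map step: summarize one chunk as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Reduce step: merge two partial summaries element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_std(chunks):
    """Compute the global mean and population std from per-chunk summaries."""
    n, total, total_sq = reduce(combine, map(map_chunk, chunks), (0, 0.0, 0.0))
    mean = total / n
    # E[x^2] - E[x]^2; clamp at zero to guard against rounding error.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, variance ** 0.5


# Hypothetical data: three chunks of one feature column.
chunks = [[1.0, 2.0], [3.0, 4.0], [5.0]]
mean, std = mean_and_std(chunks)
print(mean, std)
```

Because each chunk's summary is just three numbers, the map step can run wherever the data lives and only the summaries need to be sent back, which is what makes this pattern suit datasets too large for one machine's memory.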
Check it out: https://github.com/ChadGueli/bigboost/blob/main/bigboost.ipynb