These examples demonstrate how to train reinforcement learning models on SageMaker for a wide range of applications.
IMPORTANT for RLlib users: some examples may break with the latest RLlib due to breaking API changes. Please refer to the Amazon SageMaker RL Container repository for the latest public images, and update the configs in the entrypoint scripts to match the algorithm config of your RLlib version.
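For instance, newer Ray releases replaced the dict-style `ray.rllib.agents.*` configs with builder-style `AlgorithmConfig` classes. Below is a minimal sketch of the newer style that an entrypoint script might use (assuming Ray 2.x; the environment and hyperparameters are placeholders, not values from these examples):

```python
# Minimal sketch of a Ray 2.x-style RLlib config; exact APIs vary by Ray
# version, so check the version shipped in your SageMaker RL container.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env="CartPole-v1")    # placeholder environment
    .rollouts(num_rollout_workers=2)   # renamed to .env_runners() in later Ray releases
    .training(lr=5e-5, train_batch_size=4000)
)
algo = config.build()
result = algo.train()  # runs a single training iteration
```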
If you are using PyTorch rather than TensorFlow, please set `debugger_hook_config=False` when calling `RLEstimator()` to avoid TensorBoard conflicts.
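For example, a minimal sketch of such a call (the entrypoint, role ARN, instance type, and toolkit version below are illustrative placeholders; match them to your own setup and to the versions supported by the SageMaker RL containers):

```python
from sagemaker.rl import RLEstimator, RLFramework, RLToolkit

# All values below are placeholders for illustration only.
estimator = RLEstimator(
    entry_point="train_script.py",   # your RL training script
    toolkit=RLToolkit.RAY,
    toolkit_version="1.6.0",         # assumption: a Ray version your container supports
    framework=RLFramework.PYTORCH,
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    debugger_hook_config=False,      # disables the Debugger hook to avoid the TensorBoard conflict
)
estimator.fit()
```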
- Contextual Bandit with Live Environment illustrates how you can manage your own contextual multi-armed bandit workflow on SageMaker using the built-in Vowpal Wabbit (VW) container to train and deploy contextual bandit models.
- Cartpole uses the SageMaker RL base Docker image to balance a broom upright.
- Cartpole Batch uses batch RL techniques to train Cartpole with offline data.
- Cartpole Spot Training trains Cartpole using SageMaker Managed Spot instances to lower training costs.
- DeepRacer gives a glimpse of the architecture used to get DeepRacer working with AWS RoboMaker.
- HVAC optimizes energy use based on the EnergyPlus simulator.
- Knapsack is an example of using RL to address an operations research problem.
- Mountain Car is a classic control RL problem, in which an under-powered car is tasked with climbing a steep mountain, and is only successful when it reaches the top.
- Network Compression reduces the size of a trained network using an RL algorithm.
- Portfolio Management shows how to re-distribute capital across a set of financial assets using RL algorithms.
- Predictive Auto-scaling scales a production service via an RL approach, adding and removing resources in reaction to dynamically changing load.
- Resource Allocation solves three canonical online and stochastic decision-making problems using RL algorithms.
- Roboschool Ray demonstrates how to use Ray to scale RL training in different ways, and how to leverage SageMaker's Automatic Model Tuning functionality to optimize the training of an RL model.
- Roboschool Stable Baselines is an example of using the stable-baselines library to train RL algorithms.
- Tic-tac-toe uses RL to train a policy and then plays locally and interactively within the notebook.
- Traveling Salesman and Vehicle Routing is an example of using RL to address operations research problems.
- Game Server Auto-pilot reduces player wait time by autoscaling game servers deployed in an EKS cluster, using RL to add and remove EC2 instances in response to dynamic player usage.
- Unity Game Agent shows how to use RL algorithms to train an agent to play a Unity3D game.