PG is the most basic reinforcement learning algorithm that learns a policy by taking a gradient of action log probabilities and weighting them by the return. This algorithm is also known as REINFORCE.
conda create -n rllib-pg python=3.10
conda activate rllib-pg
pip install -r requirements.txt
pip install -e '.[development]'