Supplementary Code for "A Scalable Solver for 2p0s Differential Games with One-Sided Payoff Information and Continuous Actions, States, and Time".
- Set up the conda environment using the file `env.yml`.
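  A minimal sketch of the setup (the environment name comes from the `name:` field inside `env.yml`; substitute it in the activate step):

  ```bash
  conda env create -f env.yml    # create the environment from the spec file
  conda activate <env-name>      # replace <env-name> with the name defined in env.yml
  ```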
- Navigate to `our_method/visualization_scipts/` to use existing trained models to generate trajectories (a usage sketch follows this list):
  - `simulation_latest_for_gt_comparison` simulates the trajectories for the 4-stage game
  - `simulation_latest.py` simulates the unconstrained case
  - `simulation_latest_primal_dual.py` simulates the unconstrained case with both primal and dual policies
  - `simulation_latest_cons.py` simulates the constrained case
  - `simulation_latest_cons_primal_dual.py` simulates the constrained case with both primal and dual policies
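  For example, to simulate the unconstrained case (a sketch assuming the scripts take no required command-line arguments):

  ```bash
  cd our_method/visualization_scipts/
  python simulation_latest.py    # simulates trajectories for the unconstrained case
  ```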
- Navigate to `our_method/` to train the value network for the different cases (unconstrained, constrained, their dual versions, and the 3D case); a usage sketch follows this list:
  - run `./train_our_method.sh` to train the primal unconstrained case
  - run `./train_our_method_for_cfr.sh` to train the comparison case (against DeepCFR)
  - run `./train_our_method_dual.sh` to train the dual unconstrained case
  - run `./train_our_method_cons.sh` to train the primal constrained case
  - run `./train_our_method_cons_dual.sh` to train the dual constrained case
  - run `./train_our_method_3d.sh` to train the primal high-dimensional case
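  For example (a sketch assuming the scripts are marked executable; if not, run `chmod +x` on them first):

  ```bash
  cd our_method/
  ./train_our_method.sh    # trains the value network for the primal unconstrained case
  ```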
- To train the Deep CFR policy networks, run `run_cfr_3.py` for $|A|=9$, and `run_cfr` for $|A|=16$.
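  For example (a sketch assuming `run_cfr_3.py` sits at the repository root and takes no required arguments):

  ```bash
  python run_cfr_3.py    # trains the Deep CFR policy networks for |A| = 9
  ```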
- To compare our method with CFR, run the notebook `our_method/hexner_last_step-stopping.ipynb`.
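  For example (assuming Jupyter is installed in the conda environment):

  ```bash
  jupyter notebook our_method/hexner_last_step-stopping.ipynb    # opens the comparison notebook
  ```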
- To generate trajectories using Deep CFR, run the notebook `DeepCFR_Trajectory.ipynb`.
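  One way to run it non-interactively (a sketch assuming `nbconvert` is available in the environment; the output filename is arbitrary):

  ```bash
  # executes all cells and writes the result to a new notebook file
  jupyter nbconvert --to notebook --execute DeepCFR_Trajectory.ipynb --output DeepCFR_Trajectory_out.ipynb
  ```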