Tags: gsavarela/ilurl
Reduce train, evaluate and re-train cycle (#14)

ISSUE:
------
Reduce the hyperparameter search cycle.

FIX:
---
The longest task during training was generating the emissions file, which routinely grows to tens of GBs and took a long time to write. The scripts were changed so that the info JSON (aggregate performance metrics) and the pickle (for recovering the trained models) are always generated by default, while emissions file generation is enabled only during evaluation.
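The FIX above amounts to a per-phase output policy. The following is a minimal sketch of that policy under stated assumptions: `RunConfig`, `default_config`, `outputs_for` and the file names `info.json`, `model.pickle` and `emission.xml` are hypothetical illustrations, not ilurl's actual script interface.

```python
from dataclasses import dataclass

# Hypothetical sketch: RunConfig, default_config, outputs_for and the
# file names below are illustrative assumptions, not ilurl's real API.


@dataclass(frozen=True)
class RunConfig:
    phase: str      # "train" or "evaluate"
    emission: bool  # whether to write the (very large) emissions file


def default_config(phase: str) -> RunConfig:
    if phase not in ("train", "evaluate"):
        raise ValueError(f"unknown phase: {phase!r}")
    # The emissions file can reach tens of GBs, so it is enabled only
    # for evaluation runs.
    return RunConfig(phase=phase, emission=(phase == "evaluate"))


def outputs_for(cfg: RunConfig) -> list[str]:
    # The info JSON (aggregate metrics) and the pickle (trained models)
    # are always produced, regardless of phase.
    files = ["info.json", "model.pickle"]
    if cfg.emission:
        files.append("emission.xml")  # hypothetical emissions file name
    return files
```

With this split, a hyperparameter sweep only pays for the small JSON and pickle on each training run, and the expensive emissions output is produced once per evaluated candidate.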
Add flow and queue features to TLSAgent

STATES:
-------
States were previously described by discretizing two variables: the mean speed and the mean number of vehicles travelling within the intersection's scope (both incoming and outgoing sections). Two new variables have now been added, mean throughput (flow) and queue, since these are more common in the traffic signal control literature. Flow is computed only over the outgoing sections (everything that exits must have entered at some point), while queues are computed only over the incoming section edges (outgoing edges are the incoming edges of the neighbouring intersections).

In addition to the two new variables, the state variables are now composable: previously only mean speed and mean vehicle count could be used, whereas now experiments may use flow or queue, either alone or both together.

ACTIONS:
--------
The action-selection criterion, which previously supported only epsilon-greedy, has been expanded to include upper confidence bound (UCB) action selection; see Section 2.7 of [1].

REFERENCES
----------
[1] Sutton and Barto, Reinforcement Learning: An Introduction, 2nd ed., 2018.
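The STATES section describes four per-intersection variables, each measured over a different set of edges, then discretized and combined in a user-selected subset. A minimal sketch of that idea follows, assuming per-edge measurements are already available (for instance from the simulator). `EdgeStats`, `features`, `discretize`, `state` and the bin thresholds are hypothetical names chosen for illustration, not ilurl's data structures.

```python
from bisect import bisect_right
from dataclasses import dataclass
from statistics import mean

# Hypothetical per-edge measurements for one intersection; the names
# are illustrative assumptions, not ilurl's actual classes.


@dataclass
class EdgeStats:
    speed: float       # mean vehicle speed on the edge
    count: float       # number of vehicles on the edge
    throughput: float  # vehicles leaving the edge during the interval
    queue: float       # halted vehicles on the edge


def features(incoming, outgoing):
    both = incoming + outgoing
    return {
        # Speed and count use both incoming and outgoing edges.
        "speed": mean(e.speed for e in both),
        "count": mean(e.count for e in both),
        # Flow only on outgoing edges: whatever exits must have entered.
        "flow": mean(e.throughput for e in outgoing),
        # Queue only on incoming edges: outgoing edges are the
        # neighbours' incoming edges, so they are counted there.
        "queue": mean(e.queue for e in incoming),
    }


def discretize(value, thresholds):
    # Index of the bin containing value; thresholds must be sorted.
    return bisect_right(thresholds, value)


def state(incoming, outgoing, selected, bins):
    # Composable state: only the selected variables are discretized,
    # in the order given, e.g. ("flow", "queue").
    feats = features(incoming, outgoing)
    return tuple(discretize(feats[name], bins[name]) for name in selected)
```

For example, `state(incoming, outgoing, ("flow", "queue"), bins)` builds a two-variable state, while `("speed", "count")` reproduces the previous configuration. Unknown variable names raise `KeyError`.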
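The ACTIONS section adds UCB next to epsilon-greedy. Section 2.7 of Sutton and Barto defines UCB selection as A_t = argmax_a [Q_t(a) + c sqrt(ln t / N_t(a))], where an action with N_t(a) = 0 is treated as maximizing. Below is a minimal sketch of both criteria over the action values of a single state. The function names, the `method` strings and the default values of `epsilon` and `c` are hypothetical, not ilurl's API.

```python
import math
import random


def epsilon_greedy(q, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def ucb(q, counts, t, c):
    """Upper-confidence-bound action selection (Sutton and Barto, Sec. 2.7).

    q[a]      -- current action-value estimate Q_t(a)
    counts[a] -- number of times action a has been selected, N_t(a)
    t         -- current time step (t >= 1)
    c         -- exploration weight (c >= 0)
    """
    # An action that has never been tried is treated as maximizing.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(q)),
        key=lambda a: q[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def select_action(q, counts, t, method="egreedy", epsilon=0.1, c=2.0, rng=random):
    """Hypothetical dispatcher between the two selection criteria."""
    if method == "egreedy":
        return epsilon_greedy(q, epsilon, rng)
    if method == "ucb":
        return ucb(q, counts, t, c)
    raise ValueError(f"unknown method: {method!r}")
```

In a tabular agent, `counts` and `t` would be updated after every step. Rarely tried actions receive a larger bonus, so exploration is directed rather than uniform. With `c = 0`, and every action already tried, UCB reduces to greedy selection.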