MOO-MDP: An Object-Oriented Representation for Cooperative Multiagent Reinforcement Learning

This is the code used in the IEEE Transactions on Cybernetics paper proposing the Multiagent Object-Oriented approach. You are free to use all or part of the code presented here for any purpose, provided that the paper is properly cited and the original authors are properly credited. All files shared here come with no warranties.

Paper bib entry:

@ARTICLE{Silvaetal2018,
  author  = {Silva, Felipe Leno Da and Glatt, Ruben and Costa, Anna Helena Reali},
  title   = {{MOO-MDP: An Object-Oriented Representation for Cooperative Multiagent Reinforcement Learning}},
  journal = {IEEE Transactions on Cybernetics},
  year    = {2017},
  volume  = {PP},
  number  = {99},
  pages   = {1-13},
  doi     = {10.1109/TCYB.2017.2781130},
  issn    = {2168-2267}
}


Part of this project was built on BURLAP2 (http://burlap.cs.brown.edu/). The BURLAP version we used is included to avoid incompatibility issues; changes to the code will be necessary if you use a newer BURLAP version.

This work is an extension of a conference paper (http://ieeexplore.ieee.org/document/7839556/). For most purposes, the journal version should be cited and this new code used.

Files

The folder goldmine_and_gridworld contains the Java implementation (as an Eclipse project) and the BURLAP source files for the Goldmine and Gridworld domains.

The folder prey_predator contains the Python implementation for the Predator-Prey domain.

The folder experiment_results contains the .csv files with the results of our experiments, along with the MATLAB scripts that read those files and output graphs.

Quickly Replicating Results

The commands below should be executed in MATLAB and generate the graphs shown in the paper (some manual style adjustments were applied to the published figures to improve readability). The required .m files are inside the experiment_results folder.

Gridworld and Goldmine:

folderCSV = '<path for goldmine or gridworld folders>';
initTrial = 1;
endTrial = 70; % 50 for gridworld
useMarkers = true;
generateGraphFromBurlapFile(folderCSV, initTrial, endTrial, useMarkers);

Predator-Prey:

folderOriginal = '<path for experiment_results folder>';
folderCSV = [folderOriginal,'/prey-predator/'];
repetitions = 250;
initTrial = 1;
endTrial = repetitions; 
useMarkers = false;
convert_preyPredator(folderCSV, repetitions, 3); % may take a long time to run
generateGraphFromBurlapFile(folderCSV, initTrial, endTrial, useMarkers);

How to use

The folder goldmine_and_gridworld stores the implementations for the Goldmine and Gridworld domains.

We used Eclipse to run the experiments, so you can either import the folder as a project in Eclipse or import all files (including the BURLAP jar in the lib folder as a library) into your preferred IDE.

The experiments in our paper are replicated by executing the main method of the ExperimentBRACIS2016 class (we recommend running the JVM with the parameters -Xms1024m -Xmx14024m).
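
If you prefer to launch it from the command line instead of Eclipse, an invocation along the following lines should work; the bin output folder and the unqualified name of ExperimentBRACIS2016 are assumptions about the project layout, so adjust them (including any package prefix) to your setup, and use ; instead of : as the classpath separator on Windows:

java -Xms1024m -Xmx14024m -cp "bin:lib/*" ExperimentBRACIS2016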

After executing this method, .csv files with the experiment results will be generated; they can be used to plot graphs in MATLAB by running generateGraphFromBurlapFile.m.

To run experiments in the Predator-Prey domain, execute the experiment.py file. We used the PyCharm IDE with an Anaconda environment.

The Python code outputs a .csv file in a different format, so the convert_preyPredator.m script adapts the output to the format expected by the graph script.

We advise you to implement your own script to generate graphs, as the MATLAB file is not very well commented.

Attention

Our DOO-Q and DQL implementations are highly optimized to run the experiments faster, which means that memory consumption is very high. If you want to use them in applications or on a PC with limited memory, you will need to change our implementation.

A large amount of memory can be saved if the implementation of DOOQPolicy is changed to store entries in policyMemory only when two or more Q-values are tied as the best action. However, if you do so, the experiments will run slower.
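
The sketch below illustrates that idea in isolation. It is not the actual DOO-Q code: except for policyMemory, every name and type is a hypothetical stand-in (a plain string-keyed Q-table instead of the BURLAP state and action classes), so treat it only as a picture of the tie-only caching strategy.

import java.util.*;

// Hedged sketch of the memory-saving variant described above: a policyMemory
// entry is cached only when two or more actions tie for the best Q-value.
// All names except policyMemory are illustrative assumptions, not the DOO-Q API.
public class TieOnlyPolicyMemorySketch {
    private static final double EPSILON = 1e-9;
    // Hypothetical Q-table: state key -> (action name -> Q-value)
    private final Map<String, Map<String, Double>> qTable = new HashMap<>();
    // Stored only for states whose greedy action is ambiguous (tied)
    private final Map<String, List<String>> policyMemory = new HashMap<>();

    public List<String> greedyActions(String stateKey) {
        List<String> cached = policyMemory.get(stateKey);
        if (cached != null) {
            return cached; // a tie was already recorded for this state
        }
        Map<String, Double> qs = qTable.getOrDefault(stateKey, Collections.emptyMap());
        List<String> best = new ArrayList<>();
        double bestQ = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : qs.entrySet()) {
            double q = e.getValue();
            if (q > bestQ + EPSILON) {            // strictly better action found
                bestQ = q;
                best.clear();
                best.add(e.getKey());
            } else if (Math.abs(q - bestQ) <= EPSILON) {
                best.add(e.getKey());             // tie with the current best
            }
        }
        if (best.size() > 1) {
            // Remember only ambiguous states; unambiguous ones are recomputed,
            // which saves memory but makes the experiments run slower.
            policyMemory.put(stateKey, best);
        }
        return best;
    }
}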

Contact

For any questions, please send an email to the first author.
