This package provides a debugging tool for Deep Reinforcement Learning (DRL) frameworks, designed to detect and address DNN and RL issues that may arise during training. The tool lets you monitor your training process in real time and flags potential flaws, making it easier to improve the performance of your DRL models.
The implementation is clean and simple, with research-friendly features. The key features of DRLDebugger are:
- 📜 Straightforward integration
  - DRLDebugger can be integrated into your project with a few lines of code.
- 🗳️ DNN + RL checks
- 🛃 Custom checks
- 🖥️ Real-time warnings
- 📈 Monitoring using Weights and Biases
Prerequisites:
- Python >=3.7.7,<3.10 (not yet tested on 3.10)
Step 1. Install the debugger in your Python environment:
git clone https://github.com/rached1997/RLDebugger.git && cd RLDebugger
pip install -e .
Step 2. Create a .yml config file and copy the following lines:
debugger:
  name: 'Debugger'
  kwargs:
    observed_params:
      constant: []
      variable: []
    check_type:
      - name: #Checker_Name_1
      - name: #Checker_Name_2
Step 3. Set up the config and run the debugger:
from debugger import rl_debugger
rl_debugger.set_config(config_path="the path to your debugger config.yml file")
...
rl_debugger.debug(model=...,
                  max_total_steps=...,
                  targets=...,
                  action_probs=...,
                  )
For detailed steps on how to integrate the debugger, please refer to Integrating the Debugger.
The checkers included in this package are divided into two categories: DNN-specific checkers and RL-specific checkers. The DNN-specific checkers were adapted from the paper "Testing Feedforward Neural Networks Training Programs" [1]. We would like to thank the paper's authors for providing the code for these checks (https://github.com/thedeepchecker/thedeepchecker). These checks have been adapted to work in the DRL context and migrated from TensorFlow 1 to PyTorch. The following table lists all the checkers with a link to their location in the package, where you can find more details on each checker.
Category | Check | Description |
---|---|---|
DNN Checks | Activation | Link to Description |
DNN Checks | Loss | Link to Description |
DNN Checks | Weight | Link to Description |
DNN Checks | Bias | Link to Description |
DNN Checks | Gradient | Link to Description |
DNN Checks | ProperFitting | Link to Description |
DRL Checks | Action | Link to Description |
DRL Checks | Agent | Link to Description |
DRL Checks | Environment | Link to Description |
DRL Checks | ExplorationParameter | Link to Description |
DRL Checks | Reward | Link to Description |
DRL Checks | Step | Link to Description |
DRL Checks | State | Link to Description |
DRL Checks | UncertaintyAction | Link to Description |
DRL Checks | ValueFunction | Link to Description |
To integrate the tool, follow these four steps:
Create a .yml config file and copy the following lines:
debugger:
  name: 'Debugger'
  kwargs:
    observed_params:
      constant: []
      variable: []
    check_type:
      - name: #Checker_Name_1
        period: #Checker_period_value
        skip_run_threshold: #Checker_skip_run_value
      - name: #Checker_Name_2
The debugger config should keep this structure and include the following elements:
- observed_params (only fill in the two lists when you develop a new Checker): contains the elements observed by the debugger. A list of default observed params (reward, actions, ...) is already provided, so please add only non-default params. The list of default params can be found here. The constant and variable lists indicate the nature of each observed param.
- check_type: the names of the checks you want to perform, in place of #Checker_Name_1 and #Checker_Name_2 (you can add as many Checkers as you want). The Checker names can be found in the Check column of the table above.
- period: the period over which the checker is called. If you don't specify a period, the default value found in the config data classes is used automatically.
- skip_run_threshold: the number of skipped steps over which the checker is called. If you don't specify a value, the default value found in the config data classes is used automatically.
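The fallback to default values for period and skip_run_threshold can be pictured with a small sketch. The helper `resolve_check_settings` and the `DEFAULTS` table are hypothetical and not part of DRLDebugger; the default numbers simply mirror the WeightConfig data class shown in this document (period 10, skip_run_threshold 2).

```python
# Hypothetical sketch: how per-checker settings could fall back to defaults.
# None of these names come from DRLDebugger itself.

# Assumed defaults, mirroring the WeightConfig data class (period=10, skip_run_threshold=2).
DEFAULTS = {"Weight": {"period": 10, "skip_run_threshold": 2}}


def resolve_check_settings(check_entry, defaults=DEFAULTS):
    """Merge one `check_type` entry from the config with its defaults."""
    name = check_entry["name"]
    settings = dict(defaults.get(name, {}))  # start from the defaults, if any
    # Values given explicitly in the config entry take precedence.
    for key in ("period", "skip_run_threshold"):
        if check_entry.get(key) is not None:
            settings[key] = check_entry[key]
    return name, settings


# A config entry as it might look once the .yml file is loaded into a dict.
entry = {"name": "Weight", "period": 50}
name, settings = resolve_check_settings(entry)
print(name, settings)  # Weight {'period': 50, 'skip_run_threshold': 2}
```

Note that in the template above, `#Checker_Name_1` starts a YAML comment, so the placeholder has to be replaced with a real checker name before the file carries any value for `name`.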
Please note that this step is temporary, as the project is still in development.
- Clone the repository
- cd 'path to RLDebugger repo'
- Run the command
pip install -e .
- Import the debugger in your code with the following line:
from debugger import rl_debugger
- Set up the configuration using the following line:
rl_debugger.set_config(config_path="the path to your debugger config.yml file")
This step is useful if you only need to deactivate a specific check within a particular Checker. You can modify the configuration of each Checker by changing the parameters in its config data class or by creating your own instance.
For instance, when debugging the weights during training (i.e., the Weight Checker), three different types of checks can be performed, but you may want to modify the thresholds or deactivate a specific check. To do so, modify the parameters in WeightConfig, which you will find here, as shown below:
@dataclass
class WeightConfig:
    start: int = 100
    period: int = 10
    skip_run_threshold: int = 2
    numeric_ins: NumericIns = NumericIns()
    neg: Neg = Neg()
    dead: Dead = Dead()
    div: Div = Div()
    initial_weight: InitialWeight = InitialWeight()
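A minimal sketch of the "create your own instance" route, using stand-in data classes so it runs without the package. The `disabled` and `threshold` fields on `Neg` and `Dead`, and the trimmed `WeightConfig` itself, are assumptions for illustration; check the real config data classes for the actual field names.

```python
from dataclasses import dataclass, field, replace


# Stand-in sub-configs; the `disabled` and `threshold` fields are hypothetical.
@dataclass
class Neg:
    disabled: bool = False
    threshold: float = 0.95


@dataclass
class Dead:
    disabled: bool = False
    threshold: float = 0.95


# Trimmed stand-in for WeightConfig; only the scalar defaults come from the document.
@dataclass
class WeightConfig:
    start: int = 100
    period: int = 10
    skip_run_threshold: int = 2
    # default_factory builds fresh sub-config objects for every WeightConfig.
    neg: Neg = field(default_factory=Neg)
    dead: Dead = field(default_factory=Dead)


default_cfg = WeightConfig()

# Build a modified copy: run less often and switch off the (assumed) "dead" check.
custom_cfg = replace(default_cfg, period=50, dead=Dead(disabled=True))

print(custom_cfg.period, custom_cfg.dead.disabled)    # 50 True
print(default_cfg.period, default_cfg.dead.disabled)  # 10 False
```

`dataclasses.replace` returns a new object, so the defaults stay untouched and the modified configuration can be kept next to the training script.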
To run the debugging, use the following code:
from debugger import rl_debugger
...
rl_debugger.debug(model=...,
                  max_total_steps=...,
                  targets=...,
                  action_probs=...,
                  )
This function runs the Checkers selected in the config file. It can be called from any class or file in your project (you can think of it as a global function).
For example, to run the Action Checker, the debug function should receive three parameters: action_probs, max_total_steps, and max_reward. Moreover, debug can be called in different parts of the code with whichever parameters are available at that point; the Checker waits until it has received all of its parameters before it starts running.
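The "wait until all parameters are received" behavior can be pictured with a toy buffer. `ParamBuffer` and its `required` set are hypothetical stand-ins, not DRLDebugger's implementation; the parameter names are the ones from the Action example.

```python
class ParamBuffer:
    """Toy stand-in: collects keyword arguments across calls until a checker can run."""

    def __init__(self, required):
        self.required = set(required)  # names the checker needs
        self.received = {}             # values seen so far
        self.runs = 0                  # how many times the checker ran

    def debug(self, **kwargs):
        # Store whatever is available at this call site.
        self.received.update(kwargs)
        # Run only once every required parameter has arrived.
        if self.required.issubset(self.received):
            self.runs += 1
            return True
        return False


buffer = ParamBuffer(required=["action_probs", "max_total_steps", "max_reward"])

# Constants are known early; the checker cannot run yet.
first = buffer.debug(max_total_steps=10_000, max_reward=500)
# Later in the training loop the remaining parameter becomes available.
second = buffer.debug(action_probs=[0.7, 0.3])

print(first, second, buffer.runs)  # False True 1
```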
It is important to note that the environment is the only parameter required for the debugger to properly operate. Also, the environment needs to be passed at the beginning of your code and in the first call of ".debug()".
import gym
from debugger import rl_debugger
...
env = gym.make("CartPole-v1")
rl_debugger.debug(environment=env)
It is also important to note that when calling debug, the keyword names of the arguments must exactly match the names listed under observed_params in the configuration file. Parameters that are not required (i.e., not used by any of the Checkers) can be omitted.
When you run the training, the debugging process will generate warning messages indicating any errors that have occurred and the elements that caused them. To help you better understand the results of the debugging process, it's important to carefully review these messages and take the necessary actions to resolve any issues that they highlight.
We also added visualization options for some checkers (the Action and Reward checkers) using Wandb. You can follow the same logic in your custom checkers if you want to add visualization to them. Please explore here for more details.
To create a new checker, you can follow the structure outlined in the code snippet below:
from debugger import DebuggerInterface
from dataclasses import dataclass

@dataclass
class CustomCheckerConfig:
    period: int = ...
    other_data: float = ...

class CustomChecker(DebuggerInterface):
    def __init__(self):
        super().__init__(check_type="CustomChecker", config=CustomCheckerConfig)
        # you can add other attributes

    # You can define other functions

    def run(self, observed_param):
        if self.check_period():
            # Do some instructions ....
            self.error_msg.append("your error message")
To create a new Checker, you need to include the following elements in your class:
- CustomCheckerConfig data class: This class is mandatory and defines all the configuration needed to run your custom Checker. It must include the period element, which determines the periodicity of the debugging (if you want the Checker to run only before the training, set its value to 0).
- Your Checker class: Your Checker should inherit from the DebuggerInterface class and initialize itself by calling super() and providing the name of your Checker.
- The run function: The function should include three important elements: the parameters you need (as listed under observed_params in the config.yml file), the periodicity check using the predefined check_period() function, and the messages you want to display, appended to the self.error_msg list.
Notes:
- If you need to know the number of times the Checker has run, you can check it using the self.iter_num variable.
- If you need to know the number of steps the RL algorithm has performed, you can check it using the self.step_num variable.
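To make the structure concrete, here is a self-contained toy that mimics it without importing DRLDebugger. `FakeDebuggerInterface` is a hypothetical stand-in whose `check_period()` rule (fire on every `period`-th call) is an assumption, not the library's actual implementation; `RewardRangeChecker` and its `max_abs_reward` option are likewise invented for illustration.

```python
from dataclasses import dataclass


class FakeDebuggerInterface:
    """Hypothetical stand-in for DebuggerInterface, only for this sketch."""

    def __init__(self, check_type, config):
        self.check_type = check_type
        self.config = config()  # instantiate the config data class
        self.iter_num = 0       # number of times run() was called
        self.error_msg = []

    def check_period(self):
        # Assumed rule: fire on every `period`-th call.
        # The period = 0 case (run only before training) is not modelled here.
        return self.config.period > 0 and self.iter_num % self.config.period == 0


@dataclass
class RewardRangeConfig:
    period: int = 2               # mandatory element
    max_abs_reward: float = 10.0  # invented option


class RewardRangeChecker(FakeDebuggerInterface):
    def __init__(self):
        super().__init__(check_type="RewardRange", config=RewardRangeConfig)

    def run(self, reward):
        self.iter_num += 1
        if self.check_period() and abs(reward) > self.config.max_abs_reward:
            self.error_msg.append(f"reward {reward} exceeds the expected range")


checker = RewardRangeChecker()
for r in [50.0, 50.0, 1.0, 50.0]:
    checker.run(r)

# Only the 2nd and 4th calls pass the period check, and both exceed the range.
print(len(checker.error_msg))  # 2
```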
To run your new Checker, you need to register it by adding the following line in your main.py:
rl_debugger.register(checker_name="CustomChecker", checker_class=CustomChecker)
# the register method should be called before the set_config method
rl_debugger.set_config(...)
Finally, to run your Checker, all you have to do is add "CustomChecker" to the config.yml file and run the training.
- The environment is the only parameter required for the debugger to properly operate. Also, the environment needs to be passed at the beginning of your code and in the first call of ".debug()".
import gym
from debugger import rl_debugger
...
env = gym.make("CartPole-v1")
rl_debugger.debug(environment=env)
- Every observed parameter (e.g., model, target_model, action_probs, etc.) needs to be sent to the debugger from a single place, through one ".debug()" call site. The following code snippet shows incorrect usage.
from debugger import rl_debugger
....
state, reward, done, _ = env.step(action)
qvals = qnet(state)
rl_debugger.debug(model=qnet)
...
batch = replay_buffer.sample(batch_size=32)
qvals = qnet(batch["state"])
rl_debugger.debug(model=qnet)
The code above results in incorrect behavior because the model is sent to the debugger twice. Avoid sending the same observed parameter from two different code locations.
- If you have a test run during the learning process, turn the debugger off before testing and back on afterwards. Otherwise, unexpected behavior may arise.
from debugger import rl_debugger
...
def run_testing(self):  # method of your own trainer class
    rl_debugger.turn_off()
    results = super().run_testing()
    rl_debugger.turn_on()
    return results
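If the test run can raise, the debugger would stay off for the rest of training. One way to guard against that is a small context manager built on the `turn_off()` and `turn_on()` calls shown above. The `debugger_paused` helper is hypothetical, and `StubDebugger` only stands in for `rl_debugger` so the sketch runs without the package installed.

```python
from contextlib import contextmanager


class StubDebugger:
    """Stand-in for rl_debugger exposing only turn_off() and turn_on()."""

    def __init__(self):
        self.active = True

    def turn_off(self):
        self.active = False

    def turn_on(self):
        self.active = True


@contextmanager
def debugger_paused(debugger):
    """Hypothetical helper: pause the debugger and always resume it afterwards."""
    debugger.turn_off()
    try:
        yield debugger
    finally:
        # Runs even if the test code raises, so the debugger is never left off.
        debugger.turn_on()


rl_debugger = StubDebugger()

with debugger_paused(rl_debugger):
    state_during_test = rl_debugger.active  # the debugger is off while testing

try:
    with debugger_paused(rl_debugger):
        raise RuntimeError("evaluation failed")
except RuntimeError:
    pass

print(state_during_test, rl_debugger.active)  # False True
```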
- It is recommended to pass all the constant observed params (e.g., max_reward, loss_fn, max_total_steps, ...) at the beginning of your code, in the first call of ".debug()". These params need to be observed only once and many checkers rely on them, so providing them once at the beginning of your code is more efficient.
import gym
from debugger import rl_debugger
...
env = gym.make("CartPole-v1")
rl_debugger.debug(
    environment=env,
    max_reward=reward_threshold,
    max_steps_per_episode=max_steps_per_episode,
    max_total_steps=max_train_steps,
)
Feel free to ask questions. GitHub Issues and PRs are also welcome.