Cyber Dreamcatcher

Paper available on arXiv.

This repository implements a Graph Attention Network (GAT), the same architecture used in TacticAI, as a network-aware reinforcement learning policy for cyber defence. Our work extends the Cyber Operations Research Gym (CybORG) to represent network states as directed graphs with realistic, low-level features, enabling more faithful autonomous defence strategies.
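As a rough illustration of this representation (a minimal sketch with hypothetical node features, using PyTorch Geometric purely for illustration rather than the exact encoding in the code), a network state can be stored as a directed graph whose nodes are hosts and whose node features carry low-level simulator information:

import torch
from torch_geometric.data import Data

# Hypothetical features per host, e.g. open ports, flagged processes, privileged sessions.
node_features = torch.tensor([
    [2.0, 0.0, 1.0],  # defender host
    [5.0, 1.0, 0.0],  # user host with a suspicious process
    [3.0, 0.0, 0.0],  # server
])

# Directed edges (source -> target) for the observed connections.
edge_index = torch.tensor([
    [0, 1, 1],  # sources
    [1, 0, 2],  # targets
], dtype=torch.long)

observation = Data(x=node_features, edge_index=edge_index)
print(observation)  # Data(x=[3, 3], edge_index=[2, 3])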

Overview

Core Features

  • Topology-Aware Defence: Processes the complete network graph structure instead of simplified flat state observations
  • Runtime Adaptability: Handles dynamic changes in network topology as new connections appear
  • Cross-Network Generalisation: Trained policies can be deployed to networks of different sizes
  • Enhanced Interpretability: Defence actions can be explained through tangible network properties

What is included?

  • Custom CybORG environment with graph-based network state representation
  • GAT architecture modified for compatibility with policy gradient methods (a minimal sketch follows this list)
  • Empirical evaluation for assessing policy generalisation vs. specialised training across varying network sizes
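As a sketch of how a GAT can serve as a policy over such graph observations (the layer sizes, action head and use of PyTorch Geometric's GATConv are illustrative assumptions, not the repository's exact architecture):

import torch
from torch_geometric.nn import GATConv

class GATPolicySketch(torch.nn.Module):
    # Per-node embeddings via graph attention, followed by per-node action logits.
    def __init__(self, num_features, hidden_dim, num_actions, heads=4):
        super().__init__()
        self.gat1 = GATConv(num_features, hidden_dim, heads=heads)
        self.gat2 = GATConv(hidden_dim * heads, hidden_dim, heads=1)
        self.action_head = torch.nn.Linear(hidden_dim, num_actions)

    def forward(self, x, edge_index):
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        logits = self.action_head(h)  # one logit vector per host
        # A categorical distribution over (host, action) pairs keeps the policy
        # compatible with policy gradient methods such as REINFORCE.
        return torch.distributions.Categorical(logits=logits.flatten())

Because the attention layers operate on whatever graph they are given, the same trained weights can be evaluated on networks of different sizes, which is what makes cross-network generalisation possible.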

Note

This is a research project that serves as a proof-of-concept towards realistic network environments in cyber defence. Our implementation uses the low-level structure of the CybORG v2.1 simulator as a practical context, but the technique itself can be easily applied to other simulators with comparable complexity.

Setup

We used and recommend pixi to set up a reproducible project with predefined tasks.

Tip

If you would like to use another project management tool, the list of dependencies and installation tasks is available in pixi.toml. Untested environment files are provided for uv/pip (pyproject.toml) and for conda (conda_env.yml). Make sure to manually ignore the dependencies pinned by CybORG when installing it locally.

Clone this repo recursively to pull in the CybORG v2.1 simulator and the CAGE 2 reference submissions as submodules.

git clone https://github.com/IlyaOrson/CyberDreamcatcher.git --recurse-submodules -j4

Install the dependencies of the project in a local environment.

cd CyberDreamcatcher
pixi install  # setup from pixi.toml file

Then install the submodules as local packages, without letting pip resolve their dependencies.

# install environments from git submodules as local packages
pixi run install-cyborg  # CybORG 2.1 + update to gymnasium API
pixi run install-cyborg-debugged  # or a debugged version from The Alan Turing Institute

# install troublesome dependencies without using pip to track their requirements
pixi run install-sb3  # stable baselines 3

Voila! An activated shell within this environment will have all dependencies working together.

pixi shell  # activate shell
python -m cyberdreamcatcher  # try out a single environment simulation

Functionality

We include predefined tasks that can be run to make sure everything is working:

pixi task list  # displays available tasks

pixi run test-cyborg  # run gymnasium-based cyborg tests

pixi run eval-cardiff  # CAGE 2 winner policy inference (simplified and flattened observation space)

Tip

Hydra is used to handle the inputs and outputs of every script. The available parameters for each task are listed with the --help flag. The content generated by each execution is stored in the outputs/ directory, in subdirectories named after the execution timestamp. The hyperparameters used in each run are registered in a hidden .hydra/ subfolder within the generated output folder. TensorBoard is used to track relevant metrics; just point it at the corresponding Hydra output folder: tensorboard --logdir=outputs/...
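For example, the logged configuration of a previous run can be reloaded from its .hydra/ folder (the path below is a placeholder for an actual timestamped output directory):

from omegaconf import OmegaConf

cfg = OmegaConf.load("outputs/<date>/<time>/.hydra/config.yaml")  # placeholder path
print(OmegaConf.to_yaml(cfg))  # hyperparameters used in that run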

Graph layout

Quickly visualise the graph layout defined in the CAGE 2 challenge scenario file, as well as the graph observations received by a random GAT policy.

pixi run plot-network scenario=Scenario2  # see --help for hyperparameters

Warning

This is the layout we expect from the simulator configuration, but CybORG does not always strictly enforce this configuration at runtime.

Training

We include an implementation of the REINFORCE algorithm with a normalised rewards-to-go baseline. This is somewhat slow, since it samples many episodes with a fixed policy to estimate the gradient before taking each optimisation step.

pixi run train-gnn-reinforce  # see --help for hyperparameters
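For reference, REINFORCE weighs each action's log-probability by the rewards-to-go from that timestep onwards; a minimal sketch of the normalised baseline (not the repository's exact implementation) looks like this:

import torch

def normalised_rewards_to_go(rewards, gamma=0.99):
    # Discounted rewards-to-go for one episode, normalised to zero mean and unit variance.
    rtg = torch.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return (rtg - rtg.mean()) / (rtg.std() + 1e-8)

# REINFORCE loss for one sampled episode, with log_probs taken from the policy distribution:
# loss = -(log_probs * normalised_rewards_to_go(rewards)).sum()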

Flat observation space + MLP + SB3-PPO

This trains an MLP policy with PPO from Stable Baselines 3. It relies on the less realistic, flattened observation space from CAGE 2, which cannot extrapolate to networks of different sizes.

pixi run train-flat-sb3-ppo  # see --help for hyperparameters
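Under the hood this follows the standard Stable Baselines 3 training loop; a minimal sketch, where the environment constructor is a hypothetical stand-in for the flattened CybORG wrapper used by the task:

from stable_baselines3 import PPO

env = make_flat_cyborg_env()  # hypothetical constructor for the flattened CAGE 2 environment

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_flat_cyborg")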

Important

A direct performance comparison is not possible because the observation spaces are fundamentally different: the flattened version is a higher-level representation, whereas the graph observation uses low-level information from the simulator.

Performance

It is possible (❗) to evaluate how the performance of a trained GAT policy extrapolates to different network layouts.

Visualise the reward-to-go at each timestep

Specify a scenario to sample episodes from and optionally the weights of a pretrained policy (potentially trained on a different scenario).

# The default behaviour is to use a random policy on "Scenario2".
pixi run plot-performance

# This will compare the performance of a trained policy
# with a random policy on the scenario used for training
pixi run plot-performance policy_weights="path/to/trained_params.pt"

Joyplot of the reward-to-go distribution at each timestep.

Generalisation to different networks

The objective is to compare the optimality gap between a policy extrapolated to each scenario and a policy trained from scratch on that scenario. Specify the path to the trained policy to be tested and an array of paths to the specialised policies to compare it against; the corresponding scenarios are loaded from their logged configurations.

# add --help to see the available options
pixi run plot-generalisation policy_weights=path/to/trained_params.pt local_policies=[path/to/0/trained_params.pt,path/to/1/trained_params.pt,path/to/3/trained_params.pt, ...]
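One simple way to read the resulting plot is as the gap between the average return of the transferred policy and that of the policy specialised to each scenario; a hedged sketch of that comparison, with hypothetical episode returns:

import numpy as np

def optimality_gap(transferred_returns, specialised_returns):
    # Difference in mean episode return: specialised minus transferred policy.
    return np.mean(specialised_returns) - np.mean(transferred_returns)

# Hypothetical episode returns collected on the same scenario.
print(optimality_gap([-25.0, -30.0, -28.0], [-18.0, -20.0, -22.0]))  # ~7.67, the cost of transfer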

Generalisation comparison plot.