Combining supervised and reinforcement learning to build a generic defensive cyber agent

Farooq, Muhammad Omer; Kunz, Thomas

Date

2025

Abstract

Sophisticated mechanisms for attacking computer networks are emerging, making it crucial to have equally advanced mechanisms in place to defend against these attacks. Autonomous cyber operations (ACOs) are considered a potential solution for providing timely defense. In ACOs, an agent that attacks the network is called a red agent, while an agent that defends against the red agent is called a blue agent. In real-world scenarios, different types of red agents can attack a network, so blue agents must defend against a variety of red agents, each with its own attack strategy and goals. This requires training blue agents that respond effectively regardless of the specific strategy employed by the red agent. A generic blue agent must also be adaptable to different network topologies. This paper presents a framework for training a generic blue agent capable of defending against various red agents. The framework combines reinforcement learning (RL) and supervised learning. RL is used to train a blue agent against a specific red agent in a specific networking environment, resulting in multiple RL-trained blue agents, one for each red agent. Supervised learning is then used to train a generic blue agent from these RL-trained blue agents. Our results demonstrate that the proposed framework successfully trains a generic blue agent that can defend against different types of red agents across various network topologies. The framework shows consistently improved performance over a range of existing methods, as validated through extensive empirical evaluation, and detailed comparisons highlight its robustness and generalization capabilities. Additionally, to enable generalization across different adversarial strategies, the framework employs a variational autoencoder (VAE) that learns compact latent representations of observations, allowing the blue agent to focus on high-level behavioral features rather than raw inputs. Our results demonstrate that incorporating a VAE into the proposed framework further improves its overall performance.

Publisher

MDPI

Citation

Journal of Cybersecurity and Privacy 5, 23

Type

Article

Rights

http://creativecommons.org/licenses/by-nc-sa/4.0/
