Publication

Adaptive cyber defense through hybrid learning: from specialization to generalization

Date
2025-10-09
Abstract
This paper introduces a hybrid learning framework that combines Reinforcement Learning (RL) and Supervised Learning (SL) to train autonomous cyber-defense agents that operate effectively in dynamic, adversarial environments. The approach uses RL for strategic exploration and policy development, and uses SL to distill high-reward trajectories into refined policy updates, which improves sample efficiency, learning stability, and robustness. The framework first targets specialized agent training, in which each agent is optimized against a specific adversarial behavior. It is then extended to train a generalized agent that learns to counter multiple, diverse attack strategies through multi-task and curriculum learning. Experiments in the CybORG simulation environment show that the hybrid RL–SL framework consistently outperforms pure RL baselines in both the specialized and the generalized setting. Compared with RL-only agents, hybrid-trained agents achieve up to 23% higher cumulative rewards in specialized defense tasks and an improvement of approximately 18% in generalized defense scenarios. Incorporating temporal context into the observation space yields a further 4–6% performance gain in policy robustness. We also examine the effect of augmenting the observation space with historical actions and rewards, and find consistent, albeit incremental, gains in SL-based learning performance. The key contributions of this work are: (i) a novel hybrid learning paradigm that integrates RL and SL for effective cyber-defense policy learning, (ii) a scalable extension for training generalized agents across heterogeneous threat models, and (iii) an empirical analysis of the role of temporal context in agent observability and decision-making.
Collectively, the results highlight the promise of hybrid learning strategies for building intelligent, resilient, and adaptable cyber-defense systems in evolving threat landscapes.
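The abstract describes two mechanisms that a short example may make more concrete: distilling high-reward trajectories into supervised training data, and augmenting observations with past actions and rewards. The following is a minimal sketch under stated assumptions, not the paper's implementation. The function names, the `top_fraction` and `history_len` parameters, the zero padding, and the trajectory layout are all hypothetical, and the supervised step that would fit a policy on the resulting pairs is omitted.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical layout: one step is (observation, action, reward).
Step = Tuple[Tuple[float, ...], int, float]
Trajectory = List[Step]


def select_top_trajectories(
    trajectories: List[Trajectory], top_fraction: float = 0.2
) -> List[Trajectory]:
    """Keep the highest-return fraction of trajectories (at least one)."""
    if not trajectories:
        return []
    ranked = sorted(
        trajectories,
        key=lambda traj: sum(reward for _, _, reward in traj),
        reverse=True,
    )
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]


def augment_with_history(
    trajectory: Trajectory, history_len: int = 2
) -> List[Tuple[Tuple[float, ...], int]]:
    """Turn one trajectory into (augmented observation, action) pairs.

    Each observation is extended with the previous `history_len` actions
    and rewards. Missing history at the start is padded with zeros, which
    is a simplification (a real setup may need a dedicated padding value).
    """
    prev_actions = deque([0.0] * history_len, maxlen=history_len)
    prev_rewards = deque([0.0] * history_len, maxlen=history_len)
    pairs = []
    for obs, action, reward in trajectory:
        features = tuple(obs) + tuple(prev_actions) + tuple(prev_rewards)
        pairs.append((features, action))
        prev_actions.append(float(action))
        prev_rewards.append(float(reward))
    return pairs


def build_sl_dataset(
    trajectories: List[Trajectory],
    top_fraction: float = 0.2,
    history_len: int = 2,
) -> List[Tuple[Tuple[float, ...], int]]:
    """Collect supervised pairs from the best RL trajectories only."""
    dataset = []
    for traj in select_top_trajectories(trajectories, top_fraction):
        dataset.extend(augment_with_history(traj, history_len))
    return dataset
```

In a loop of this kind, an RL agent would generate the trajectories, and a classifier trained on the output of `build_sl_dataset` would serve as the refined policy. How the paper actually selects trajectories and encodes history is not stated in the abstract.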
Publisher
MDPI
Citation
Future Internet 17(10), 464
Type
Article
Rights
http://creativecommons.org/licenses/by-nc-sa/4.0/