What is it about?
In this work, we look into the effects of altering the rewards given to an RL agent in the cybersecurity environment 'CybORG'. The task consists of a blue defensive agent learning to autonomously defend a network against a red attacking agent that tries to navigate through the system and disrupt the operational server. In an RL task, rewards are given for moving from one state to the next. We explored changing these extrinsic environmental rewards in a variety of ways: by increasing their magnitude and by adding positive rewards (in an otherwise entirely penalty-driven environment). We also considered the effects of adding an intrinsic curiosity module (ICM) to incentivise exploration of novel states.
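To make these shaping ideas concrete, here is a minimal Python sketch, not the implementation from the paper: it scales the extrinsic reward, adds a small positive bonus when the network stays uncompromised, and adds a novelty-based intrinsic bonus. All names and values (shaped_reward, CuriosityBonus, reward_scale, clean_state_bonus) are illustrative, and the count-based bonus is only a simplified stand-in for a full intrinsic curiosity module.

```python
from collections import defaultdict


class CuriosityBonus:
    """Count-based stand-in for an ICM-style intrinsic curiosity reward."""

    def __init__(self, scale=0.1):
        self.scale = scale
        self.visits = defaultdict(int)  # how often each state has been seen

    def bonus(self, state):
        # Less-visited (more novel) states receive a larger intrinsic reward.
        self.visits[state] += 1
        return self.scale / self.visits[state] ** 0.5


def shaped_reward(extrinsic, state, reward_scale=2.0,
                  clean_state_bonus=0.5, curiosity=None):
    """Combine a scaled extrinsic reward, an optional positive bonus for an
    uncompromised network, and an optional intrinsic novelty bonus."""
    reward = reward_scale * extrinsic          # increase the penalty magnitude
    if extrinsic == 0.0:                       # no penalty: network still clean
        reward += clean_state_bonus            # add a positive reward
    if curiosity is not None:
        reward += curiosity.bonus(state)       # exploration incentive
    return reward


# Example with made-up values: a step in which the red agent caused no damage.
icm = CuriosityBonus(scale=0.1)
print(shaped_reward(extrinsic=0.0, state="all_hosts_clean", curiosity=icm))
```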
Why is it important?
There is limited literature that focuses on altering the extrinsic and intrinsic rewards of an agent learning to autonomously defend a cybersecurity environment. These environments differ from typical RL tasks in that the agent's objective is to preserve the initial (i.e., non-compromised) state of the system or network, so any deviation from that state results in a negative reward. This work is an initial step towards understanding how this characteristic affects learning and what techniques can be used to train a performant agent more efficiently.
Read the Original
This page is a summary of: Reward Shaping for Happier Autonomous Cyber Security Agents, November 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3605764.3623916.