What is it about?

A strategy refers to the rules by which an agent chooses among available actions to achieve its goals. Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments: good strategies improve the system's utility, decrease the overall cost, and increase the probability of mission success. This research proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining that separates an intricate policy into several simple sub-policies and organizes their relationships as a Bayesian strategy network (BSN). By integrating a state-of-the-art deep reinforcement learning method, soft actor-critic (SAC), to build the corresponding Bayesian SAC, the AI agent can select suitable strategy combinations to achieve various tasks efficiently.
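Concretely, in the spirit of Bayesian networks (the notation here is ours, added for illustration), the chain rule factors a joint policy over tactics a_1, ..., a_n into conditional sub-policies:

    \pi(a_1, \dots, a_n \mid s) \;=\; \prod_{i=1}^{n} \pi_i\big(a_i \mid s,\ \mathrm{pa}(a_i)\big)

where s is the state and pa(a_i) denotes the parent tactics of a_i in the BSN. Each factor is a simpler sub-policy, while their product recovers the full strategy.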

Why is it important?

In Artificial Intelligence (AI), a strategy describes the general plan by which an agent achieves short-term or long-term goals under conditions of uncertainty; it involves setting sub-goals and priorities, determining action sequences to fulfill tasks, and mobilizing resources to execute those actions. A strategy exhibits the fundamental properties of an agent's perception, reasoning, planning, decision-making, learning, problem-solving, and communication in interaction with dynamic and complex environments. Especially in real-time strategy (RTS) games and real-world scenarios such as robot-aided urban search and rescue (USAR) missions, agents need to change strategies dynamically, adapting to the current situation based on the environment and their expected utilities or needs.

From a single-agent perspective, a strategy is a rule the agent uses to select actions in pursuit of its goals, which is equivalent to a policy in a Markov Decision Process (MDP). More specifically, in reinforcement learning (RL) the policy dictates the actions the agent takes as a function of its state and the environment, and the agent's goal is to learn a policy that maximizes the expected cumulative reward. With advances in deep neural networks, deep reinforcement learning (DRL) helps AI agents master more complex strategies (policies) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Furthermore, in task-oriented decision-making, hierarchical reinforcement learning (HRL) enables the autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks, and the resulting hierarchy of policies collectively determines the agent's behavior by solving those subtasks with low-level policy learning. However, a single strategy might require learning several policies simultaneously: the strategy consists of several tactics (sub-strategies) or actions, each executing a simple task, especially in the robot locomotion and RTS game domains.

To alleviate these issues, we introduce the Bayesian Strategy Network (BSN) to achieve efficient DRL. Following the principle of Bayesian networks, a BSN decomposes an intricate strategy or joint action into several simple tactics or actions and organizes them as a knowledge graph. Then, building on the Soft Actor-Critic (SAC) algorithm, we propose a new DRL model, the Bayesian Soft Actor-Critic (BSAC), which integrates the BSN and forms a joint policy that better adapts to the Q-value distribution. The key idea of BSAC is that by constructing a suitable BSN to describe the complicated relationships among subsystems of a complex system, one can transform a complex policy into a joint policy consisting of several simple sub-policies, which improves sample efficiency and training speed. Therefore, building a BSN for a system first requires a clear understanding of the system's structure and the relationships among its subsystems; the corresponding BSN and joint policy can then be built to describe the system. The proposed approach can be applied in various real-world scenarios, such as multi-agent and multi-robot systems, robotics, self-driving cars, and video games.
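As a rough illustration of how such a joint policy can be realized, the sketch below (ours, not the authors' released code; the module names, network sizes, and dimensions are made up) chains two Gaussian sub-policies in PyTorch and sums their log-probabilities, which is the quantity SAC's entropy term consumes:

    # Illustrative BSN-style joint policy for SAC (not the paper's code).
    # Two sub-policies are chained: the second conditions on the first's
    # action, and the joint log-probability is the sum of the sub-policy
    # log-probabilities, as the chain rule prescribes.
    import torch
    import torch.nn as nn

    class SubPolicy(nn.Module):
        def __init__(self, in_dim, act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.mean = nn.Linear(hidden, act_dim)
            self.log_std = nn.Linear(hidden, act_dim)

        def forward(self, x):
            h = self.net(x)
            dist = torch.distributions.Normal(
                self.mean(h), self.log_std(h).clamp(-5, 2).exp())
            a = dist.rsample()                   # reparameterized sample
            return a, dist.log_prob(a).sum(-1)   # action and its log-prob

    class JointPolicy(nn.Module):
        """Chain sub-policies along BSN edges: pi(a1,a2|s) = pi1(a1|s) * pi2(a2|s,a1)."""
        def __init__(self, obs_dim, a1_dim, a2_dim):
            super().__init__()
            self.p1 = SubPolicy(obs_dim, a1_dim)
            self.p2 = SubPolicy(obs_dim + a1_dim, a2_dim)

        def forward(self, obs):
            a1, logp1 = self.p1(obs)
            a2, logp2 = self.p2(torch.cat([obs, a1], dim=-1))
            return torch.cat([a1, a2], dim=-1), logp1 + logp2

    policy = JointPolicy(obs_dim=8, a1_dim=2, a2_dim=2)
    action, log_pi = policy(torch.zeros(1, 8))  # log_pi feeds SAC's entropy term

A full SAC implementation would additionally squash the actions with tanh and apply the corresponding log-probability correction; that detail is omitted here for brevity.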

Perspectives

It would be exciting to implement BSAC on real robots and to develop robust computational models for robotics and multi-agent systems, such as robot locomotion control, multi-robot planning and navigation, and robot-aided search and rescue missions. The approach also relates to designing software architectures that learn from the specific hardware structure of a system, developing diverse skills so that the whole system can adapt to various scenarios. From another perspective, training the subsystems' policies is akin to the muscle-memory training involved in developing human skills.

Assistant Professor Qin Yang
Bradley University

Read the Original

This page is a summary of: Bayesian Strategy Networks Based Soft Actor-Critic Learning, ACM Transactions on Intelligent Systems and Technology, March 2024, ACM (Association for Computing Machinery). DOI: 10.1145/3643862.