What is it about?

Reinforcement learning (RL) faces many obstacles when agents are deployed to real-world tasks such as autonomous driving, robot control, and healthcare, because online trial-and-error is costly. Fortunately, large and diverse datasets can be pre-collected for these tasks. Research on learning high-quality policies from such static datasets has therefore driven the development of offline RL.


Why is it important?

(1) We propose a bi-objective policy optimization algorithm in which the first objective maximizes the model return and the second objective simultaneously calibrates the learning bias of the policy. Our method achieves more stable policy improvement on offline model-based RL (MbRL) tasks.
(2) To the best of our knowledge, our approach is the first to apply evolution strategies (ES) to model-based offline RL and to solve this optimization problem for uncertain, long-horizon RL tasks. We also establish a theoretical upper bound on the norm of the BiES gradient estimate.
(3) We conduct a large-scale empirical study on offline MuJoCo locomotion tasks from the D4RL benchmark [8]. The experimental results show that our method achieves state-of-the-art performance compared with other offline RL algorithms.
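To give a flavour of how an evolution-strategy update can serve two objectives at once, the sketch below shows a generic weighted-sum ES step for a toy policy. It is only an illustration under stated assumptions, not the paper's BiES algorithm: the functions rollout_return and bias_penalty, and the parameters sigma, alpha, and weight, are hypothetical placeholders introduced here for illustration.

```python
import numpy as np

# Minimal sketch (assumption): a weighted-sum evolution-strategy update that
# balances a model-return objective and a bias-calibration objective.
# rollout_return and bias_penalty are toy stand-ins, not the paper's definitions.

def rollout_return(theta):
    """Placeholder for the return of the policy parameterised by theta in a learned model."""
    return -np.sum((theta - 1.0) ** 2)

def bias_penalty(theta):
    """Placeholder for a term that penalises the policy's learning bias."""
    return -0.1 * np.sum(theta ** 2)

def es_step(theta, n_samples=50, sigma=0.1, alpha=0.02, weight=0.5):
    """One ES step on a weighted sum of the two objectives."""
    eps = np.random.randn(n_samples, theta.size)           # Gaussian perturbations
    scores = np.array([
        weight * rollout_return(theta + sigma * e)
        + (1 - weight) * bias_penalty(theta + sigma * e)
        for e in eps
    ])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalise scores
    grad = eps.T @ scores / (n_samples * sigma)                # ES gradient estimate
    return theta + alpha * grad

theta = np.zeros(4)
for _ in range(200):
    theta = es_step(theta)
```

Because ES only needs episode-level scores, such an update avoids backpropagating through long, uncertain model rollouts; that property is what motivates using ES in this setting.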

Perspectives

I am working towards becoming a researcher in human-centred AI, looking at the systemic impact of teaching intelligent agents to work with humans. I have experienced firsthand the benefits of spending time working with researchers from all over the world as part of a community. I believe this work can make contributions to the AI and robotics communities.

Dr. Zhuowei Wang
University of Technology Sydney

Read the Original

This page is a summary of: BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning, January 2022, Springer Science + Business Media,
DOI: 10.1007/978-3-030-97546-3_46.
You can read the full text via the DOI above.

