What is it about?

The Building Batch Reinforcement Learning (B2RL) dataset is an open-source dataset for batch reinforcement learning in the building control domain.


Why is it important?

Batch reinforcement learning (BRL) is an emerging field in the reinforcement learning community. It learns exclusively from static datasets (i.e., replay buffers). Model-free BRL methods can learn the optimal policy without needing accurate environment models or simulation environments as oracles. Model-based BRL methods learn environment dynamics models from the buffers, then use these models to predict environment responses and generate Markov Decision Process (MDP) transitions given states and actions from policies. In the offline setting, existing replay experiences serve as prior knowledge for BRL models to learn from, so generating replay buffers is crucial for benchmarking BRL models. In our B2RL (Building Batch RL) dataset, we collect real-world data from our building database, as well as buffers generated by several behavioral policies in simulation environments. To the best of our knowledge, we are the first to open-source building datasets for the purpose of batch RL learning.
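To make the idea concrete, here is a minimal sketch of a static replay buffer of MDP transitions in the offline setting. The Transition fields and the StaticReplayBuffer class are illustrative assumptions for this summary, not the actual B2RL schema:

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# A single MDP transition: the unit stored in a replay buffer.
# Field choices here are hypothetical, for illustration only.
@dataclass
class Transition:
    state: Tuple[float, ...]       # e.g. zone temperatures, setpoints
    action: int                    # e.g. a discrete HVAC control choice
    reward: float                  # e.g. negative energy cost
    next_state: Tuple[float, ...]
    done: bool

class StaticReplayBuffer:
    """A fixed, pre-collected buffer: batch RL never adds to it online."""

    def __init__(self, transitions: List[Transition]):
        self._data = list(transitions)

    def sample(self, batch_size: int) -> List[Transition]:
        # Offline training step: draw a mini-batch from the static data.
        return random.sample(self._data, batch_size)

# Usage: load transitions logged by a behavioral policy, then train offline.
buffer = StaticReplayBuffer([
    Transition((21.5, 23.0), 1, -0.4, (21.8, 22.6), False),
    Transition((21.8, 22.6), 0, -0.1, (21.7, 22.5), False),
    Transition((21.7, 22.5), 2, -0.7, (22.4, 22.1), True),
])
batch = buffer.sample(2)  # fed to a batch RL learner, e.g. BCQ or CQL
```

The key point is that sampling mini-batches is the learner's only interaction with the data: nothing is ever added to the buffer during training, which is what distinguishes batch RL from online RL.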

Perspectives

In my view, batch RL is the closest thing to practical, general AI we have at this point. First, it learns from static datasets to yield a safe control policy, which is closer to what industry expects. Second, it can be model-free: for most environments we don't have simulators, or it is too expensive to build one. Finally, RL can do more than predict time-series data the way traditional DL/ML does; it can yield an optimal control policy that reaches the goal we are trying to achieve.

Ph.D. candidate Hsin-Yu Liu
University of California San Diego

Read the Original

This page is a summary of: B2RL, November 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3563357.3566164.
