What is it about?
We found that the learning progress of a world model, computed locally in self-organized regions of a learned latent space, provides a spatially and temporally local estimate of the reliability of the model's predictions. This estimate is used to arbitrate between model-based and model-free decisions and to compute an adaptive prediction horizon for model predictive control and experience imagination. It is also used to derive an intrinsic reward that encourages the robot to take actions leading to data that improves its model of the world. Our approach improves the efficiency of learning visuomotor control in both simulation and the real world. Policy networks trained in simulation with our approach are shown to perform well on the physical robot via a simple simulation-to-real transfer, without fine-tuning the policy parameters.
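A minimal sketch of the core idea (not the authors' code): track the world model's prediction error per region of the latent space, define learning progress as the decrease in that error, and reuse it as a local reliability estimate, an arbitration signal between model-based and model-free control, and an intrinsic reward. The region assignment, smoothing constants, and the arbitration threshold below are illustrative assumptions.

```python
import numpy as np

class RegionalLearningProgress:
    """Per-region error statistics of a learned world model."""

    def __init__(self, n_regions, smoothing=0.1):
        self.smoothing = smoothing
        # Fast and slow running averages of the model error per region.
        self.err_fast = np.zeros(n_regions)
        self.err_slow = np.zeros(n_regions)

    def update(self, region, prediction_error):
        """Update error statistics for the region of the current latent state."""
        a = self.smoothing
        self.err_fast[region] = (1 - a) * self.err_fast[region] + a * prediction_error
        # The slower average acts as the "past" error level.
        self.err_slow[region] = (1 - a / 4) * self.err_slow[region] + (a / 4) * prediction_error

    def learning_progress(self, region):
        """Positive when the model error in this region is decreasing."""
        return self.err_slow[region] - self.err_fast[region]

    def reliability(self, region):
        """Local reliability estimate: low recent error => reliable model."""
        return 1.0 / (1.0 + self.err_fast[region])

def arbitrate(lp, region, threshold=0.5):
    """Use the model-based policy only where the model is locally reliable."""
    return "model_based" if lp.reliability(region) > threshold else "model_free"

def intrinsic_reward(lp, region, scale=1.0):
    """Reward actions that lead to data improving the world model."""
    return scale * max(0.0, lp.learning_progress(region))
```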
Why is it important?
Unlike previous work, our approach does not assume perfect model predictions over a fixed time horizon. Instead, the prediction horizon used for planning and for generating imagined experiences adapts to the improvement in learning the model. This is important to prevent the robot from ending up in parts of the environment where it has little data and where the model's estimates are very poor. Our adaptive-length model rollout therefore ensures that unreliable model predictions are not used in computing the optimal plan, and it reduces the computational cost of planning.
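A hedged sketch of such an adaptive rollout, building on the `RegionalLearningProgress` sketch above: the rollout length for planning and imagination grows with the local reliability of the model and the rollout stops early when it drifts into a poorly modeled region. The linear mapping, the bounds `h_min`/`h_max`, and the stopping threshold are illustrative assumptions, not the paper's exact formulation.

```python
def adaptive_horizon(reliability, h_min=1, h_max=20):
    """Map a reliability estimate in [0, 1] to a rollout length."""
    return h_min + int(round(reliability * (h_max - h_min)))

def imagine_rollout(model, policy, z0, lp, region_of, threshold=0.5):
    """Roll the learned model forward only while predictions stay reliable.

    model.predict(z, a) is assumed to return the next latent state;
    region_of(z) maps a latent state to its self-organized region index.
    """
    z, trajectory = z0, []
    horizon = adaptive_horizon(lp.reliability(region_of(z0)))
    for _ in range(horizon):
        a = policy(z)
        z = model.predict(z, a)  # one-step prediction in latent space
        trajectory.append((z, a))
        # Stop early if the rollout enters a poorly modeled region.
        if lp.reliability(region_of(z)) < threshold:
            break
    return trajectory
```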
Read the Original
This page is a summary of: Improving robot dual-system motor learning with intrinsically motivated meta-control and latent-space experience imagination, Robotics and Autonomous Systems, November 2020, Elsevier. DOI: 10.1016/j.robot.2020.103630.