What is it about?
The tradeoff between risk and reward is well known, and choosing the right metric for risk is critical for making good decisions. The Markov Decision Process (MDP) is a core model in machine learning, in particular Reinforcement Learning (RL), where this tradeoff arises naturally. This paper presents a new RL algorithm that uses the variance of the rewards as the risk metric. The new algorithm is shown to work on problems where the classical exponential utility risk metric breaks down.
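To give a feel for what a variance-based risk metric does, the sketch below scores a policy by its mean return minus a variance penalty. This is only a rough illustration of the general mean-variance idea, not the paper's algorithm; the function name, the penalty weight `risk_weight`, and the sample returns are all made up for the example.

```python
import numpy as np

def variance_adjusted_score(returns, risk_weight=0.1):
    """Score a policy by mean return minus a variance penalty.

    `returns` is a sample of total rewards observed under the policy;
    `risk_weight` controls how strongly variability is penalized.
    """
    returns = np.asarray(returns, dtype=float)
    return returns.mean() - risk_weight * returns.var()

# Two hypothetical policies with the same average reward (10.0):
steady   = [10.0, 10.5, 9.5, 10.0]        # low variance
volatile = [0.0, 25.0, -5.0, 20.0]        # high variance, same mean

print(variance_adjusted_score(steady))    # ~9.99, close to the mean
print(variance_adjusted_score(volatile))  # pulled well below the mean by the penalty
```

A risk-neutral learner would treat the two policies as equivalent because their average rewards match; the variance penalty makes the steadier policy preferable.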
Why is it important?
This work shows that the new algorithm succeeds on problem instances where the well-studied Exponential Utility (EU) risk metric breaks down. This is illustrated with a numerical example, and a theoretical explanation is given for why numerical overflow occurs on the computer when the EU framework is used.
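The overflow issue can be sketched in a few lines. Exponential utility applies an exponential to the cumulative reward, so a long horizon or large rewards can push the result past floating-point range, while a mean-variance score stays well scaled. The risk-aversion coefficient and reward numbers below are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Exponential utility exponentiates the cumulative reward; for a long horizon
# the argument of exp() becomes huge and the result saturates to infinity.
gamma = 0.01                       # hypothetical risk-aversion coefficient
total_reward = 200_000.0           # hypothetical cumulative reward over a long horizon

with np.errstate(over="ignore"):   # np.exp silently overflows to inf here
    eu_term = np.exp(gamma * total_reward)
print(eu_term)                     # inf -- the EU score is no longer usable

# A variance-adjusted score only needs the mean and variance of the rewards,
# which remain within floating-point range for the same problem.
rewards = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=100_000)
print(rewards.mean() - 0.1 * rewards.var())   # finite, well-scaled value
```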
Read the Original
This page is a summary of: Beyond exponential utility functions: A variance-adjusted approach for risk-averse reinforcement learning, December 2014, Institute of Electrical and Electronics Engineers (IEEE), DOI: 10.1109/adprl.2014.7010645.