What is it about?

The Knowledge Gradient heuristic attempts to balance exploration and exploitation in the multi-armed bandit problem. However, this paper shows that it has a serious weakness: it can take actions that are inferior for both exploration and exploitation. The paper identifies when this happens and proposes new variants that avoid these errors.
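
To make the idea concrete, the sketch below shows a typical online Knowledge Gradient rule for Bernoulli arms with independent Beta priors: at each step an arm's current posterior mean is weighed against the expected improvement in the best posterior mean from one more observation of that arm. This is a minimal sketch only; the arm probabilities, the horizon and the remaining-horizon weighting are illustrative assumptions and not taken from the paper.

    # Minimal illustrative sketch (not the paper's exact formulation): an online
    # Knowledge Gradient rule for Bernoulli arms with independent Beta priors.
    # The arm success probabilities, the horizon T and the remaining-horizon
    # weighting are assumptions made for this toy example only.
    import numpy as np

    rng = np.random.default_rng(0)
    true_means = [0.3, 0.5, 0.6]      # hypothetical arm success probabilities
    T = 200                           # hypothetical horizon (number of pulls)
    alpha = np.ones(len(true_means))  # Beta posterior parameters (successes)
    beta = np.ones(len(true_means))   # Beta posterior parameters (failures)

    total_reward = 0
    for t in range(T):
        post_mean = alpha / (alpha + beta)
        best = post_mean.max()
        kg = np.empty(len(true_means))
        for a in range(len(true_means)):
            others = np.delete(post_mean, a).max()
            up = (alpha[a] + 1) / (alpha[a] + beta[a] + 1)  # mean after a success
            down = alpha[a] / (alpha[a] + beta[a] + 1)      # mean after a failure
            # Expected improvement in the best posterior mean from one more pull of arm a
            kg[a] = (post_mean[a] * max(others, up)
                     + (1 - post_mean[a]) * max(others, down) - best)
        # Online KG: current mean plus value of information weighted by pulls remaining
        arm = int(np.argmax(post_mean + (T - t - 1) * kg))
        reward = int(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward

    print(f"Total reward over {T} pulls: {total_reward}")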


Why is it important?

Multi-armed bandits are an important class of sequential decision processes that arise in a wide range of contemporary applications. They are often hard to solve exactly, so heuristics are required. The Knowledge Gradient is one such heuristic, which has been promoted as an effective and easy-to-compute general-purpose solution method.

Perspectives

The paper contains an extensive computational study. Surprisingly, it was found that index policies (such as those based on indices like the Gittins index) could be more robust than non-index methods such as the Knowledge Gradient, even on problem variants where index policies are known not to be optimal (a simple code sketch of the index-policy idea follows below).

James Edwards
Lancaster University
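
As a rough illustration of what an index policy looks like in code, the sketch below always plays the arm with the largest index, where each index is computed from that arm's own observations alone. The Gittins index itself requires a separate (pre)computation, so the well-known UCB1 index is used here purely as a stand-in to show the index-policy structure; the arm probabilities and horizon are assumptions for this toy example.

    # Illustrative only: an index policy assigns each arm a number computed from
    # that arm's own observations alone and always plays the arm with the largest
    # index. The Gittins index mentioned above needs its own (pre)computation, so
    # the well-known UCB1 index is used here purely as a stand-in to show the
    # index-policy structure; the arm probabilities and horizon are assumptions.
    import math
    import random

    random.seed(0)
    true_means = [0.3, 0.5, 0.6]   # hypothetical arm success probabilities
    T = 200                        # hypothetical horizon
    pulls = [0] * len(true_means)
    successes = [0] * len(true_means)

    for t in range(1, T + 1):
        if t <= len(true_means):
            arm = t - 1            # pull each arm once to initialise its statistics
        else:
            # UCB1 index: empirical mean plus a confidence width that depends only
            # on this arm's own pull count and the current time step
            index = [successes[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])
                     for a in range(len(true_means))]
            arm = index.index(max(index))
        successes[arm] += int(random.random() < true_means[arm])
        pulls[arm] += 1

    print("Pulls per arm:", pulls)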

Read the Original

This page is a summary of: On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits, Probability in the Engineering and Informational Sciences, September 2016, Cambridge University Press, DOI: 10.1017/s0269964816000279.
You can read the full text via the DOI above.

