What is it about?
This paper proposes an algorithm that combines multiple strategies for the stochastic multi-armed bandit problem. The experiments of Auer et al. (2002) show that bandit algorithms perform differently on problems with different reward distributions. When it is not known in advance which strategy will perform best, our algorithm offers one solution: it adaptively combines the candidate strategies.
Why is it important?
Theoretically, the proposed algorithm, epsilon_t-comb, converges asymptotically to the best strategy; the paper gives a precise definition of what "best strategy" means.
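To make the idea concrete, here is a minimal sketch of one plausible reading of such a meta-algorithm: each candidate strategy is treated as a "meta-arm", and the combiner picks among them with a decaying exploration rate eps_t = min(1, c/t). The base strategies (epsilon-greedy and UCB1), the constant c, and the choice to let every base strategy observe the shared play history are illustrative assumptions, not the paper's exact construction.

```python
import math
import random


class EpsilonGreedy:
    """Base strategy: classic epsilon-greedy arm selection."""

    def __init__(self, n_arms, eps=0.1):
        self.eps = eps
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self, rng):
        if rng.random() < self.eps:
            return rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class UCB1:
    """Base strategy: UCB1 in the style of Auer et al. (2002)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.t = 0  # counts this strategy's own selections

    def select(self, rng):
        self.t += 1
        for a, c in enumerate(self.counts):
            if c == 0:  # play each untried arm once
                return a
        return max(
            range(len(self.counts)),
            key=lambda a: self.values[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_meta(probs, horizon, c=5.0, seed=0):
    """Run a hypothetical epsilon_t-style combiner on a Bernoulli bandit.

    At round t it explores a random base strategy with probability
    eps_t = min(1, c/t), otherwise exploits the strategy with the best
    empirical mean reward so far.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    strategies = [EpsilonGreedy(n_arms), UCB1(n_arms)]
    meta_counts = [0] * len(strategies)
    meta_values = [0.0] * len(strategies)
    arm_pulls = [0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        eps_t = min(1.0, c / t)
        if rng.random() < eps_t:
            s = rng.randrange(len(strategies))
        else:
            s = max(range(len(strategies)), key=lambda i: meta_values[i])
        arm = strategies[s].select(rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        # Assumption: every base strategy observes the shared (arm, reward)
        # pair, so all of them keep learning even when not chosen.
        for strat in strategies:
            strat.update(arm, reward)
        meta_counts[s] += 1
        meta_values[s] += (reward - meta_values[s]) / meta_counts[s]
        arm_pulls[arm] += 1
        total += reward
    return arm_pulls, total


if __name__ == "__main__":
    pulls, total = run_meta([0.2, 0.5, 0.8], horizon=2000, seed=42)
    print("arm pulls:", pulls, "total reward:", total)
```

Because eps_t decays, the combiner explores both base strategies early on but spends almost all later rounds with whichever one has earned the higher average reward, which is the intuition behind the asymptotic-convergence claim.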
Read the Original
This page is a summary of: Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality, Journal of Control Science and Engineering, January 2015, Hindawi Publishing Corporation, DOI: 10.1155/2015/264953.