What is it about?
We present the first efficient algorithm for combinatorial bandits (a kind of structured stateless reinforcement learning). This algorithm is efficient in two ways: it reaps a total reward that is close to the optimum one, and each decision can be taken in polynomial time.
Photo by Kelly Sikkema on Unsplash
Why is it important?
So far, no such algorithm existed: previous ones either had a very good reward, or could run in polynomial time. Our work shows that there is no such trade-off to be made.
Read the Original
This page is a summary of: Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits, Proceedings of the ACM on Measurement and Analysis of Computing Systems, February 2021, ACM (Association for Computing Machinery), DOI: 10.1145/3447387.
You can read the full text:
The following have contributed to this page