Efficient algorithms for structured stateless reinforcement learning
Photo by Kelly Sikkema on Unsplash
What is it about?
We present the first efficient algorithm for combinatorial bandits (a kind of structured stateless reinforcement learning). This algorithm is efficient in two ways: it reaps a total reward that is close to the optimum one, and each decision can be taken in polynomial time.
Why is it important?
So far, no such algorithm existed: previous ones either had a very good reward, or could run in polynomial time. Our work shows that there is no such trade-off to be made.
The following have contributed to this page: Thibaut Cuvelier and Eric Gourdin