Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

  • Thibaut Cuvelier, Richard Combes, Eric Gourdin
  • Proceedings of the ACM on Measurement and Analysis of Computing Systems, February 2021, ACM (Association for Computing Machinery)
  • DOI: 10.1145/3447387

Efficient algorithms for structured stateless reinforcement learning

Photo by Kelly Sikkema on Unsplash

Photo by Kelly Sikkema on Unsplash

What is it about?

We present the first efficient algorithm for combinatorial bandits (a kind of structured stateless reinforcement learning). This algorithm is efficient in two ways: it reaps a total reward that is close to the optimum one, and each decision can be taken in polynomial time.

Why is it important?

So far, no such algorithm existed: previous ones either had a very good reward, or could run in polynomial time. Our work shows that there is no such trade-off to be made.

Perspectives

Thibaut Cuvelier
CentraleSupelec

I hope that this work shows that combinatorial bandits can be used broadly at an industrial scale.

Read Publication

http://dx.doi.org/10.1145/3447387

The following have contributed to this page: Thibaut Cuvelier and Eric Gourdin