Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Thibaut Cuvelier; Richard Combes; Eric Gourdin

doi:10.1145/3447387

What is it about?

We present the first algorithm for combinatorial semi-bandits (a structured, stateless form of reinforcement learning) that is efficient in two senses at once. It is statistically efficient: the total reward it collects is close to the optimal one. It is also computationally efficient: each decision can be computed in polynomial time.
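
To make the setting concrete, here is a minimal sketch of a combinatorial semi-bandit loop. It is not the algorithm from the paper: it uses a generic UCB-style index on a toy problem (pick the `m` best out of `d` Bernoulli arms), where the combinatorial step reduces to a sort. The function name `cucb_top_m`, the exploration constant 1.5, and the Bernoulli rewards are all assumptions made for illustration.

```python
import math
import random


def cucb_top_m(means, m, horizon, seed=0):
    """Play `horizon` rounds of a toy top-m semi-bandit with a UCB-style index.

    Illustrative only: a generic optimistic strategy, not the paper's algorithm.
    """
    rng = random.Random(seed)
    d = len(means)
    counts = [0] * d   # number of times each base arm was observed
    sums = [0.0] * d   # sum of observed rewards per base arm
    best = sum(sorted(means, reverse=True)[:m])  # expected reward of the optimal action
    regret = 0.0       # cumulative gap to the optimal action (pseudo-regret)
    decisions = []
    for t in range(1, horizon + 1):
        # Optimistic index per base arm; unobserved arms get +inf to force a first pull.
        index = [
            sums[i] / counts[i] + math.sqrt(1.5 * math.log(t) / counts[i])
            if counts[i] > 0 else math.inf
            for i in range(d)
        ]
        # Combinatorial step: maximise the summed index over the action set.
        # For "choose m arms" this is just the m largest indices (polynomial time).
        action = sorted(range(d), key=lambda i: index[i], reverse=True)[:m]
        # Semi-bandit feedback: one reward is observed for each chosen base arm.
        for i in action:
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += reward
        regret += best - sum(means[i] for i in action)
        decisions.append(action)
    return decisions, counts, regret


if __name__ == "__main__":
    decisions, counts, regret = cucb_top_m([0.9, 0.8, 0.5, 0.2, 0.1], m=2, horizon=1000)
    print("observations per arm:", counts)
    print("pseudo-regret:", round(regret, 2))
```

In this toy case the per-round cost is dominated by a sort. For richer action sets (paths, matchings, spanning trees), the same step becomes a combinatorial optimisation problem, which is where the tension between reward and running time described on this page arises.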

Why is it important?

Until now, no such algorithm existed: earlier algorithms either achieved a very good reward or ran in polynomial time, but not both. Our work shows that this trade-off is not necessary.

Perspectives

I hope that this work shows that combinatorial bandits can be used broadly at an industrial scale.
Thibaut Cuvelier
CentraleSupelec

This page is a summary of: Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits, Proceedings of the ACM on Measurement and Analysis of Computing Systems, February 2021, ACM (Association for Computing Machinery), DOI: 10.1145/3447387.

Efficient algorithms for structured stateless reinforcement learning

Resources

arXiv

Source code

Related publication
