What is it about?

The Thresholding Bandit (TB) problem is a popular sequential decision-making problem, which aims at identifying the systems whose means are greater than a threshold. Instead of working on the upper bound of a loss function, our approach stands out from conventional practices by directly minimizing the loss itself. Leveraging the large deviation theory, we firstly provide an asymptotically optimal allocation rule for the TB problem, and then propose a parameter-free Large Deviation (LD) algorithm to make the allocation rule implementable. Central limit theorem-based Large Deviation (CLD) algorithm is further proposed as a supplement to improve the computation efficiency using normal approximation. Extensive experiments are conducted to validate the superiority of our algorithms compared to existing methods, and demonstrate their broader applications to more general distributions and various kinds of loss functions. This work ultimately provides a significant advancement in the TB problem by offering a more efficient and broadly applicable solution framework.

Featured Image

Why is it important?

This research investigates the Thresholding Bandit (TB) problem, which is a significant aspect of sequential decision-making in fields like machine learning and decision theory. By focusing on minimizing the loss function directly, rather than its upper bound, the study proposes innovative algorithms that enhance computational efficiency and applicability to various distributions. Key Takeaways: 1. Asymptotically optimal solution to minimize the expected loss. The conventional approaches mainly work on the upper bound of loss functions and thus may lead to suboptimal decision-making. Instead, our approach stands out by directly minimizing the loss itself and offering an asymptotically optimal solution by large deviation theory. Notably, this unique approach facilitates an asymptotically optimal allocation rule specifically tailored for the TB problem, which not only matches the intuition in earlier result that effective algorithms for TB problem must pull each arm a linear amount of times, but also provides the asymptotically optimal linear coefficients firstly. 2. Parameter-free algorithms based on the asymptotically optimal allocation rule. A Large Deviation (LD) algorithm is proposed for the TB problem with a fixed budget based on the asymptotically optimal allocation rule. Much different from existing methods, the LD algorithm is parameter-free, which alleviates the dependency problem on user-specified parameters by existing algorithms and enhances its stronger adaptability in practice. Additionally, A Central Limit Theorem (CLT)-based Large Deviation (CLD) algorithm is proposed to further improve the computation efficiency of sample allocation by utilizing the normal approximation, which is also a parameter-free algorithm. 3. Wilder application scenarios and superior performance. We further show the wider applicability of our proposed algorithms to more general distributions and various loss functions. Extensive numerical results are provided to validate the effectiveness and superiority of our proposed algorithms compared to the prevalent methods.

AI notice

Some of the content on this page has been created using generative AI.

Read the Original

This page is a summary of: Large Deviation Algorithms for Thresholding Bandit Problem, Big Data Mining and Analytics, October 2025, Tsinghua University Press,
DOI: 10.26599/bdma.2025.9020028.
You can read the full text:

Read

Contributors

The following have contributed to this page