Abstract: The reliability of sequential hypothesis testing can be greatly improved when the decision maker is free to adaptively choose an action that determines the distribution of the currently collected sample. This advantage of adaptive sampling has been recognized since Chernoff's seminal 1959 paper [1]. Although a large body of work has investigated the gain from adaptivity, the fundamental limits of the individual error probabilities in the general multiple-hypothesis setting are not fully understood. In particular, in the asymptotic regime where the expected stopping time tends to infinity, the error exponents have been characterized only in specific cases, such as that of the total error probability. In this paper, we consider a general setup of active sequential multiple-hypothesis testing in which, at each time slot, a temporally varying subset of data sources (out of a known set) emerges, from which the decision maker can select sources to sample, subject to a family of expected selection budget constraints. The selection of sources, understood as the "action" at each time slot, is constrained to a predefined action space. At the end of each time slot, the decision maker either makes an inference on the $M$ hypotheses or continues to observe the data sources in the next time slot. We characterize the optimal tradeoffs among the $M(M-1)$ types of error exponents. We also propose a companion asymptotically optimal test that balances exploration and exploitation and achieves any target error exponents within the region. To the best of our knowledge, this is the first work to identify such tradeoffs among error exponents, and it uncovers the tension among different action-taking policies even in the basic setting of Chernoff [1].

Best Action Selection In A Stochastic Environment

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Exploration Analysis in Finite-Horizon Turn-based Stochastic Games

Efficient Dynamic Allocation Policy for Robust Ranking and Selection under Stochastic Control Framework

Ranking and Selection as Stochastic Control

Distributionally Robust Selection of the Best

Randomized Optimal Stopping Problem in Continuous time and Reinforcement Learning Algorithm

Extremum-Seeking Action Selection for Accelerating Policy Optimization

POMDP-Based Ranking and Selection

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

Stochastic Games with Minimally Bounded Action Costs

Selection of the Best in the Presence of Subjective Stochastic Constraints

Bayesian Approaches to Modelling Action Selection

Adaptive Minimum Action Method For The Study Of Rare Events

Monte Carlo Tree Search with Boltzmann Exploration

Approximate optimality and the risk/reward tradeoff given repeated gambles

Chance Constrained Selection of the Best

Tradeoffs among Action Taking Policies Matter in Active Sequential Multi-Hypothesis Testing: the Optimal Error Exponent Region

Decision Making in Non-Stationary Environments with Policy-Augmented Search