Abstract: Factored Markov Decision Processes (fMDPs) are a class of Markov Decision Processes (MDPs) in which the states (and actions) can be factored into a set of state (and action) variables. The state space, action space and reward function of an fMDP can be encoded compactly using a factored representation. In this paper, we consider the setting where we have a set of potential sensors to select for the fMDP (at design time), where each sensor measures a certain state variable and has a selection cost. We formulate the problem of selecting an optimal set of sensors for fMDPs (subject to certain budget constraints) to maximize the expected infinite-horizon discounted return provided by the optimal control policy. We show the fundamental result that it is NP-hard to approximate this optimization problem to within any non-trivial factor. We then study the dual problem of budgeted actuator selection (at design time) to maximize the expected return under the optimal policy. Again, we show that it is NP-hard to approximate this optimization problem to within any non-trivial factor. Furthermore, with explicit examples, we show the failure of greedy algorithms for both the sensor and actuator selection problems and provide insights into the factors that cause these problems to be challenging. Despite the inapproximability results, through extensive simulations, we show that the greedy algorithm may provide near-optimal performance for actuator and sensor selection in many real-world and randomly generated fMDP instances.
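One way to state the budgeted sensor selection problem from the abstract in symbols is sketched below. The notation is chosen here for illustration and may not match the paper's: \(\mathcal{V}\) is the set of candidate sensors, \(c(s)\) the selection cost of sensor \(s\), \(B\) the budget, \(\gamma \in [0, 1)\) the discount factor, \(R\) the reward function, \(x_t\) and \(a_t\) the state and action at time \(t\), and \(\Pi(\mathcal{S})\) the set of policies whose decisions may depend only on measurements from the sensors in \(\mathcal{S}\).

\[
\max_{\mathcal{S} \subseteq \mathcal{V}} \; J(\mathcal{S})
\quad \text{subject to} \quad \sum_{s \in \mathcal{S}} c(s) \le B,
\qquad
J(\mathcal{S}) = \max_{\pi \in \Pi(\mathcal{S})} \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(x_t, a_t)\right].
\]

Under the same assumptions, the actuator selection problem has the same form, with \(\mathcal{S}\) replaced by a set of actuators and \(\Pi(\cdot)\) restricted to policies that act only through the selected actuators.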

What problem does this paper attempt to address?

The paper addresses how to select an optimal set of sensors and actuators for Factored Markov Decision Processes (fMDPs) so as to maximize the expected infinite-horizon discounted return achieved by the optimal control policy, subject to a budget constraint. Specifically, it studies two problems:

1. **Sensor Selection Problem (fMDP-SS)**:
   - Select, at design time, a subset of sensors such that the optimal policy, acting on the observations produced by the selected sensors, maximizes the expected infinite-horizon discounted return.
   - The selected set must satisfy the budget constraint; better sensing lets the agent estimate the state more accurately and thus obtain a higher long-term return.
2. **Actuator Selection Problem (fMDP-AS)**:
   - Select, at design time, a subset of actuators such that the optimal policy, acting through the selected actuators, maximizes the expected infinite-horizon discounted return.
   - The selected set must satisfy the budget constraint; better actuation lets the agent influence the system's state transitions more effectively and thus obtain a higher long-term return.

### Complexity and Approximation of the Problems

The paper proves that both problems are NP-hard and, further, that for any \(\epsilon > 0\) no polynomial-time algorithm can approximate them to within a factor of \(n^{1-\epsilon}\) unless P = NP. In other words, computing exact solutions, or even solutions with any non-trivial approximation guarantee, is computationally intractable in the worst case. In addition, the paper uses explicit examples to show how the greedy algorithm can fail on both problems, and provides insights into the factors that make these problems challenging.
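To make the fMDP-AS objective concrete, the following minimal sketch computes the optimal discounted return for each actuator subset of a hypothetical two-variable fMDP using value iteration. The dynamics, the reward, the discount factor `GAMMA`, and the function names are illustrative assumptions, not the paper's model: actuator i can set binary state variable x_i to 1, and a reward of 1 is earned only when both variables equal 1. Neither actuator helps on its own, but together they do. This is a simple form of complementarity between actuators, and it is one plausible reason a one-at-a-time greedy selection can struggle.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

State = Tuple[int, int]  # (x1, x2): two binary state variables
GAMMA = 0.9  # discount factor (hypothetical choice)


def transition(state: State, action: int) -> State:
    """Hypothetical deterministic dynamics: action i in {1, 2} sets x_i to 1,
    action 0 is a no-op, and unactuated variables keep their value."""
    x = list(state)
    if action in (1, 2):
        x[action - 1] = 1
    return (x[0], x[1])


def reward(state: State) -> float:
    """Hypothetical reward: 1 only when both state variables equal 1."""
    return 1.0 if state == (1, 1) else 0.0


def optimal_return(actuators: FrozenSet[int], start: State = (0, 0),
                   tol: float = 1e-10) -> float:
    """Value iteration restricted to the actions enabled by `actuators`;
    returns the optimal discounted return J(actuators) from `start`."""
    states = list(product((0, 1), repeat=2))
    actions = [0] + sorted(actuators)  # the no-op is always available
    values: Dict[State, float] = {s: 0.0 for s in states}
    while True:
        # Bellman optimality update: V(s) = R(s) + gamma * max_a V(T(s, a))
        updated = {s: reward(s) + GAMMA * max(values[transition(s, a)]
                                              for a in actions)
                   for s in states}
        if max(abs(updated[s] - values[s]) for s in states) < tol:
            return updated[start]
        values = updated


if __name__ == "__main__":
    for subset in (frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})):
        print(sorted(subset), round(optimal_return(subset), 6))
    # [] 0.0
    # [1] 0.0
    # [2] 0.0
    # [1, 2] 8.1
```

With both actuators, the agent reaches (1, 1) after two steps, so the return is \(\gamma^2/(1-\gamma) = 8.1\) for \(\gamma = 0.9\); with either actuator alone, the return stays 0.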
### Application Scenarios

These problems arise in a range of applications, for example:

- **Mobile Robot Teams**: when a team performs localization and task execution simultaneously in an environment, sensors must be selected to improve overall performance.
- **Power Distribution Networks**: in complex power networks, sensors and actuators must be selected to minimize fault propagation and isolate critical nodes.

### Main Contributions

1. **Complexity Analysis**: proves that fMDP-SS and fMDP-AS are NP-hard and cannot be approximated to within any non-trivial factor in polynomial time (unless P = NP).
2. **Performance of the Greedy Algorithm**: presents cases where the greedy algorithm performs poorly on both problems and explains why.
3. **Empirical Results**: despite the theoretical inapproximability results, extensive simulations show that the greedy algorithm may still provide near-optimal solutions in many real-world and randomly generated fMDP instances.

Through these contributions, the paper provides both a theoretical foundation and practical guidance for sensor and actuator selection in fMDPs.
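The greedy failure mode listed among the contributions can be illustrated with a small set function. This sketch is not the paper's example: the candidate names `a`, `b`, `c`, the unit costs, the budget, and the values in `toy_value` are hypothetical, and the cost-benefit greedy rule used here is one common variant that may differ from the one the paper analyzes. Because `a` and `b` only pay off together, neither shows any marginal gain on its own, so greedy spends its budget on `c` first and never recovers.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, List, Optional

Item = Hashable
SetFunction = Callable[[FrozenSet[Item]], float]


def greedy_select(candidates: List[Item], cost: Dict[Item, float],
                  budget: float, value: SetFunction) -> FrozenSet[Item]:
    """Cost-benefit greedy: repeatedly add the affordable candidate with the
    largest marginal gain per unit cost until no remaining candidate fits."""
    chosen: FrozenSet[Item] = frozenset()
    remaining = list(candidates)
    spent = 0.0
    while True:
        base = value(chosen)
        best: Optional[Item] = None
        best_ratio = float("-inf")
        for item in remaining:
            if spent + cost[item] > budget:
                continue  # does not fit in the remaining budget
            ratio = (value(chosen | {item}) - base) / cost[item]
            if ratio > best_ratio:  # strict '>' keeps the earliest item on ties
                best, best_ratio = item, ratio
        if best is None:
            return chosen
        chosen = chosen | {best}
        remaining.remove(best)
        spent += cost[best]


def exhaustive_select(candidates: List[Item], cost: Dict[Item, float],
                      budget: float, value: SetFunction) -> FrozenSet[Item]:
    """Brute force over all budget-feasible subsets (exponential time)."""
    best_set: FrozenSet[Item] = frozenset()
    best_val = value(best_set)
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            if sum(cost[i] for i in combo) <= budget:
                subset = frozenset(combo)
                if value(subset) > best_val:
                    best_set, best_val = subset, value(subset)
    return best_set


def toy_value(selected: FrozenSet[Item]) -> float:
    """Hypothetical return table: 'a' and 'b' are only valuable together,
    while 'c' gives a small standalone gain."""
    pair_bonus = 10.0 if {"a", "b"} <= selected else 0.0
    return pair_bonus + (1.0 if "c" in selected else 0.0)


if __name__ == "__main__":
    names = ["a", "b", "c"]
    unit_cost = {name: 1.0 for name in names}
    g = greedy_select(names, unit_cost, 2.0, toy_value)
    e = exhaustive_select(names, unit_cost, 2.0, toy_value)
    print(sorted(g), toy_value(g))  # ['a', 'c'] 1.0
    print(sorted(e), toy_value(e))  # ['a', 'b'] 10.0
```

Here greedy attains 1.0 against an optimum of 10.0, and raising the pair bonus makes the ratio arbitrarily poor, which is consistent in spirit with the inapproximability results, though it is not a proof of them.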

Optimal Sensor and Actuator Selection for Factored Markov Decision Processes: Complexity, Approximability and Algorithms
