Optimal Sensor and Actuator Selection for Factored Markov Decision Processes: Complexity, Approximability and Algorithms

Jayanth Bhargav, Mahsa Ghasemi, Shreyas Sundaram
2024-07-10
Abstract: Factored Markov Decision Processes (fMDPs) are a class of Markov Decision Processes (MDPs) in which the states (and actions) can be factored into a set of state (and action) variables. The state space, action space and reward function of an fMDP can be encoded compactly using a factored representation. In this paper, we consider the setting where we have a set of potential sensors to select for the fMDP (at design-time), where each sensor measures a certain state variable and has a selection cost. We formulate the problem of selecting an optimal set of sensors for fMDPs (subject to certain budget constraints) to maximize the expected infinite-horizon discounted return provided by the optimal control policy. We show the fundamental result that it is NP-hard to approximate this optimization problem to within any non-trivial factor. We then study the dual problem of budgeted actuator selection (at design-time) to maximize the expected return under the optimal policy. Again, we show that it is NP-hard to approximate this optimization problem to within any non-trivial factor. Furthermore, with explicit examples, we show the failure of greedy algorithms for both the sensor and actuator selection problems and provide insights into the factors that cause these problems to be challenging. Despite the inapproximability results, through extensive simulations, we show that the greedy algorithm may provide near-optimal performance for actuator and sensor selection in many real-world and randomly generated fMDP instances.
Systems and Control, Computational Complexity, Optimization and Control
What problem does this paper attempt to address?
The paper asks how to select sensors and actuators for Factored Markov Decision Processes (fMDPs), under a given budget constraint, so as to maximize the expected infinite-horizon discounted return achieved by the optimal control policy. It formulates two problems:

1. **Sensor Selection Problem (fMDP-SS)**:
   - Select a subset of sensors at design time so that the expected infinite-horizon discounted return under the optimal policy, which acts on the observations generated by the selected sensors, is maximized.
   - The selected set must satisfy the budget constraint. Better sensing lets the agent estimate the state more accurately and thus obtain a higher long-term return.
2. **Actuator Selection Problem (fMDP-AS)**:
   - Select a subset of actuators at design time so that the expected infinite-horizon discounted return under the optimal policy, which can only use the selected actuators, is maximized.
   - The selected set must satisfy the budget constraint. Better actuation lets the agent influence the state transitions of the system more effectively and thus obtain a higher long-term return.

### Complexity and Approximation of the Problems

The paper proves that both problems are NP-hard, and further that for any \(\epsilon>0\) it is NP-hard to approximate them to within a factor of \(n^{1-\epsilon}\). In other words, unless P = NP, no polynomial-time algorithm can guarantee an exact solution or even a non-trivial approximation for these optimization problems. The paper also gives explicit examples on which the greedy algorithm fails for both problems, and provides insights into the factors that make these problems challenging.

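
The greedy heuristic mentioned above can be illustrated with a minimal sketch for the actuator case. This is not the paper's algorithm or model: it assumes a small flat (non-factored) tabular MDP, assumes each candidate actuator enables exactly one action, assumes a no-op action `0` is always available, and scores a selection by the optimal discounted return averaged over a uniform initial state. All names (`optimal_return`, `greedy_actuator_selection`, `P`, `R`, `costs`) are hypothetical. The sensor case is omitted because evaluating a sensor set requires solving a partially observed problem.

```python
from typing import Dict, List, Sequence, Tuple

Transitions = Sequence[Sequence[Sequence[float]]]  # P[a][s][t] = Pr(t | s, a)
Rewards = Sequence[Sequence[float]]                # R[s][a]


def optimal_return(P: Transitions, R: Rewards, allowed: Sequence[int],
                   gamma: float, tol: float = 1e-10) -> float:
    """Value iteration restricted to `allowed` actions.

    Returns the optimal discounted return averaged over a uniform
    initial state (an assumption of this sketch).
    """
    n = len(R)
    V = [0.0] * n
    while True:
        new_V = [
            max(R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
                for a in allowed)
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return sum(new_V) / n
        V = new_V


def greedy_actuator_selection(P: Transitions, R: Rewards,
                              costs: Dict[int, float], budget: float,
                              gamma: float) -> Tuple[List[int], float]:
    """Repeatedly add the affordable actuator with the largest gain.

    Stops when nothing affordable improves the return.
    """
    selected: List[int] = []
    allowed = [0]  # the no-op action is assumed to be always available
    spent = 0.0
    current = optimal_return(P, R, allowed, gamma)
    while True:
        best, best_val = None, current
        for a, c in costs.items():
            if a in selected or spent + c > budget:
                continue
            val = optimal_return(P, R, allowed + [a], gamma)
            if val > best_val + 1e-9:
                best, best_val = a, val
        if best is None:
            return selected, current
        selected.append(best)
        allowed.append(best)
        spent += costs[best]
        current = best_val


# Toy instance: state 1 pays reward 1 per step, state 0 pays nothing.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0 (no-op): stay in place
    [[0.0, 1.0], [0.0, 1.0]],  # actuator 1: always move to state 1
    [[0.5, 0.5], [0.0, 1.0]],  # actuator 2: reach state 1 with prob. 0.5
]
R = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
selected, value = greedy_actuator_selection(
    P, R, costs={1: 1.0, 2: 1.0}, budget=1.0, gamma=0.5)
print(selected, round(value, 6))  # [1] 1.5
```

On this toy instance the greedy rule happens to pick the better actuator. The paper's counterexamples show that such a rule can fail on other instances, which is consistent with the inapproximability results.
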
### Application Scenarios

These problems arise in a range of practical settings, such as:

- **Mobile Robot Teams**: When a team performs localization and task execution simultaneously in an environment, the sensors must be chosen to improve overall performance.
- **Power Distribution Networks**: In complex power networks, sensors and actuators must be chosen to minimize fault propagation and isolate critical nodes.

### Main Contributions

1. **Complexity Analysis**: Proves that the fMDP-SS and fMDP-AS problems are NP-hard and that it is NP-hard to approximate them to within any non-trivial factor.
2. **Performance of Greedy Algorithm**: Shows cases where the greedy algorithm performs poorly on these two problems and explains the reasons.
3. **Empirical Results**: Despite the theoretical inapproximability results, extensive simulations show that the greedy algorithm may still provide near-optimal solutions in many real-world and randomly generated fMDP instances.

Through these contributions, the paper provides a theoretical basis and practical guidance for sensor and actuator selection in fMDPs.