Learning Anticipatory Decision for Distributed Systems with Robustness Guarantees
Peijiang Liu, Xindi Yang, Hongliang Ren, Hao Zhang, Zhuping Wang
DOI: https://doi.org/10.1109/tase.2024.3493912
IF: 6.636
2024-01-01
IEEE Transactions on Automation Science and Engineering
Abstract: This paper investigates anticipatory decision for unknown distributed systems with robustness concerns. Anticipatory decision focuses on selecting actions before the corresponding observations become available. First, anticipatory decision forms sequential feedback with min-max performance guarantees, while causality comes from time series analysis. Next, distribution, robustness and time consistency partition the optimization into spatial and temporal sub-games. The spatial sub-games resolve conflicts between distribution and robustness, while the temporal ones ensure stability and performance through time consistency. Finally, we propose a multi-step reinforcement learning algorithm under causality analysis and a game-theoretic framework. Numerical results demonstrate the effectiveness of the approach, and practical experiments show potential real-world applications.
Note to Practitioners: This framework focuses on anticipatory decision for distributed systems, which suffer from distributed communication, unknown dynamics, environmental disturbances and state observation loss. The framework applies to various scenarios, e.g., internal surgical robots, low-light autonomous driving and non-GPS navigation, which mainly involve dynamic environments and weak signal feedback. For example, decision-making in autonomous driving requires not only reacting to current environmental conditions but also anticipating future scenarios and uncertainties due to poor visibility. Most existing results address these issues with model-driven approaches, but unknown dynamics render such methods inapplicable.
For implementation, we propose a multi-step reinforcement learning algorithm for the anticipatory decision framework with stability and robustness guarantees. The implementation has three main parts: 1) We collect data during an offline phase and organize them into a data structure consisting of a current-next observation pair, a multi-step decision and the accumulated reward; 2) Strategies and value functions are approximated with neural networks through Monte-Carlo methods; 3) The strategy is deployed as sequential feedback in practical systems, and predicts multi-step decisions from a single-step state observation. Finally, we select robot consensus with optical sensors as the implementation demonstration.
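The data structure in part 1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `MultiStepSample` and `build_multistep_samples`, the `horizon` parameter and the optional discount factor `gamma` are all assumptions introduced here, and the abstract does not say whether the accumulated reward is discounted (the default `gamma=1.0` gives a plain sum). The sketch only shows how one offline trajectory might be cut into tuples of a current observation, a multi-step decision, an accumulated reward and the next observation.

```python
from typing import List, NamedTuple, Sequence


class MultiStepSample(NamedTuple):
    """Hypothetical container for one offline training sample."""
    obs: Sequence[float]              # current observation o_t
    decisions: List[Sequence[float]]  # multi-step decision a_t, ..., a_{t+n-1}
    accumulated_reward: float         # sum over k of gamma^k * r_{t+k}
    next_obs: Sequence[float]         # next available observation o_{t+n}


def build_multistep_samples(
    observations: Sequence[Sequence[float]],
    decisions: Sequence[Sequence[float]],
    rewards: Sequence[float],
    horizon: int,
    gamma: float = 1.0,  # assumed discount factor; 1.0 means undiscounted
) -> List[MultiStepSample]:
    """Slice one trajectory into samples that each span `horizon` steps.

    Expects T+1 observations for T decisions and T rewards.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if not (len(decisions) == len(rewards) == len(observations) - 1):
        raise ValueError("need T+1 observations for T decisions and T rewards")

    samples = []
    for t in range(len(decisions) - horizon + 1):
        # Accumulate the rewards collected while the decisions are applied
        # without any intermediate observation.
        acc = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        samples.append(
            MultiStepSample(
                obs=observations[t],
                decisions=list(decisions[t:t + horizon]),
                accumulated_reward=acc,
                next_obs=observations[t + horizon],
            )
        )
    return samples


# Toy trajectory with scalar observations and decisions (made-up numbers).
toy_obs = [[0.0], [1.0], [2.0], [3.0]]
toy_decisions = [[0.1], [0.2], [0.3]]
toy_rewards = [1.0, 2.0, 3.0]
toy_samples = build_multistep_samples(toy_obs, toy_decisions, toy_rewards, horizon=2)
```

Each sample pairs a single observation with a block of `horizon` decisions, which mirrors the deployment setting in part 3), where multi-step decisions are predicted from a single-step state observation. The neural-network approximation in part 2) is outside the scope of this sketch.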