Offline Learning of Decision Functions in Multiplayer Games with Expectation Constraints

Yuanhanqing Huang, Jianghai Hu
2024-02-24
Abstract: We explore a class of stochastic multiplayer games where each player in the game aims to optimize its objective under uncertainty and adheres to some expectation constraints. The study employs an offline learning paradigm, leveraging a pre-existing dataset containing auxiliary features. While prior research in deterministic and stochastic multiplayer games primarily explored vector-valued decisions, this work departs by considering function-valued decisions that incorporate auxiliary features as input. We leverage the law of large deviations and degree theory to establish the almost sure convergence of the offline learning solution to the true solution as the number of data samples increases. Finally, we demonstrate the validity of our method via a multi-account portfolio optimization problem.
Optimization and Control, Systems and Control
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper explores offline learning methods for decision functions in a class of stochastic multiplayer games. Each player optimizes its own objective under uncertainty, subject to expectation constraints. Unlike previous studies, which mainly focus on vector-valued decisions, this paper investigates function-valued decisions that take auxiliary features as inputs.

#### Main Contributions:

1. **Offline Learning Paradigm**: The paper adopts an offline learning paradigm, using a pre-existing dataset to construct the decision functions. This approach suits safety-critical domains that require strict guarantees on data quality and model performance, such as collision avoidance in autonomous vehicles and medical diagnostic systems.
2. **Function-Valued Decisions**: Instead of a fixed decision vector, each player uses a decision function that maps the observed auxiliary features to a decision vector.
3. **Expectation Constraints**: Each player's optimization problem includes nonlinear expectation constraints, which can model scenarios such as obstacle avoidance and risk measures (e.g., Conditional Value at Risk, CVaR).
4. **Convergence Analysis**: Using large deviations theory and degree theory, the paper proves that the offline learning solution converges almost surely to the true solution as the sample size increases.

#### Numerical Experiments:

The paper validates the method on a multi-account portfolio optimization problem with 3 investment accounts, 4 assets, and 5 observable features. Results show that as the number of training samples increases, the relative distance gradually decreases, and the penalties for constraint violations and regularization also decrease. Additionally, numerical results indicate that all expectation constraints are satisfied, especially with a larger sample size.

### Conclusion

This paper proposes a new offline learning method for stochastic multiplayer games with expectation-valued objectives and expectation constraints. It proves that the learned decision function converges almost surely to a stable decision function in a finite-dimensional function space. Future work will extend the approach to Reproducing Kernel Hilbert Spaces (RKHS) with general kernels.
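
To make the idea of a function-valued decision under a CVaR-type expectation constraint more concrete, the following is a minimal sketch of how such a constraint could be checked on a fixed dataset. It is not the authors' method: the affine decision function, the names `empirical_cvar`, `linear_decision`, and `cvar_constraint_violation`, the synthetic data, and the values of `alpha` and `budget` are all assumptions chosen for illustration. Only the dimensions (4 assets, 5 features) follow the experiment described above. The CVaR estimate uses the standard Rockafellar-Uryasev formula, with the expectation replaced by a sample average.

```python
import numpy as np


def empirical_cvar(losses, alpha):
    """Sample-average CVaR at level alpha via the Rockafellar-Uryasev formula.

    CVaR_alpha(L) = min_t { t + E[(L - t)^+] / (1 - alpha) }.
    With the expectation replaced by a sample mean, the objective is convex
    and piecewise linear in t, so a minimizer lies at one of the samples.
    """
    losses = np.asarray(losses, dtype=float)
    thresholds = losses[:, None]                      # candidate t values, shape (n, 1)
    excess = np.maximum(losses[None, :] - thresholds, 0.0)  # (L_j - t_i)^+, shape (n, n)
    objective = thresholds[:, 0] + excess.mean(axis=1) / (1.0 - alpha)
    return float(objective.min())


def linear_decision(W, b, features):
    """Hypothetical affine decision function: features (n, d) -> allocations (n, m)."""
    return features @ W.T + b


def cvar_constraint_violation(W, b, features, returns, alpha, budget):
    """Amount by which the empirical CVaR of the portfolio loss exceeds the budget."""
    allocations = linear_decision(W, b, features)
    losses = -np.sum(allocations * returns, axis=1)   # loss = negative portfolio return
    return max(empirical_cvar(losses, alpha) - budget, 0.0)


# Synthetic data for illustration only (not the paper's dataset).
rng = np.random.default_rng(0)
n_samples, n_assets, n_features = 200, 4, 5
features = rng.normal(size=(n_samples, n_features))
returns = 0.01 + 0.05 * rng.normal(size=(n_samples, n_assets))

# A feature-independent, equal-weight allocation as a trivial starting point.
W = np.zeros((n_assets, n_features))
b = np.full(n_assets, 1.0 / n_assets)

violation = cvar_constraint_violation(W, b, features, returns, alpha=0.9, budget=0.2)
print(f"empirical CVaR constraint violation: {violation:.4f}")
```

In an offline learning setup of this kind, a quantity like `violation` would typically enter the training objective as a penalty or be imposed as a hard constraint on the parameters `W` and `b`. Which of the two the paper uses is not stated in this summary.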