Abstract: We explore a class of stochastic multiplayer games where each player in the game aims to optimize its objective under uncertainty and adheres to some expectation constraints. The study employs an offline learning paradigm, leveraging a pre-existing dataset containing auxiliary features. While prior research in deterministic and stochastic multiplayer games primarily explored vector-valued decisions, this work departs by considering function-valued decisions that incorporate auxiliary features as input. We leverage the law of large deviations and degree theory to establish the almost sure convergence of the offline learning solution to the true solution as the number of data samples increases. Finally, we demonstrate the validity of our method via a multi-account portfolio optimization problem.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper explores offline learning methods for decision functions in a class of stochastic multi-player games. Specifically, each player optimizes their objective under uncertainty and is subject to expectation constraints. Unlike previous studies, which mainly focus on vector-valued decisions, this paper investigates function-valued decisions that take auxiliary features as inputs.

#### Main Contributions:

1. **Offline Learning Paradigm**: This paper adopts an offline learning paradigm, using pre-existing datasets to construct decision functions. This approach is suitable for safety-critical domains requiring strict data quality and model performance guarantees, such as collision avoidance in autonomous vehicles and medical diagnostic systems.
2. **Function-Valued Decisions**: Unlike traditional vector-valued decisions, this paper considers function-valued decisions, where each player uses a decision function to generate an optimal decision vector based on observed auxiliary features.
3. **Expectation Constraints**: Each player's optimization problem includes nonlinear expectation constraints, allowing the modeling of specific scenarios such as obstacle avoidance and risk measures (e.g., Conditional Value at Risk, CVaR).
4. **Convergence Analysis**: Using large deviation theory and degree theory, the paper proves that the offline learning solution almost surely converges to the true solution as the sample size increases.

#### Numerical Experiments:

The paper validates the method's effectiveness through a multi-account portfolio optimization problem. The experiment considers 3 investment accounts, 4 assets, and 5 observable features. Results show that as the number of training samples increases, the relative distance gradually decreases, and the penalties for constraint violations and regularization also decrease. Additionally, numerical results indicate that all expectation constraints are satisfied, especially with a larger sample size.

### Conclusion

This paper proposes a new offline learning method to address stochastic multi-player games with expectation-valued objectives and expectation constraints. The paper proves that the learned decision function almost surely converges to a stable decision function in a finite-dimensional function space. Future work will extend this result to Reproducing Kernel Hilbert Spaces (RKHS) with general kernels.
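As a rough illustration of the ideas summarized above, the sketch below shows how a CVaR-type expectation constraint could be checked on an offline dataset for a single account whose decision is a function of observed features. It is not the paper's algorithm. The linear decision rule `decision`, the weight matrix `W`, the synthetic features and returns, and the values of `alpha` and `budget` are all assumptions made for illustration; only the dimensions (4 assets, 5 features) follow the experiment described in the summary. The empirical CVaR uses the standard Rockafellar-Uryasev representation, with the expectation replaced by a sample average.

```python
import numpy as np


def empirical_cvar(losses, alpha):
    """Sample-average CVaR at level alpha (Rockafellar-Uryasev form).

    CVaR_alpha(L) = min_t  t + E[(L - t)^+] / (1 - alpha).
    The sample objective is piecewise linear and convex in t, so its
    minimum is attained at one of the sample points.
    """
    losses = np.asarray(losses, dtype=float)
    candidates = np.unique(losses)
    # excess[i, j] = max(losses[j] - candidates[i], 0)
    excess = np.maximum(losses[None, :] - candidates[:, None], 0.0)
    values = candidates + excess.mean(axis=1) / (1.0 - alpha)
    return float(values.min())


def decision(W, features):
    """Hypothetical linear decision rule: features (N, 5) -> decisions (N, 4)."""
    return features @ W.T


# Synthetic offline dataset (illustrative values only).
rng = np.random.default_rng(0)
n_samples, n_assets, n_features = 1000, 4, 5
features = rng.normal(size=(n_samples, n_features))
returns = 0.01 + 0.05 * rng.normal(size=(n_samples, n_assets))

# Assumed decision-function parameters; a learning method would fit these.
W = 0.05 * rng.normal(size=(n_assets, n_features))

decisions = decision(W, features)
losses = -(returns * decisions).sum(axis=1)  # loss = negative portfolio return

alpha, budget = 0.95, 0.05  # assumed risk level and CVaR budget
cvar = empirical_cvar(losses, alpha)
print(f"empirical CVaR: {cvar:.4f}, constraint satisfied: {cvar <= budget}")
```

In the multiplayer setting described above, each account would have its own decision function and constraint, and the players' objectives would be coupled. This sketch only evaluates one player's sample-average constraint for fixed parameters.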

Offline Learning of Decision Functions in Multiplayer Games with Expectation Constraints

Learning in Multi-Player Stochastic Games

Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach

Solving Decision-Dependent Games by Learning from Feedback

Distributed Non-Bayesian Learning for Games with Incomplete Information

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

Expectation in Stochastic Games with Prefix-independent Objectives

Online Learning: Stochastic and Constrained Adversaries

Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds

A unified stochastic approximation framework for learning in games

Learning in Multi-level Stochastic games with Delayed Information

A Risk-Averse Equilibrium for Multi-Agent Systems

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

ELA: Exploited Level Augmentation for Offline Learning in Zero-Sum Games

Learning to Control Unknown Strongly Monotone Games

Efficient Methods for Non-stationary Online Learning

Online Learning in Weakly Coupled Markov Decision Processes: A Convergence Time Study

Robust Offline Policy Learning with Observational Data from Multiple Sources

Social Optimum Equilibrium Selection for Distributed Multi-Agent Optimization

Empirical Policy Optimization for n-Player Markov Games

Learning to Cover: Online Learning and Optimization with Irreversible Decisions