Abstract: Distributionally robust optimization (DRO) is a powerful framework for training models that are robust to data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus it is suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes the $\chi^2$-divergence as a special case. We prove that our algorithm finds an $\epsilon$-stationary point with a computational complexity of $\mathcal O(\epsilon^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence. The numerical results indicate that our method outperforms existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
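For reference, the constrained DRO problem and the Cressie-Read family can be written as below. This is a sketch based on the standard definitions in the DRO literature, not a transcription of the paper. The symbols $x$ (model parameters), $\ell(x;S)$ (loss on sample $S$), $P$ (data distribution), $Q$, $\rho$ (robustness radius), $\lambda$ and $\eta$ are notation assumed here, and $k_*=k/(k-1)$ is assumed to be the parameter appearing in the complexity bound.

$$\min_{x}\ \sup_{Q:\ D_{f_k}(Q\,\|\,P)\le\rho}\ \mathbb{E}_{S\sim Q}\big[\ell(x;S)\big], \qquad D_{f}(Q\,\|\,P)=\mathbb{E}_{S\sim P}\Big[f\Big(\frac{dQ}{dP}(S)\Big)\Big],$$

$$f_k(t)=\frac{t^k-kt+k-1}{k(k-1)},\quad k>1,\qquad k_*=\frac{k}{k-1}.$$

Setting $k=2$ gives $f_2(t)=\tfrac12(t-1)^2$ and $k_*=2$, i.e. the $\chi^2$-divergence up to a factor of $1/2$. By standard $f$-divergence duality, the inner supremum equals

$$\inf_{\lambda>0,\ \eta\in\mathbb{R}}\Big\{\lambda\rho+\eta+\lambda\,\mathbb{E}_{S\sim P}\Big[f_k^*\Big(\frac{\ell(x;S)-\eta}{\lambda}\Big)\Big]\Big\},\qquad f_k^*(s)=\frac{1}{k}\Big[\big((k-1)s+1\big)_+^{k_*}-1\Big].$$

Because this dual objective is an expectation over $P$, it can be estimated from a mini-batch of samples. The $1/\lambda$ scaling inside $f_k^*$ and the $k_*$-th power growth of $f_k^*$ are one way to see why it is not globally smooth or Lipschitz.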

What problem does this paper attempt to address?

This paper addresses large-scale, non-convex, stochastic constrained distributionally robust optimization (DRO). In machine learning, traditional empirical risk minimization can suffer performance degradation when the training and test data distributions differ. The DRO framework trains models that are robust to such distribution shifts by minimizing the expected loss under the worst-case distribution within an uncertainty set. The paper specifically targets constrained DRO with non-convex loss functions, such as those of neural networks, a setting that previous research has largely left unexplored. The authors propose a new stochastic algorithm whose per-iteration computational complexity is independent of the overall dataset size, making it suitable for large-scale applications. They focus on uncertainty sets defined by the Cressie-Read family of divergences, which includes the $\chi^2$-divergence as a special case, and also study the smoothed Conditional Value-at-Risk (CVaR) DRO problem.

The challenges addressed in the paper include:
1. In large-scale applications, computing the full gradient is infeasible because of the large number of training samples, so gradients must be estimated efficiently from a small number of samples.
2. The non-convexity of the loss function makes it difficult to extend existing methods, which were developed for convex losses.
3. The Lagrangian dual form of constrained DRO is neither smooth nor Lipschitz, which makes convergence analysis difficult.

The main contributions of the paper are:
1. A new stochastic algorithm for non-convex constrained DRO that works with biased gradient estimates and whose per-iteration computational complexity is independent of the training set size.
2. A Frank-Wolfe update for the Lagrange multiplier that controls the gap between the objective function and its optimal value.
3. A convergence guarantee: the algorithm finds an $\epsilon$-stationary point of the non-convex constrained DRO problem, and it outperforms existing methods in numerical experiments.

Through these contributions, the paper provides effective tools for large-scale non-convex constrained distributionally robust optimization, improving the robustness of models to changes in data distribution.
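The mini-batch idea and the multiplier update can be illustrated with a short NumPy sketch. It assumes the standard $f$-divergence dual of the Cressie-Read constrained DRO problem; the function names, the interval $[\lambda_{lo}, \lambda_{hi}]$ for the multiplier, and the fixed step size are hypothetical choices made for illustration. It is not the paper's algorithm: it only shows that a mini-batch estimate of the dual objective and its partial derivatives costs $O(B)$ for a batch of size $B$, and what a generic Frank-Wolfe step on a scalar multiplier looks like.

```python
# Hypothetical sketch: names, the lambda interval and the step size are
# illustrative assumptions, not taken from the paper.
import numpy as np


def cressie_read_f(t, k):
    """Cressie-Read generator f_k(t) = (t^k - k t + k - 1) / (k (k - 1)) for t >= 0, k > 1."""
    t = np.asarray(t, dtype=float)
    return (t**k - k * t + k - 1.0) / (k * (k - 1.0))


def cressie_read_conj(s, k):
    """Convex conjugate f_k^*(s) = ([(k - 1) s + 1]_+^{k_*} - 1) / k, with k_* = k / (k - 1)."""
    k_star = k / (k - 1.0)
    base = np.maximum((k - 1.0) * np.asarray(s, dtype=float) + 1.0, 0.0)
    return (base**k_star - 1.0) / k


def cressie_read_conj_grad(s, k):
    """Derivative of f_k^*: [(k - 1) s + 1]_+^{1 / (k - 1)}."""
    base = np.maximum((k - 1.0) * np.asarray(s, dtype=float) + 1.0, 0.0)
    return base ** (1.0 / (k - 1.0))


def minibatch_dual(losses, lam, eta, rho, k):
    """Mini-batch estimate of the dual objective
        G(lam, eta) = lam * rho + eta + lam * mean_i f_k^*((loss_i - eta) / lam)
    and its partial derivatives, for lam > 0. The cost is O(len(losses)),
    i.e. the batch size, not the dataset size.

    Returns (value, dG/dlam, dG/deta, weights); weights[i] multiplies
    grad_x loss_i in the mini-batch estimate of grad_x G.
    """
    losses = np.asarray(losses, dtype=float)
    s = (losses - eta) / lam
    conj = cressie_read_conj(s, k)
    dconj = cressie_read_conj_grad(s, k)
    value = lam * rho + eta + lam * conj.mean()
    d_lam = rho + np.mean(conj - s * dconj)  # d/dlam [lam f*(s)] = f*(s) - s f*'(s)
    d_eta = 1.0 - dconj.mean()               # d/deta [lam f*(s)] = -f*'(s)
    weights = dconj / losses.size
    return value, d_lam, d_eta, weights


def frank_wolfe_lambda_step(lam, d_lam, lam_lo, lam_hi, step):
    """One generic Frank-Wolfe step for a scalar multiplier on [lam_lo, lam_hi].

    The linear minimization oracle over an interval returns the endpoint that
    minimizes d_lam * v; the iterate then moves toward it by a fraction `step`.
    """
    vertex = lam_lo if d_lam > 0 else lam_hi
    return lam + step * (vertex - lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch_losses = rng.exponential(scale=1.0, size=32)  # stand-in for per-sample losses
    lam, eta, rho, k = 1.0, 0.5, 0.1, 2.0  # k = 2 corresponds to the chi^2 case
    value, d_lam, d_eta, w = minibatch_dual(batch_losses, lam, eta, rho, k)
    lam = frank_wolfe_lambda_step(lam, d_lam, lam_lo=0.1, lam_hi=10.0, step=0.5)
    print(f"G={value:.4f}  dG/dlam={d_lam:.4f}  dG/deta={d_eta:.4f}  new lam={lam:.4f}")
```

The returned `weights` show how the dual reweights samples: each per-sample gradient $\nabla_x \ell(x;S_i)$ is scaled by $f_k^{*\prime}(s_i)/B$, which increases with the loss, so high-loss samples receive more weight. A Frank-Wolfe step moves $\lambda$ toward an endpoint of its interval by a convex combination, so the iterate never leaves the interval and no projection is needed.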

Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

DORO: Distributional and Outlier Robust Optimization

Multistage Distributionally Robust Optimization for Integrated Production and Maintenance Scheduling

Distributed Robust Optimization in Networked System

Towards Scalable and Fast Distributionally Robust Optimization for Data-Driven Deep Learning

Distributed Distributionally Robust Optimization with Non-Convex Objectives

Large-Scale Methods for Distributionally Robust Optimization

Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis

Stochastic First-Order Algorithms for Constrained Distributionally Robust Optimization

Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions

Outlier-Robust Wasserstein DRO

Nonlinear Distributionally Robust Optimization

A Primal-Dual Algorithm for Faster Distributionally Robust Optimization

Distributionally Robust Optimization: A review on theory and applications

Bayesian Distributionally Robust Optimization

Efficient Algorithms for Distributionally Robust Stochastic Optimization with Discrete Scenario Support

Conic Reformulations for Kullback-Leibler Divergence Constrained Distributionally Robust Optimization and Applications

A Robust Learning Algorithm for Regression Models Using Distributionally Robust Optimization under the Wasserstein Metric

Doubly Robust Data-Driven Distributionally Robust Optimization