A Short and General Duality Proof for Wasserstein Distributionally Robust Optimization

Luhao Zhang, Jincheng Yang, Rui Gao
2024-06-05
Abstract: We present a general duality result for Wasserstein distributionally robust optimization that holds for any Kantorovich transport cost, measurable loss function, and nominal probability distribution. Assuming an interchangeability principle inherent in existing duality results, our proof only uses one-dimensional convex analysis. Furthermore, we demonstrate that the interchangeability principle holds if and only if certain measurable projection and weak measurable selection conditions are satisfied. To illustrate the broader applicability of our approach, we provide a rigorous treatment of duality results in distributionally robust Markov decision processes and distributionally robust multistage stochastic programming. Additionally, we extend our analysis to other problems such as infinity-Wasserstein distributionally robust optimization, risk-averse optimization, and globalized distributionally robust counterpart.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
This paper aims to establish a general duality result for Wasserstein distributionally robust optimization (DRO). Specifically, it studies the worst-case expected loss

\[ L(\epsilon) := \sup_{\mathbb{P} \in \mathcal{P}(X)} \left\{ \mathbb{E}_{\xi \sim \mathbb{P}}[\ell(\xi)] : K_c(\hat{\mathbb{P}}, \mathbb{P}) \leq \epsilon \right\} \qquad \text{(P)} \]

where:
- $\epsilon \in [0, \infty)$ is the radius of the uncertainty set;
- $\mathcal{P}(X)$ is the set of all probability distributions on the data space $X$;
- $\ell: X \to \mathbb{R}$ is a measurable loss function;
- $\hat{\mathbb{P}}$ is the nominal distribution, and $\xi$ is a random variable distributed according to a candidate distribution $\mathbb{P}$;
- $K_c$ is the Kantorovich transport cost, defined as
\[ K_c(\hat{\mathbb{P}}, \mathbb{P}) = \inf_{\pi \in \Gamma(\hat{\mathbb{P}}, \mathbb{P})} \mathbb{E}_{(\hat{\xi}, \xi) \sim \pi}[c(\hat{\xi}, \xi)] \]

Here $\Gamma(\hat{\mathbb{P}}, \mathbb{P})$ is the set of all couplings, i.e., probability distributions on $X \times X$ with marginals $\hat{\mathbb{P}}$ and $\mathbb{P}$, and $c: X \times X \to [0, \infty]$ is the transport cost function.

**Core Problem**: The main objective of the paper is to give a general and concise proof of strong duality for problem (P). Previous proofs typically rely on involved convex duality arguments or on Fenchel conjugates over vector spaces. This paper instead assumes the interchangeability principle and uses only the Legendre transform from one-dimensional convex analysis. The authors further show that the interchangeability principle holds if and only if certain measurable projection and weak measurable selection conditions are satisfied.
With this approach, the authors not only simplify the proof but also extend the duality result to a broader range of settings, including distributionally robust Markov decision processes, distributionally robust multistage stochastic programming, infinity-Wasserstein distributionally robust optimization, risk-averse optimization, and the globalized distributionally robust counterpart. In summary, the paper introduces a new proof technique that yields a more general and concise duality framework for Wasserstein distributionally robust optimization.
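To make the duality statement concrete, recall the standard dual form that Wasserstein DRO duality results establish (stated informally here; the paper's precise conventions for infinite values are not reproduced):

\[ L(\epsilon) = \inf_{\lambda \geq 0} \left\{ \lambda \epsilon + \mathbb{E}_{\hat{\xi} \sim \hat{\mathbb{P}}}\left[ \sup_{\xi \in X} \big( \ell(\xi) - \lambda c(\hat{\xi}, \xi) \big) \right] \right\} \]

The outer problem is a one-dimensional convex minimization over $\lambda$. The Python sketch below checks this identity numerically in a toy setting that is an assumption for illustration, not an example from the paper: $X$ is a finite set of points on the real line, $\hat{\mathbb{P}}$ is uniform on a few of those points, and $c(\hat{\xi}, \xi) = |\hat{\xi} - \xi|$. In this finite case the worst-case expected loss $L(\epsilon)$ is a linear program, so the primal is solved exactly by enumerating LP vertices (at most one sample's mass is split between two targets). The dual is minimized by evaluating its convex, piecewise-linear objective at $\lambda = 0$ and at every breakpoint. The helper names `primal_value` and `dual_value` are hypothetical.

```python
from itertools import product


def primal_value(losses, cost, eps):
    """Worst-case expected loss: max over transports of the uniform nominal
    distribution on n samples to m support points with average cost <= eps.

    The LP has a single coupling (budget) constraint, so every vertex splits at
    most one sample's mass between two targets. Enumerating those vertices is
    exact but exponential in n, so this is only suitable for tiny instances.
    """
    n, m = len(cost), len(losses)
    best = float("-inf")
    for assign in product(range(m), repeat=n):
        base_loss = sum(losses[j] for j in assign) / n
        base_cost = sum(cost[i][j] for i, j in enumerate(assign)) / n
        if base_cost <= eps:
            best = max(best, base_loss)
        # Move a fraction t of sample i's mass from target j to target k,
        # choosing t so that the transport budget is used exactly.
        for i, j in enumerate(assign):
            for k in range(m):
                dc = (cost[i][k] - cost[i][j]) / n
                if dc == 0:
                    continue
                t = (eps - base_cost) / dc
                if 0 < t < 1:
                    best = max(best, base_loss + t * (losses[k] - losses[j]) / n)
    return best


def dual_value(losses, cost, eps):
    """min over lam >= 0 of lam*eps + mean_i max_j (losses[j] - lam*cost[i][j]).

    The objective is convex and piecewise linear in lam. Assuming eps > 0 and
    that every sample has a zero-cost target, it grows for large lam, so its
    minimum is attained at lam = 0 or where two affine pieces cross.
    """
    n, m = len(cost), len(losses)

    def objective(lam):
        inner = sum(max(losses[j] - lam * cost[i][j] for j in range(m)) for i in range(n))
        return lam * eps + inner / n

    candidates = {0.0}
    for i in range(n):
        for j in range(m):
            for k in range(m):
                dc = cost[i][j] - cost[i][k]
                if dc != 0:
                    lam = (losses[j] - losses[k]) / dc  # pieces j and k cross here
                    if lam >= 0:
                        candidates.add(lam)
    return min(objective(lam) for lam in candidates)


# Toy instance (an assumption for illustration): five support points,
# two nominal samples, loss (x - 2)^2, cost |x - y|, radius 0.5.
support = [0.0, 1.0, 2.0, 3.0, 4.0]
samples = [1.0, 2.0]
losses = [(x - 2.0) ** 2 for x in support]
cost = [[abs(s - x) for x in support] for s in samples]
eps = 0.5

p = primal_value(losses, cost, eps)
d = dual_value(losses, cost, eps)
print(round(p, 6), round(d, 6))  # 2.0 2.0
```

Both values equal 2.0: the primal optimum moves the sample at 1 to the point 0, and the dual attains its minimum on the interval $\lambda \in [2, 3]$. In this finite setting the agreement is just linear-programming duality; the point of the paper is that the same identity holds for general transport costs, measurable losses, and nominal distributions.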