Abstract: SIAM Journal on Optimization, Volume 34, Issue 1, Pages 419-458, March 2024. A typical data-driven stochastic program seeks the best decision that minimizes the sum of a deterministic cost function and an expected recourse function under a given distribution. Recently, much success has been witnessed in the development of distributionally robust optimization (DRO), which considers the worst-case expected recourse function under the least favorable probability distribution from a distributional family. However, in the presence of endogenous outliers whose corresponding recourse function values are very large or even infinite, the commonly used DRO framework alone tends to overemphasize these outliers and can lead to undesirable or even infeasible decisions. In contrast, distributionally favorable optimization (DFO), which considers the best-case expected recourse function under the most favorable distribution from the distributional family, can serve as a proper measure of the stochastic recourse function and mitigate the effect of endogenous outliers. We show that DFO recovers many robust statistics, suggesting that the DFO framework might be appropriate for the stochastic recourse function in the presence of endogenous outliers. A notion of decision outlier robustness is proposed for selecting a DFO framework for data-driven optimization with outliers. We also provide a unified way to integrate DRO with DFO, where DRO addresses the out-of-sample performance, and DFO properly handles the stochastic recourse function under endogenous outliers. We further extend the proposed DFO framework to solve two-stage stochastic programs without relatively complete recourse. The numerical study demonstrates that the framework is promising.
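
The contrast between the worst-case (DRO) and best-case (DFO) expected recourse can be illustrated with a minimal sketch. It assumes a simple distributional family that is not taken from the paper: all distributions on N sampled recourse values that place probability at most 1/k on any single sample. Over this family, the worst-case expectation is the mean of the k largest values and the best-case expectation is the mean of the k smallest values, which is a trimmed mean. The function names and the sample data are hypothetical.

```python
import math

def worst_case_mean(values, k):
    """DRO-style value: supremum of the expectation over the assumed family,
    which equals the mean of the k largest recourse values."""
    largest = sorted(values, reverse=True)[:k]
    return sum(largest) / k

def best_case_mean(values, k):
    """DFO-style value: infimum of the expectation over the assumed family,
    which equals the mean of the k smallest recourse values (a trimmed mean)."""
    smallest = sorted(values)[:k]
    return sum(smallest) / k

# Hypothetical recourse values for one fixed decision; the last scenario is an
# outlier with infinite recourse cost (for example, an infeasible second stage).
recourse = [1.0, 2.0, 3.0, 4.0, math.inf]

dro_value = worst_case_mean(recourse, k=4)  # dominated by the outlier
dfo_value = best_case_mean(recourse, k=4)   # the outlier is trimmed away

print("worst case:", dro_value)
print("best case:", dfo_value)
```

In this toy setting the worst-case value is infinite because of the single outlier, while the best-case value stays finite. This mirrors the abstract's point that DRO alone can overemphasize outliers with very large or infinite recourse values.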

Accounting for outliers in optimal subsampling methods

A sub-sampling algorithm preventing outliers

Optimal subsampling designs

Optimal design subsampling from Big Datasets

Sample Weighting: an Inherent Approach for Outlier Suppressing Discriminant Analysis

Optimal Subsampling Approaches for Large Sample Linear Regression

Robust optimal subsampling based on weighted asymmetric least squares

Using hierarchical information-theoretic criteria to optimize subsampling of extensive datasets

Subsampling Suffices for Adaptive Data Analysis

Optimal Subsampling Algorithms for Big Data Generalized Linear Models

Optimal Subsampling Algorithms for Big Data Regressions

A review on design inspired subsampling for big data

Simultaneous feature selection and outlier detection with optimality guarantees

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Sample size determination for multidimensional parameters and the A-optimal subsampling in a big data linear regression model

Optimal subsampling algorithm for the marginal model with large longitudinal data

Optimal Subsampling for Large Sample Logistic Regression

Projection-Uniform Subsampling Methods for Big Data

Distributionally Favorable Optimization: A Framework for Data-Driven Decision-Making with Endogenous Outliers

Optimal Subsampling for Large-Scale Quantile Regression