Robust Estimation in Regression and Classification Methods for Large Dimensional Data

Chunming Zhang, Lixing Zhu, Yanbo Shen
DOI: https://doi.org/10.1007/s10994-023-06349-2
IF: 5.414
2023-01-01
Machine Learning
Abstract: Statistical data analysis and machine learning heavily rely on error measures for regression, classification, and forecasting. Bregman divergence (BD) is a widely used family of error measures, but it is not robust to outlying observations or high leverage points in large- and high-dimensional datasets. In this paper, we propose a new family of robust Bregman divergences called “robust-BD” that are less sensitive to data outliers. We explore their suitability for sparse large-dimensional regression models with incompletely specified response variable distributions and propose a new estimate called the “penalized robust-BD estimate” that achieves the same oracle property as ordinary non-robust penalized least-squares and penalized-likelihood estimates. We conduct extensive numerical experiments to evaluate the performance of the proposed penalized robust-BD estimate and compare it with classical approaches, and show that our proposed method improves on existing approaches. Finally, we analyze a real dataset to illustrate the practicality of our proposed method. Our findings suggest that the proposed method can be a useful tool for robust statistical data analysis and machine learning in the presence of outliers and large-dimensional data.
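For readers unfamiliar with the term, the sketch below illustrates the classical (non-robust) Bregman divergence that the abstract refers to, using the textbook convex-generator form D_phi(x, y) = phi(x) - phi(y) - phi'(y)(x - y). It is an illustration only and not the paper's robust-BD construction; the function names are hypothetical, and the paper's own notation and sign conventions may differ.

```python
import math


def bregman_divergence(phi, dphi, x, y):
    """Textbook Bregman divergence for a convex, differentiable generator phi.

    D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)
    (Illustrative helper; not taken from the paper.)
    """
    return phi(x) - phi(y) - dphi(y) * (x - y)


def squared_error(x, y):
    # Generator phi(t) = t^2 recovers the squared error (x - y)^2.
    return bregman_divergence(lambda t: t * t, lambda t: 2.0 * t, x, y)


def generalized_kl(x, y):
    # Generator phi(t) = t * log(t), for t > 0, gives
    # x * log(x / y) - x + y (the generalized Kullback-Leibler divergence).
    return bregman_divergence(
        lambda t: t * math.log(t), lambda t: math.log(t) + 1.0, x, y
    )


if __name__ == "__main__":
    # A typical observation versus a single outlying one, both compared
    # against a fitted value of 1.0: the loss grows without bound in the
    # size of the outlier, which is the sensitivity the abstract describes.
    print(squared_error(2.0, 1.0))
    print(squared_error(100.0, 1.0))
    print(generalized_kl(2.0, 1.0))
    print(generalized_kl(100.0, 1.0))
```

Because both losses are unbounded in the observation, one extreme response can dominate the total loss, which is the motivation for a robust variant.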