Dimension Reduction for Fréchet Regression

Qi Zhang, Lingzhou Xue, Bing Li
DOI: https://doi.org/10.48550/arXiv.2110.00467
2022-12-07
Abstract: With the rapid development of data collection techniques, complex data objects that are not in the Euclidean space are frequently encountered in new statistical applications. The Fréchet regression model (Petersen & Müller 2019) provides a promising framework for regression analysis with metric space-valued responses. In this paper, we introduce a flexible sufficient dimension reduction (SDR) method for Fréchet regression to achieve two purposes: to mitigate the curse of dimensionality caused by high-dimensional predictors and to provide a visual inspection tool for Fréchet regression. Our approach is flexible enough to turn any existing SDR method for Euclidean (X, Y) into one for Euclidean X and metric space-valued Y. The basic idea is to first map the metric space-valued random object $Y$ to a real-valued random variable $f(Y)$ using a class of functions, and then apply classical SDR to the transformed data. If the class of functions is sufficiently rich, then we are guaranteed to uncover the Fréchet SDR space. We showed that such a class, which we call an ensemble, can be generated by a universal kernel. We established the consistency and asymptotic convergence rate of the proposed methods. The finite-sample performance of the proposed methods is illustrated through simulation studies for several commonly encountered metric spaces that include Wasserstein space, the space of symmetric positive definite matrices, and the sphere. We illustrated the data visualization aspect of our method by exploring the human mortality distribution data across countries and by studying the distribution of hematoma density.
Methodology, Statistics Theory, Machine Learning
What problem does this paper attempt to address?
This paper addresses the curse of dimensionality that arises in Fréchet regression with high-dimensional predictors, and it provides a data visualization tool for understanding the regression relationship when the response takes values in a metric space. Specifically, it proposes a flexible sufficient dimension reduction (SDR) method for Fréchet regression with metric-space-valued responses, with two aims:

1. **Alleviating the curse of dimensionality**: Reducing the dimension of the predictors limits the loss of estimation accuracy that comes with increasing dimension.
2. **Providing a data visualization tool**: Projecting the predictors onto a low-dimensional subspace makes it possible to inspect how the response changes with the reduced predictors, even when the original predictor is high-dimensional, which helps in understanding the regression relationship between complex data objects.

To achieve these goals, the paper introduces an ensemble-based method that can extend any existing SDR method for Euclidean data to the case of Euclidean predictors and a metric-space-valued response. The core idea is to first map the metric-space-valued random object \(Y\) to a real-valued random variable \(f(Y)\), with \(f\) ranging over a function class \(F\), and then apply a classical SDR method to the transformed data. If the function class \(F\) is rich enough, the method is guaranteed to recover the Fréchet SDR space. The paper proves that such an ensemble can be generated by a cc-universal kernel, and it establishes the consistency and asymptotic convergence rate of the proposed method.
The paper also demonstrates the finite-sample performance and the data visualization ability of the proposed method through simulation studies and real-data applications (such as human mortality distribution data from the United Nations database).
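
The two-step idea described above (map \(Y\) to real-valued \(f(Y)\), then run a classical SDR method) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ensemble consists of the kernel functions \(f_i(y) = \kappa(y, Y_i)\) centred at the observed responses, uses a Gaussian-type kernel of a user-supplied distance matrix, takes sliced inverse regression (SIR) as the classical SDR method, and pools the per-function candidate matrices by simple averaging. The function names `sir_candidate` and `ensemble_sir`, the bandwidth rule, and the simulated circle-valued response are all hypothetical choices made for illustration; whether the kernel is universal depends on the metric space and is not checked here.

```python
import numpy as np

def sir_candidate(Z, y, n_slices=5):
    """SIR candidate matrix: weighted outer products of slice means of Z."""
    n, p = Z.shape
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)              # mean of standardized X in the slice
        M += (len(idx) / n) * np.outer(m, m)
    return M

def ensemble_sir(X, D, d=1, gamma=None, n_slices=5):
    """Pool SIR candidate matrices over the ensemble f_i(y) = exp(-gamma * d(y, Y_i)^2).

    X : (n, p) Euclidean predictors.
    D : (n, n) pairwise distances between the metric-space responses.
    Returns a (p, d) matrix whose columns estimate a basis of the SDR space.
    """
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ Sigma_inv_sqrt
    # Assumed bandwidth rule: inverse median squared distance.
    if gamma is None:
        gamma = 1.0 / np.median(D[D > 0] ** 2)
    K = np.exp(-gamma * D ** 2)              # column i holds f_i(Y_1), ..., f_i(Y_n)
    # Average the SIR candidate matrices over all ensemble members.
    M = np.zeros((p, p))
    for i in range(n):
        M += sir_candidate(Z, K[:, i], n_slices)
    M /= n
    # Leading eigenvectors, mapped back to the original predictor scale.
    w, V = np.linalg.eigh(M)
    top = V[:, np.argsort(w)[::-1][:d]]
    B = Sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Simulated example: the response lies on the unit circle and depends on X
# only through its first coordinate.
rng = np.random.default_rng(0)
n, p = 300, 5
X = rng.standard_normal((n, p))
theta = 0.8 * X[:, 0] + 0.1 * rng.standard_normal(n)
Y = np.column_stack([np.cos(theta), np.sin(theta)])
# Chordal (ambient Euclidean) distances between the responses.
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
B = ensemble_sir(X, D, d=1)
```

In this simulation the estimated direction `B` should load mainly on the first predictor. Only the pairwise distance matrix `D` of the responses enters the computation, so the same sketch applies to other metric spaces once a suitable distance and kernel are chosen.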