Robust Bayesian Functional Principal Component Analysis

Jiarui Zhang, Jiguo Cao, Liangliang Wang
2023-07-19
Abstract: We develop a robust Bayesian functional principal component analysis (FPCA) by incorporating skew elliptical classes of distributions. The proposed method effectively captures the primary source of variation among curves, even when abnormal observations contaminate the data. We model the observations using skew elliptical distributions by introducing skewness with transformation and conditioning into the multivariate elliptical symmetric distribution. To recast the covariance function, we employ an approximate spectral decomposition. We discuss the selection of prior specifications and provide detailed information on posterior inference, including the forms of the full conditional distributions, choices of hyperparameters, and model selection strategies. Furthermore, we extend our model to accommodate sparse functional data with only a few observations per curve, thereby creating a more general Bayesian framework for FPCA. To assess the performance of our proposed model, we conduct simulation studies comparing it to well-known frequentist methods and conventional Bayesian methods. The results demonstrate that our method outperforms existing approaches in the presence of outliers and performs competitively in outlier-free datasets. Furthermore, we illustrate the effectiveness of our method by applying it to environmental and biological data to identify outlying functional data. The implementation of our proposed method and applications are available at https://github.com/SFU-Stat-ML/RBFPCA.
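The spectral decomposition of the covariance function mentioned in the abstract can be illustrated with a minimal, non-robust sketch. The Python code below is not the authors' implementation (that is available at the repository linked above). It assumes densely observed curves on a common, equally spaced grid and uses the plain sample covariance, so it has none of the skew elliptical robustness the paper proposes. The function name `fpca_on_grid` and its arguments are hypothetical.

```python
import numpy as np

def fpca_on_grid(curves, grid, n_components):
    """Classical (non-robust) FPCA sketch.

    Assumptions (not from the paper): `curves` has shape (n_curves, n_points),
    all curves share the equally spaced `grid`, and there is no measurement noise.
    """
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Sample covariance surface G(s, t) evaluated on the grid.
    cov = centered.T @ centered / (curves.shape[0] - 1)
    # Riemann-sum weight that turns the integral operator into a matrix.
    step = grid[1] - grid[0]
    evals, evecs = np.linalg.eigh(cov * step)
    # eigh returns ascending eigenvalues; keep the largest n_components.
    order = np.argsort(evals)[::-1][:n_components]
    eigenvalues = evals[order]
    # Rescale so that each eigenfunction has unit L2 norm on the grid.
    eigenfunctions = evecs[:, order].T / np.sqrt(step)
    # FPC scores: numerical integral of (curve - mean) * eigenfunction.
    scores = centered @ eigenfunctions.T * step
    return mean, eigenvalues, eigenfunctions, scores
```

With these outputs, `mean + scores @ eigenfunctions` gives the rank-`n_components` approximation of each curve. Because the sample covariance is sensitive to outlying curves, this sketch is exactly the kind of baseline that a robust formulation is meant to improve on.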
Methodology
What problem does this paper attempt to address?
The paper addresses two limitations of existing functional principal component analysis (FPCA) methods: sensitivity to outliers and poor handling of sparse functional data.

1. **Outlier Detection and Robustness**: Traditional FPCA methods usually assume that the data follow a normal distribution, an assumption that often fails in practice, especially in the presence of outliers. Outliers can seriously distort the estimated principal components and degrade model performance. By introducing skew elliptical distributions, the paper proposes a robust Bayesian FPCA method (RB-FPCA) that still captures the main sources of variation when the data are contaminated by outliers.
2. **Handling of Sparse Functional Data**: In many applications, such as longitudinal data in medical research, functional data are sparse and irregularly observed. Traditional FPCA methods do not work well on such data. The paper extends the RB-FPCA model to sparse functional data and estimates the FPC scores with a method based on conditional expectation.

### Main Contributions of the Paper

1. **Improved Robustness**: By introducing skew elliptical distributions, RB-FPCA handles outliers more effectively and improves the robustness of the model. Skew elliptical distributions accommodate both heavy tails and asymmetry, so the data do not need to be transformed toward symmetry first, as traditional methods require.
2. **Handling of Sparse Data**: The paper proposes an RB-FPCA model suited to sparse functional data. By using the PACE (principal analysis by conditional expectation) approach to estimate the principal component scores, the model can handle sparse and irregularly observed data, which makes RB-FPCA more flexible in practice.
3. **Advantages of the Bayesian Framework**: A Bayesian treatment of FPCA quantifies uncertainty directly, for example through credible intervals. It also makes it straightforward to incorporate domain knowledge through the prior structure and provides a direct route to model selection.

### Simulation Studies and Applications

The paper evaluates the proposed method through simulation studies and applications to real data. The simulations show that RB-FPCA outperforms existing frequentist methods and conventional Bayesian methods in the presence of outliers, and that it remains competitive on datasets without outliers. Applications to environmental and biological data further show that the method can identify outlying functional data.

In conclusion, by combining skew elliptical distributions with a Bayesian framework, the paper proposes a robust FPCA method that also applies to sparse functional data, addressing the shortcomings of existing methods with respect to outliers and sparsity.
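The conditional-expectation idea behind score estimation for sparse data can be sketched as follows. This is a minimal illustration of the classical Gaussian PACE estimator, not the skew elliptical version developed in the paper. It assumes the mean function, eigenfunctions, eigenvalues, and noise variance are already known and evaluated at one curve's own observation times. The function name `pace_scores` and all argument names are hypothetical.

```python
import numpy as np

def pace_scores(y, mu, phi, eigenvalues, noise_var):
    """Gaussian conditional expectation of the FPC scores for one curve.

    Assumed model (classical PACE, not the paper's robust model):
        y_j = mu_j + sum_k xi_k * phi[j, k] + eps_j,
        xi_k ~ N(0, eigenvalues[k]),  eps_j ~ N(0, noise_var).

    y, mu       : shape (m,), observations and mean at the curve's m time points
    phi         : shape (m, K), eigenfunctions evaluated at those time points
    eigenvalues : shape (K,)
    """
    lam = np.diag(eigenvalues)
    # Marginal covariance of the observed vector y.
    cov_y = phi @ lam @ phi.T + noise_var * np.eye(len(y))
    # E[xi | y] = Lambda Phi^T Cov(y)^{-1} (y - mu)
    return lam @ phi.T @ np.linalg.solve(cov_y, y - mu)
```

Because the estimator uses only the time points at which a given curve was observed, it works even with a handful of irregular observations per curve. The scores are shrunk toward zero as the noise variance grows relative to the eigenvalues.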