Functional principal component analysis for incomplete space–time data

Alessandro Palummo, Eleonora Arnone, Luca Formaggia, Laura M. Sangalli
DOI: https://doi.org/10.1007/s10651-024-00598-7
2024-03-17
Environmental and Ecological Statistics
Abstract: Environmental signals, acquired, e.g., by remote sensing, often present large gaps of missing observations in space and time. In this work, we present an innovative approach to identify the main variability patterns, in space–time data, when data may be affected by complex missing data structures. We formalize the problem in the framework of functional data analysis, proposing an innovative method of functional principal component analysis (fPCA) for incomplete space–time data. The functional nature of the proposed method permits to borrow information from measurements observed at nearby spatio-temporal locations. The resulting functional principal components are smooth fields over the considered spatio-temporal domain, and can lead to interesting insights in the spatio-temporal dynamic of the phenomenon under study. Moreover, they can be used to provide a reconstruction of the missing entries, also under severe missing data patterns. The proposed model combines a weighted rank-one approximation of the data matrix with a roughness penalty. We show that the estimation problem can be solved using a majorize–minimization approach, and provide a numerically efficient algorithm for its solution. Thanks to a discretization based on finite elements in space and B-splines in time, the proposed method can handle multidimensional spatial domains with complex shapes, such as water bodies with complicated shorelines, or curved spatial regions with complex orography. As shown by simulation studies, the proposed space–time fPCA is superior to alternative techniques for Principal Component Analysis with missing data. We further highlight the potentiality of the proposed method for environmental problems, by applying space–time fPCA to the study of the lake water surface temperature (LWST) of Lake Victoria, in Central Africa, starting from satellite measurements with large gaps. LWST is considered one of the fundamental indicators of how climate change is affecting the environment, and is recognized as an essential climate variable.
environmental sciences; statistics & probability; mathematics, interdisciplinary applications
What problem does this paper attempt to address?
The paper addresses the problem of identifying the main variability patterns of spatiotemporal signals with complex missing-data structures, as they arise in the environmental and ecological sciences. Specifically:

1. **Complex missing data**: Data acquired by remote sensing often have large gaps in both space and time. For example, satellite measurements may be missing over extensive regions because of cloud cover or other meteorological conditions.
2. **Principal component analysis (PCA)**: Traditional PCA methods may fail or perform poorly when data are missing in complex patterns. The paper therefore proposes a functional principal component analysis (fPCA) method for incomplete spatiotemporal data.
3. **Borrowing information**: Because the method is functional, it borrows information from measurements observed at nearby spatiotemporal locations. This improves the accuracy of the estimated components, lets the method cope with severe missing-data patterns, and allows the missing values to be reconstructed.
4. **Application case**: The paper applies the method to satellite measurements of the lake water surface temperature (LWST) of Lake Victoria, in Africa. LWST is considered a fundamental indicator of how climate change is affecting the environment.

In summary, the paper develops a new fPCA method for spatiotemporal data with complex missing patterns and demonstrates its potential for environmental problems.
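
The abstract describes the model as a weighted rank-one approximation of the data matrix combined with a roughness penalty, estimated by majorize–minimization. The sketch below illustrates that general idea on a toy problem and is not the authors' implementation. It assumes a one-dimensional grid, binary weights (1 for observed, 0 for missing), and a second-difference penalty standing in for the finite-element and B-spline discretization used in the paper. The names `second_difference_penalty`, `penalized_loss`, `smooth_rank_one_mm`, and the parameter `lam` are hypothetical.

```python
import numpy as np


def second_difference_penalty(p):
    """Return P = D^T D, with D the (p-2) x p second-difference operator."""
    D = np.diff(np.eye(p), n=2, axis=0)
    return D.T @ D


def penalized_loss(X0, W, s, f, lam, P):
    """Squared misfit on observed entries plus a roughness penalty on f."""
    resid = W * (X0 - np.outer(s, f))
    return np.sum(resid ** 2) + lam * (s @ s) * (f @ P @ f)


def smooth_rank_one_mm(X, lam=1.0, n_iter=100):
    """Fit one smooth component to X (NaN = missing) by majorize-minimization.

    Illustrative helper only; a simplified stand-in for the paper's method.
    """
    W = (~np.isnan(X)).astype(float)   # 1 = observed, 0 = missing
    X0 = np.nan_to_num(X)              # NaN -> 0; those entries carry weight 0
    p = X.shape[1]
    P = second_difference_penalty(p)
    A = np.eye(p) + lam * P

    # Initialise from the leading singular pair of the zero-filled matrix.
    U, sv, Vt = np.linalg.svd(X0, full_matrices=False)
    s, f = sv[0] * U[:, 0], Vt[0]

    history = [penalized_loss(X0, W, s, f, lam, P)]
    for _ in range(n_iter):
        # Majorize: fill the missing entries with the current rank-one fit.
        Y = W * X0 + (1.0 - W) * np.outer(s, f)
        # Minimize the surrogate with exact block updates for f, then s.
        f = np.linalg.solve(A, Y.T @ s) / (s @ s)
        s = Y @ f / (f @ f + lam * (f @ P @ f))
        history.append(penalized_loss(X0, W, s, f, lam, P))

    # Rescale so that f has unit norm; the fit and the loss are unchanged.
    norm_f = np.linalg.norm(f)
    return s * norm_f, f / norm_f, np.array(history)


# Toy usage: a smooth rank-one signal with noise and 40% of entries missing.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
truth = np.outer(rng.normal(size=30), np.sin(2 * np.pi * t))
X = truth + 0.05 * rng.normal(size=truth.shape)
X[rng.random(X.shape) < 0.4] = np.nan
s, f, history = smooth_rank_one_mm(X, lam=1.0)
X_hat = np.outer(s, f)   # reconstruction, including the missing entries
```

Filling the gaps with the current fit yields a complete-data surrogate that bounds the weighted loss from above and touches it at the current iterate, so each pass cannot increase the penalized loss. Further components could be extracted by repeating the fit on the residuals, which is a common deflation strategy and an assumption here rather than a detail stated in the abstract.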