Efficient Covariance Estimation from Temporal Data

Hrayr Harutyunyan, Daniel Moyer, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan
DOI: https://doi.org/10.48550/arXiv.1905.13276
2021-02-11
Abstract: Estimating the covariance structure of multivariate time series is a fundamental problem with a wide range of real-world applications, from financial modeling to fMRI analysis. Despite significant recent advances, current state-of-the-art methods are still severely limited in terms of scalability, and do not work well in high-dimensional undersampled regimes. In this work we propose a novel method called Temporal Correlation Explanation, or T-CorEx, that (a) has linear time and memory complexity with respect to the number of variables, and can scale to very large temporal datasets that are not tractable with existing methods; (b) gives state-of-the-art results in highly undersampled regimes on both synthetic and real-world datasets; and (c) makes minimal assumptions about the character of the dynamics of the system. T-CorEx optimizes an information-theoretic objective function to learn a latent factor graphical model for each time period and applies two regularization techniques to induce temporal consistency of estimates. We perform extensive evaluation of T-CorEx using both synthetic and real-world data and demonstrate that it can be used for detecting sudden changes in the underlying covariance matrix, capturing transient correlations, and analyzing extremely high-dimensional complex multivariate time series such as high-resolution fMRI data.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses **covariance estimation for multivariate time series**, especially in high-dimensional, undersampled regimes. The authors point out that current state-of-the-art methods scale poorly to large temporal datasets and perform poorly when the data are high-dimensional and undersampled. They therefore propose a new method, **T-CorEx (Temporal Correlation Explanation)**, to address the following challenges:

1. **Scalability to high-dimensional data**: The computational complexity and memory requirements of existing methods grow too quickly with the number of variables, so they cannot handle very large temporal datasets.
2. **High-dimensional undersampling**: When the number of samples is small relative to the number of variables, the performance of existing methods degrades significantly.
3. **Temporal consistency**: Samples from different time steps are usually not independent and identically distributed, and the dynamics of the system may be complex and difficult to model.

### Main contributions of T-CorEx

- **Linear time and memory complexity**: T-CorEx has linear time and memory complexity in the number of variables and can handle several orders of magnitude more variables than existing methods.
- **Temporal consistency regularization**: Two regularization techniques are introduced so that estimates for different time steps remain consistent with one another.
- **Applicable to multiple scenarios**: Experiments show that T-CorEx can detect sudden changes in the underlying covariance matrix, capture transient correlations, and analyze extremely high-dimensional complex multivariate time series, such as high-resolution fMRI data.

### Method overview

T-CorEx is based on the linear CorEx model. It learns an approximately modular latent factor model for each time period by minimizing an information-theoretic objective function, and it enforces temporal consistency through regularization. The specific steps are:

1. **Linear CorEx model**: For a given multivariate Gaussian random variable \(X\), the algorithm finds an \(m\)-dimensional Gaussian random variable \(Z\) such that the joint distribution \(p(x, z)\) is close to modular.
2. **Temporal consistency regularization**: A regularization term in the optimization problem keeps the covariance estimates of adjacent time periods similar.
3. **Sample weight adjustment**: To reduce noise, certain parameters of the covariance matrix are estimated from weighted samples that also include samples from other time periods.

### Experimental results

The authors evaluated T-CorEx on synthetic data and real-world data (such as stock market data and fMRI data). The results show that T-CorEx outperforms existing methods in high-dimensional, undersampled regimes and runs efficiently on large-scale datasets.

In summary, the main goal of this paper is to develop an efficient covariance estimation method that addresses the scalability and undersampling problems of high-dimensional time-series data, and to demonstrate its advantages experimentally.
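
The temporal consistency and sample weighting steps from the method overview can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names, the choice of an l1 or l2 norm on differences between adjacent periods, and the geometric `decay` weighting are all assumptions made for illustration, since the summary does not specify these details. The sketch also builds full covariance matrices, so it does not have the linear complexity in the number of variables that T-CorEx claims.

```python
import numpy as np


def temporal_penalty(params, norm="l1"):
    """Sum of norms of differences between adjacent periods' parameters.

    `params` is a list of equally shaped arrays, one per time period.
    The l1/l2 choice is an assumption of this sketch.
    """
    total = 0.0
    for prev, nxt in zip(params[:-1], params[1:]):
        diff = nxt - prev
        if norm == "l1":
            total += np.abs(diff).sum()
        else:
            total += np.sqrt((diff ** 2).sum())
    return float(total)


def weighted_covariances(periods, decay=0.5):
    """Per-period covariance estimates that borrow samples across time.

    `periods` is a list of (n_t, p) arrays. For period t, a sample from
    period s gets weight decay**|t - s| (hypothetical weighting scheme),
    so nearby periods contribute more than distant ones.
    """
    samples = np.vstack(periods)
    estimates = []
    for t in range(len(periods)):
        w = np.concatenate([
            np.full(len(x), decay ** abs(t - s))
            for s, x in enumerate(periods)
        ])
        w = w / w.sum()                      # normalize weights to sum to 1
        mean = w @ samples                   # weighted mean, shape (p,)
        centered = samples - mean
        estimates.append((centered * w[:, None]).T @ centered)
    return estimates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    periods = [rng.normal(size=(5, 3)) for _ in range(4)]
    covs = weighted_covariances(periods, decay=0.5)
    print(len(covs), covs[0].shape)
    print(temporal_penalty(covs, norm="l1"))
```

With `decay=0.0` each period uses only its own samples, which recovers the ordinary biased sample covariance. Larger values of `decay` pool more information across periods, which reduces noise at the cost of blurring abrupt changes.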