Adaptive Huber Regression on Markov-dependent Data

Jianqing Fan, Yongyi Guo, Bai Jiang
2019-09-24
Abstract: High-dimensional linear regression has been intensively studied in the statistics community over the last two decades. For the convenience of theoretical analysis, classical methods usually assume independent observations and sub-Gaussian-tailed errors. However, neither assumption holds for many real high-dimensional time-series data. Recently, [Sun, Zhou, Fan, 2019, J. Amer. Stat. Assoc., in press] proposed Adaptive Huber Regression (AHR) to address the issue of heavy-tailed errors. They discovered that the robustification parameter of the Huber loss should adapt to the sample size, the dimensionality, and the moments of the heavy-tailed errors. We progress in a complementary direction and justify AHR on dependent observations. Specifically, we consider an important dependence structure: Markov dependence. Our results show that Markov dependence affects both the adaptation of the robustification parameter and the estimation of the regression coefficients, in that the sample size should be discounted by a factor depending on the spectral gap of the underlying Markov chain.
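To make the role of the robustification parameter concrete, the following Python sketch implements the Huber loss, its derivative (which clips a residual to [-τ, τ]), and an illustrative rule for choosing τ. Two pieces are assumptions, not the paper's stated results: the scaling τ = v·(n_eff / log d)^max(1/(1+δ), 1/2), used here as an illustration of the kind of rule AHR uses in the independent case, and the effective sample size n·(1 - λ)/(1 + λ), where 1 - λ is the spectral gap, a hypothetical discount factor of the kind that appears in Hoeffding-type inequalities for Markov chains. The paper's exact constants and discount factor may differ. The function names `huber_loss`, `huber_psi`, `effective_sample_size`, and `adaptive_tau` are hypothetical.

```python
import math


def huber_loss(r: float, tau: float) -> float:
    """Huber loss: quadratic for |r| <= tau, linear (slope tau) beyond."""
    a = abs(r)
    if a <= tau:
        return 0.5 * r * r
    return tau * a - 0.5 * tau * tau


def huber_psi(r: float, tau: float) -> float:
    """Derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, r))


def effective_sample_size(n: int, lam: float) -> float:
    """Hypothetical Markov discount: n * (1 - lam) / (1 + lam).

    1 - lam plays the role of the spectral gap. This factor is an assumption
    for illustration; the paper's discount factor may differ.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("need 0 <= lam < 1, i.e. a positive spectral gap")
    return n * (1.0 - lam) / (1.0 + lam)


def adaptive_tau(n: int, d: int, delta: float, v: float, lam: float = 0.0) -> float:
    """Illustrative (assumed) rule: tau = v * (n_eff / log d) ** max(1/(1+delta), 1/2).

    delta: errors are assumed to have a finite (1+delta)-th moment.
    v: scale of that moment (treated as known here).
    lam = 0 recovers the rule for independent data.
    """
    n_eff = effective_sample_size(n, lam)
    exponent = max(1.0 / (1.0 + delta), 0.5)
    return v * (n_eff / math.log(d)) ** exponent


# Usage: stronger dependence (larger lam) shrinks n_eff and hence tau.
for lam in (0.0, 0.5, 0.9):
    print(lam, round(adaptive_tau(n=1000, d=200, delta=0.5, v=1.0, lam=lam), 2))
```

Passing `lam` explicitly keeps the dependence adjustment visible; in practice the spectral gap is unknown and would have to be bounded or estimated.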
Methodology
What problem does this paper attempt to address?
This paper studies how Adaptive Huber Regression (AHR) handles heavy-tailed errors in high-dimensional linear regression when the observations are Markov-dependent. Traditional high-dimensional linear regression methods typically assume independent observations and sub-Gaussian errors, but neither assumption holds for many practical high-dimensional time series. AHR, proposed by Sun, Zhou, and Fan in 2019, addresses heavy-tailed errors; they found that the robustification parameter of the Huber loss should adapt to the sample size, the dimensionality, and the moments of the heavy-tailed errors.

Besides heavy-tailed errors, another common characteristic of high-dimensional data is dependence among observations, especially in time series such as functional magnetic resonance imaging (fMRI) data and macroeconomic data. Although AHR performs well in independent settings, its behavior on dependent data had not been established. The paper therefore studies the performance of AHR on Markov-dependent data.

The key result is that, under certain conditions, the robustification parameter τ of AHR should be adjusted according to the sample size, the dimensionality, the heavy-tailedness of the errors, and the dependence of the Markov chain. Specifically, if the errors have a finite (1+δ)-th moment and the Markov chain has a non-zero spectral gap, an appropriately chosen τ achieves the optimal trade-off between bias and robustness: a small τ clips more residuals, which makes the estimator more robust but more biased, whereas a large τ makes it behave more like least squares. The paper also provides error-rate bounds for AHR under Markov dependence and shows how dependence affects the error rate of the AHR estimator, namely through a sample size discounted by a factor depending on the spectral gap. Overall, the paper bridges high-dimensional regression theory and the practical need to handle heavy-tailed errors and dependent observations in real data, providing a theoretical basis for understanding and applying AHR on complex datasets.
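The setting studied in the paper can be mimicked with a small simulation. The sketch below assumes a specific Markov structure that the source does not specify: covariates follow a stationary Gaussian VAR(1) process with autocorrelation `rho`, and the errors are i.i.d. Student-t with 1.5 degrees of freedom, so they have a finite (1+δ)-th moment only for δ < 0.5. It fits a low-dimensional, unpenalized Huber regression by gradient descent with a hand-picked τ = 3.0, so it illustrates the estimator rather than the paper's high-dimensional procedure or its tuning rule. The names `simulate_markov_data` and `huber_regression` are hypothetical.

```python
import numpy as np


def simulate_markov_data(n, d, rho, df, beta, rng):
    """Assumed design: x_t = rho * x_{t-1} + sqrt(1 - rho^2) * e_t (a stationary
    Gaussian VAR(1), hence a Markov chain); errors i.i.d. Student-t(df)."""
    X = np.empty((n, d))
    X[0] = rng.standard_normal(d)
    scale = np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        X[t] = rho * X[t - 1] + scale * rng.standard_normal(d)
    eps = rng.standard_t(df, size=n)
    return X, X @ beta + eps


def huber_regression(X, y, tau, lr=0.5, n_iter=500):
    """Minimize (1/n) * sum_i huber_tau(y_i - x_i @ b) by gradient descent.

    Low-dimensional and unpenalized; a high-dimensional version would
    typically add an l1 penalty (e.g. via proximal gradient steps).
    """
    n, d = X.shape
    b = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ b
        psi = np.clip(r, -tau, tau)  # derivative of the Huber loss
        grad = -X.T @ psi / n        # gradient of the empirical Huber risk
        b -= lr * grad
    return b


rng = np.random.default_rng(0)
beta = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
X, y = simulate_markov_data(n=2000, d=5, rho=0.5, df=1.5, beta=beta, rng=rng)
b_huber = huber_regression(X, y, tau=3.0)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("Huber error:", np.linalg.norm(b_huber - beta))
print("OLS error:  ", np.linalg.norm(b_ols - beta))
```

Comparing the two printed errors gives an informal sense of how clipping residuals at τ limits the influence of extreme errors; results vary with the seed, and no particular numbers are claimed here.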