Unconditional Quantile Regression for Streaming Datasets

Rong Jiang, Keming Yu
DOI: https://doi.org/10.1080/07350015.2023.2293162
2024-01-06
Journal of Business and Economic Statistics
Abstract: The Unconditional Quantile Regression (UQR) method, initially introduced by Firpo et al., has gained significant traction as a popular approach for modeling and analyzing data. However, much like Conditional Quantile Regression (CQR), UQR encounters computational challenges when it comes to obtaining parameter estimates for streaming datasets. This is attributed to the involvement of unknown parameters in the logistic regression loss function used in UQR, which presents obstacles in both computational execution and theoretical development. To address this, we present a novel approach involving smoothing logistic regression estimation. Subsequently, we propose a renewable estimator tailored for UQR with streaming data, relying exclusively on current data and summary statistics derived from historical data. Theoretically, our proposed estimators exhibit equivalent asymptotic properties to the standard version computed directly on the entire dataset, without any additional constraints. Both simulations and real data analysis are conducted to illustrate the finite sample performance of the proposed methods.
statistics & probability, social sciences, mathematical methods, economics
What problem does this paper attempt to address?
The paper addresses the difficulty of estimating the parameters of Unconditional Quantile Regression (UQR) on streaming datasets. UQR is a popular modeling method that works well on static datasets, but it runs into significant computational challenges with "big data", and with streaming datasets in particular. The difficulty arises because the logistic regression loss function used in UQR involves unknown parameters, which complicates both the computation and the theoretical analysis.
To address this, the authors first propose a smoothed logistic regression estimator. They then build a renewable estimator for streaming data that depends only on the current data and on summary statistics extracted from historical data. Theoretically, the proposed estimator has the same asymptotic properties as the standard estimator computed directly on the entire dataset, with no additional constraints required. The paper also details how to estimate the unconditional quantile ($q_\tau$), the density function ($f_Y(q_\tau)$), and $\beta_{q\tau}$ on streaming datasets. Using a Taylor expansion, the authors handle the non-smooth loss function and develop renewable kernel density estimators and renewable quantile regression estimators. These estimators can be updated in real time as new data arrive, without revisiting the historical data, which meets the need for fast and efficient analysis in the big data era.
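To make the idea of a renewable estimator concrete, here is a minimal Python sketch of a kernel density estimate that is updated batch by batch from summary statistics alone. It is not the paper's estimator: it assumes that the evaluation point `q` and the bandwidth `h` are fixed in advance, whereas in the paper $q_\tau$ is itself unknown and is re-estimated as data arrive. The class name `RenewableKDE`, the helper `full_data_kde`, the Gaussian kernel, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


class RenewableKDE:
    """Running kernel density estimate at one fixed point q.

    Only two summary statistics are stored: the number of observations
    seen so far and the running sum of kernel weights. Raw historical
    data are never kept.
    """

    def __init__(self, q, h):
        self.q = q            # evaluation point (assumed known and fixed here)
        self.h = h            # bandwidth (assumed fixed across batches here)
        self.n = 0            # observations seen so far
        self.kernel_sum = 0.0 # running sum of K((y_i - q) / h)

    def update(self, batch):
        """Fold a new batch into the summary statistics and return the estimate."""
        batch = np.asarray(batch, dtype=float)
        self.kernel_sum += gaussian_kernel((batch - self.q) / self.h).sum()
        self.n += batch.size
        return self.estimate()

    def estimate(self):
        return self.kernel_sum / (self.n * self.h)


def full_data_kde(y, q, h):
    """Reference estimate computed on the entire dataset at once."""
    y = np.asarray(y, dtype=float)
    return gaussian_kernel((y - q) / h).sum() / (y.size * h)


# Simulated stream: 10,000 standard normal draws arriving in 10 batches.
rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)

kde = RenewableKDE(q=0.0, h=0.2)
for batch in np.array_split(y, 10):
    streamed = kde.update(batch)

full = full_data_kde(y, q=0.0, h=0.2)
print(streamed, full)
```

With a fixed point and bandwidth, the streamed estimate coincides with the full-data estimate up to floating-point rounding, because the kernel sum is additive over batches. The harder case treated in the paper is when the evaluation point moves between batches, which is where the Taylor expansion mentioned above is needed.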