Efficient Strongly Polynomial Algorithms for Quantile Regression

Suraj Shetiya, Shohedul Hasan, Abolfazl Asudeh, Gautam Das
DOI: https://doi.org/10.48550/arXiv.2307.08706
2023-07-14
Abstract: Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is statistically a more robust alternative to the other classical technique of Ordinary Least Squares Regression (OLS). However, while there exist efficient algorithms for OLS, almost all of the known results for QR are only weakly polynomial. Towards filling this gap, this paper proposes several efficient strongly polynomial algorithms for QR for various settings. For two-dimensional QR, making a connection to the geometric concept of $k$-set, we propose an algorithm with a deterministic worst-case time complexity of $\mathcal{O}(n^{4/3} \operatorname{polylog}(n))$ and an expected time complexity of $\mathcal{O}(n^{4/3})$ for the randomized version. We also propose a randomized divide-and-conquer algorithm, RandomizedQR, with an expected time complexity of $\mathcal{O}(n\log^2{(n)})$ for the two-dimensional QR problem. For the general case with more than two dimensions, our RandomizedQR algorithm has an expected time complexity of $\mathcal{O}(n^{d-1}\log^2{(n)})$.
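To make the QR objective concrete, the following Python sketch uses the standard check (pinball) loss $\rho_\tau(r) = r(\tau - \mathbb{1}[r<0])$, which the abstract does not spell out. It also illustrates why a strongly polynomial algorithm exists at all: with an intercept and at least two distinct $x$ values, some optimal QR line passes through two data points, so enumerating all $\mathcal{O}(n^2)$ pairs and scoring each in $\mathcal{O}(n)$ gives an $\mathcal{O}(n^3)$ method whose operation count depends only on $n$. This is a naive illustration under those assumptions, not one of the paper's algorithms, and the function names (`check_loss`, `qr_objective`, `naive_qr_2d`) are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def check_loss(residual: float, tau: float) -> float:
    """Pinball loss rho_tau(r) = r * (tau - 1[r < 0]); always >= 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def qr_objective(xs: Sequence[float], ys: Sequence[float],
                 slope: float, intercept: float, tau: float) -> float:
    """Total check loss of the line y = slope * x + intercept, in O(n)."""
    return sum(check_loss(y - (slope * x + intercept), tau) for x, y in zip(xs, ys))


def naive_qr_2d(xs: Sequence[float], ys: Sequence[float],
                tau: float) -> Optional[Tuple[float, float, float]]:
    """Hypothetical O(n^3) baseline: score every line through two data points.

    Returns (loss, slope, intercept), or None if all x values are equal.
    """
    best: Optional[Tuple[float, float, float]] = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not of the form y = a*x + b
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = qr_objective(xs, ys, slope, intercept, tau)
        if best is None or loss < best[0]:
            best = (loss, slope, intercept)
    return best
```

For example, `naive_qr_2d([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 0.5)` returns `(0.0, 1.0, 0.0)`, since all points lie on the line $y = x$. The paper's algorithms avoid this exhaustive enumeration.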
Computational Geometry, Data Structures and Algorithms, Machine Learning
What problem does this paper attempt to address?
The paper aims to improve the computational efficiency of Quantile Regression (QR), specifically by designing efficient strongly polynomial-time algorithms. State-of-the-art QR algorithms solve a large linear program with interior-point methods, which run in weakly polynomial time: their running time depends on the bit length of the input numbers, not only on the number of points $n$ and the dimension $d$. The paper aims to fill this gap with efficient strongly polynomial-time algorithms for several settings. Its main contributions are:

1. **Utilizing computational geometry concepts**: Using arrangements and duality, the paper maps QR to the problem of finding the vertex (intersection point) of a hyperplane arrangement that minimizes the QR objective function. To support traversing the arrangement, it designs a subroutine named `UpdateNeighbor`, which computes the objective at a vertex from its value at a neighboring vertex in $\mathcal{O}(d)$ time and space. This improves the running time of the naive baseline QR algorithm.
2. **Connecting QR to the k-set concept**: By relating QR to the geometric concept of k-sets, the algorithms restrict the traversal to the k-th level of the arrangement. For two-dimensional QR, the paper proposes an algorithm named `QReg2D` with a deterministic time complexity of $\mathcal{O}(n^{4/3}\log^{1+a} n)$ and an expected time complexity of $\mathcal{O}(n^{4/3})$ for its randomized version. These bounds are asymptotically better than those of existing exterior-point and interior-point methods.
3. **Proposing the randomized algorithm `RandomizedQR`**: For general d-dimensional QR, the paper proposes `RandomizedQR`, an algorithm based on a randomized divide-and-conquer strategy. It repeatedly selects random hyperplanes and determines which half-space contains the optimal solution, shrinking the search space. Its expected time complexity is $\mathcal{O}(n\log^2 n)$ in two dimensions and $\mathcal{O}(d\,n^{d-1}\log^2 n)$ in higher dimensions, which is faster than the known deterministic strongly polynomial algorithms.

In summary, the paper's main objective is to substantially reduce the computational cost of QR, in two dimensions and beyond, by designing new algorithmic frameworks and techniques.
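The summary describes `UpdateNeighbor` only at a high level. One plausible way to get an $\mathcal{O}(d)$ objective update, offered here as a hypothetical reading rather than the paper's actual procedure, is to keep running sums of $y_i$ and of the feature vectors $x_i$ for the points above and below the current hyperplane. For a fixed above/below partition the total check loss is linear in $\beta$, so it can be evaluated from these $2(d+1)$ numbers in $\mathcal{O}(d)$ time, and each point that crosses the hyperplane (as a few points do when stepping to a neighboring vertex, assuming general position) updates them in $\mathcal{O}(d)$. The Python sketch assumes each $x_i$ carries a leading 1 for the intercept; `SideSums`, `side_of`, and `check_loss_direct` are illustrative names.

```python
from typing import Dict, List, Sequence

# Each point is (x_i, y_i), where x_i is a d-vector whose first entry is 1 (intercept).


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def check_loss_direct(points, beta: Sequence[float], tau: float) -> float:
    """O(n d) reference: sum of pinball losses of residuals y_i - x_i . beta."""
    total = 0.0
    for x, y in points:
        r = y - dot(x, beta)
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total


def side_of(x: Sequence[float], y: float, beta: Sequence[float]) -> str:
    """'above' if the point is on or above the hyperplane y = x . beta, else 'below'."""
    return "above" if y - dot(x, beta) >= 0 else "below"


class SideSums:
    """Hypothetical helper: O(d) aggregates for the points on each side of a hyperplane.

    Because x_i[0] = 1, sum_x[side][0] is also the number of points on that side.
    """

    def __init__(self, d: int):
        self.sum_y: Dict[str, float] = {"above": 0.0, "below": 0.0}
        self.sum_x: Dict[str, List[float]] = {"above": [0.0] * d, "below": [0.0] * d}

    def _shift(self, side: str, x: Sequence[float], y: float, sign: float) -> None:
        self.sum_y[side] += sign * y
        acc = self.sum_x[side]
        for k, xk in enumerate(x):
            acc[k] += sign * xk

    def add(self, side: str, x: Sequence[float], y: float) -> None:
        self._shift(side, x, y, 1.0)

    def move(self, x: Sequence[float], y: float, src: str, dst: str) -> None:
        """O(d): one point crosses the hyperplane."""
        self._shift(src, x, y, -1.0)
        self._shift(dst, x, y, 1.0)

    def loss(self, beta: Sequence[float], tau: float) -> float:
        """O(d) check loss; valid while stored sides match residual signs at beta."""
        above = self.sum_y["above"] - dot(self.sum_x["above"], beta)  # sum of r_i >= 0
        below = dot(self.sum_x["below"], beta) - self.sum_y["below"]  # sum of -r_i > 0
        return tau * above + (1.0 - tau) * below
```

The $\mathcal{O}(d)$ evaluation is only valid while the stored sides agree with the residual signs at the current $\beta$; points lying exactly on the hyperplane contribute zero loss, so they may be kept on either side.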