On $O(n)$ Algorithms for Projection onto the Top-$k$-sum Sublevel Set

Jake Roth,Ying Cui
2024-08-14
Abstract:The \emph{top-$k$-sum} operator computes the sum of the largest $k$ components of a given vector. The Euclidean projection onto the top-$k$-sum sublevel set serves as a crucial subroutine in iterative methods to solve composite superquantile optimization problems. In this paper, we introduce a solver that implements two finite-termination algorithms to compute this projection. Both algorithms have $O(n)$ complexity of floating point operations when applied to a sorted $n$-dimensional input vector, where the absorbed constant is \emph{independent of $k$}. This stands in contrast to an existing grid-search-inspired method that has $O(k(n-k))$ complexity, a partition-based method with $O(n+D\log D)$ complexity, where $D\leq n$ is the number of distinct elements in the input vector, and a semismooth Newon method with a finite termination property but unspecified floating point complexity. The improvement of our methods over the first method is significant when $k$ is linearly dependent on $n$, which is frequently encountered in practical superquantile optimization applications. In instances where the input vector is unsorted, an additional cost is incurred to (partially) sort the vector, whereas a full sort of the input vector seems unavoidable for the other two methods. To reduce this cost, we further derive a rigorous procedure that leverages approximate sorting to compute the projection, which is particularly useful when solving a sequence of similar projection problems. Numerical results show that our methods solve problems of scale $n=10^7$ and $k=10^4$ within $0.05$ seconds, whereas the most competitive alternative, the semismooth Newton-based method, takes about $1$ second. The existing grid-search method and Gurobi's QP solver can take from minutes to hours.
Optimization and Control,Numerical Analysis
What problem does this paper attempt to address?