Tractability from overparametrization: the example of the negative perceptron

Andrea Montanari, Yiqiao Zhong, Kangjie Zhou
DOI: https://doi.org/10.1007/s00440-023-01248-y
IF: 1.944
2024-01-24
Probability Theory and Related Fields
Abstract: In the negative perceptron problem we are given $n$ data points $(x_i, y_i)$, where $x_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector $\theta$ that maximizes $\min_{i \le n} y_i \langle \theta, x_i \rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n, d \to \infty$ with $n/d \to \delta$, and prove upper and lower bounds on the maximum margin $\kappa_s(\delta)$ or, equivalently, on its inverse function $\delta_s(\kappa)$. In other words, $\delta_s(\kappa)$ is the overparametrization threshold: for $n/d \le \delta_s(\kappa) - \varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d \ge \delta_s(\kappa) + \varepsilon$ it does not. Our bounds on $\delta_s(\kappa)$ match to the leading order as $\kappa \to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{lin}(\kappa)$. We observe a gap between the interpolation threshold $\delta_s(\kappa)$ and the linear programming threshold $\delta_{lin}(\kappa)$, raising the question of the behavior of other algorithms.
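To make the objective in the abstract concrete, here is a minimal Python sketch that evaluates the margin, i.e. the minimum over data points of the label times the inner product with a unit-norm direction. It is only an illustration and not the paper's algorithm: the function names `margin` and `random_instance` are hypothetical, and the toy data model (i.i.d. standard Gaussian features with uniformly random labels) is an assumption made here for demonstration, not necessarily either of the two random models studied in the paper.

```python
import math
import random


def margin(theta, xs, ys):
    """Return min_i y_i * <theta, x_i> after rescaling theta to unit norm."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm == 0.0:
        raise ValueError("theta must be nonzero")
    return min(
        y * sum(t * xj for t, xj in zip(theta, x)) / norm
        for x, y in zip(xs, ys)
    )


def random_instance(n, d, seed=0):
    """Assumed toy model: i.i.d. standard Gaussian features, random +/-1 labels."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [rng.choice((-1, 1)) for _ in range(n)]
    return xs, ys


if __name__ == "__main__":
    n, d = 200, 20  # aspect ratio n/d = 10
    xs, ys = random_instance(n, d)
    rng = random.Random(1)
    theta = [rng.gauss(0.0, 1.0) for _ in range(d)]  # arbitrary direction
    print("margin of a random direction:", margin(theta, xs, ys))
```

When the data are not linearly separable, every direction has a negative margin, so the quantity printed here is negative; the problem described in the abstract asks for the direction that makes it as close to zero as possible.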
statistics & probability
What problem does this paper attempt to address?