Solving systems of Random Equations via First and Second-Order Optimization Algorithms

Andrea Montanari, Eliran Subag
2024-12-10
Abstract: Gradient-based (a.k.a. `first order') optimization algorithms are routinely used to solve large-scale nonconvex problems. Yet it is generally hard to predict their effectiveness. To gain insight into this question, we revisit the problem of solving $n$ random equations in $d$ variables. We assume that the equations are independent realizations of a common Gaussian process. A special case is that of random polynomials, which has been studied since Littlewood-Offord and Kac in the 1940s, and Shub-Smale in the 1990s. The latter authors were the first to investigate the computational aspects of this problem. Smale's `17th problem' asks whether a system of random polynomial equations can be (approximately) solved in average-case polynomial time. We formulate this as a nonconvex optimization problem, and develop gradient and Hessian-based algorithms to solve it. Leveraging recent advances in spin glass theory, we characterize the optimal algorithm in this class, and show that it undergoes a phase transition when $\alpha=n/d$ crosses a threshold $\alpha_{\text{alg}}$. For $\alpha>\alpha_{\text{alg}}$, solutions may exist (depending on the distribution of the equations) but are not found by local algorithms. We compare these predictions with numerical experiments and observe that stochastic gradient descent approaches the optimal algorithm. We show that the geometry of solutions in a neighborhood of a random initialization undergoes a phase transition when $\alpha$ crosses a threshold $\alpha_{\text{sens}}$ (for `sensitivity') smaller than $\alpha_{\text{alg}}$. This geometric phase transition has algorithmic implications. Finally, we observe that the dynamics of these algorithms exhibits remarkable universality with respect to the details of the cost function.
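The optimization formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the model below is an assumption chosen for illustration, with each equation a random quadratic form $F_i(x) = x^\top G_i x$ (with $G_i$ a symmetric Gaussian matrix), the unknown $x$ constrained to the unit sphere, and the cost $H(x) = \frac{1}{2}\sum_{i=1}^n F_i(x)^2$ minimized by projected gradient descent with a simple step-halving rule. All function names (`make_system`, `solve`, and so on) are hypothetical.

```python
import math
import random


def make_system(n, d, seed=0):
    """Draw n independent symmetric Gaussian matrices (illustrative model only)."""
    rng = random.Random(seed)
    system = []
    for _ in range(n):
        g = [[0.0] * d for _ in range(d)]
        for a in range(d):
            for b in range(a, d):
                v = rng.gauss(0.0, 1.0) / math.sqrt(d)
                g[a][b] = v
                g[b][a] = v
        system.append(g)
    return system


def matvec(g, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in g]


def cost(system, x):
    """H(x) = 1/2 * sum_i F_i(x)^2 with F_i(x) = x^T G_i x."""
    total = 0.0
    for g in system:
        f = sum(xi * yi for xi, yi in zip(x, matvec(g, x)))
        total += f * f
    return 0.5 * total


def gradient(system, x):
    """grad H(x) = sum_i F_i(x) * 2 G_i x, valid because each G_i is symmetric."""
    grad = [0.0] * len(x)
    for g in system:
        gx = matvec(g, x)
        f = sum(xi * yi for xi, yi in zip(x, gx))
        for j in range(len(x)):
            grad[j] += 2.0 * f * gx[j]
    return grad


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def solve(system, d, steps=200, step_size=0.5, seed=1):
    """Projected gradient descent on the unit sphere from a random start."""
    rng = random.Random(seed)
    x = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    initial = cost(system, x)
    current = initial
    for _ in range(steps):
        grad = gradient(system, x)
        candidate = normalize([xi - step_size * gi for xi, gi in zip(x, grad)])
        new = cost(system, candidate)
        if new < current:
            x, current = candidate, new  # accept only steps that lower the cost
        else:
            step_size *= 0.5  # otherwise shrink the step and retry
    return x, initial, current


# n = 3 equations in d = 12 variables, i.e. alpha = n/d = 0.25 (arbitrary choice).
system = make_system(n=3, d=12)
x, initial_cost, final_cost = solve(system, d=12)
print(initial_cost, final_cost)
```

Varying `n` at fixed `d` in this toy setup changes the ratio $\alpha = n/d$; the sketch makes no claim about where, or whether, the thresholds described in the abstract appear for this particular model.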
Probability