Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions.

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra
DOI: https://doi.org/10.48550/arxiv.2002.04130
2020-01-01
Abstract: We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an ϵ-stationary point with first-order methods is impossible in finite time. We then introduce the notion of (δ, ϵ)-stationarity, which allows for an ϵ-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance δ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a (δ, ϵ)-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on δ. Empirically, our methods perform well for training ReLU neural networks.
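
To make the (δ, ϵ)-stationarity notion concrete, here is a minimal one-dimensional sketch in Python. It is not one of the paper's algorithms. It samples generalized gradients of f(x) = |x| at points within distance δ of x and checks whether some convex combination of them has magnitude at most ϵ. All function names and the sampling scheme are hypothetical and chosen only for illustration. Because sampling covers only part of the δ-ball, a True result certifies the condition for the sampled gradients, while a False result is inconclusive.

```python
def generalized_gradient_abs(y):
    """Return one element of the generalized gradient of f(y) = |y|.

    Hypothetical oracle for illustration: the sign of y, with 0 chosen
    at the kink y = 0 (any value in [-1, 1] would be valid there).
    """
    if y > 0:
        return 1.0
    if y < 0:
        return -1.0
    return 0.0


def dist_zero_to_hull_1d(grads):
    """Distance from 0 to the convex hull of scalar gradients.

    In one dimension the convex hull is the interval [min, max].
    """
    lo, hi = min(grads), max(grads)
    if lo <= 0.0 <= hi:
        return 0.0
    return min(abs(lo), abs(hi))


def is_delta_eps_stationary_1d(grad_fn, x, delta, eps, num_samples=101):
    """Check (delta, eps)-stationarity at x from sampled gradients.

    Assumption: evenly spaced samples in [x - delta, x + delta] stand in
    for the full delta-ball, so this is an inner approximation.
    """
    step = 2.0 * delta / (num_samples - 1)
    points = [x - delta + i * step for i in range(num_samples)]
    grads = [grad_fn(y) for y in points]
    return dist_zero_to_hull_1d(grads) <= eps


if __name__ == "__main__":
    # Near the kink: the ball contains gradients -1 and +1, which average to 0.
    print(is_delta_eps_stationary_1d(generalized_gradient_abs, 0.05, 0.1, 0.01))
    # Far from the kink: every sampled gradient equals +1.
    print(is_delta_eps_stationary_1d(generalized_gradient_abs, 0.5, 0.1, 0.01))
```

The point x = 0.05 is not ϵ-stationary in the usual sense, since the gradient there has magnitude 1, yet it passes the relaxed check with δ = 0.1 because the kink at 0 lies inside the δ-ball.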