On the Generalization Ability of Online Gradient Descent Algorithm under the Quadratic Growth Condition

Daqing Chang, Ming Lin, Changshui Zhang
DOI: https://doi.org/10.1109/tnnls.2017.2764960
IF: 14.255
2018-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Online learning has been successfully applied to various machine learning problems. Conventional analysis of online learning achieves a sharp generalization bound under a strong convexity assumption. In this paper, we study the generalization ability of the classic online gradient descent algorithm under the quadratic growth condition (QGC), a strictly weaker condition than strong convexity. Under some mild assumptions, we prove that the excess risk converges at a rate no worse than O(log T / T) when the data are independently and identically distributed (i.i.d.). When the data are generated from a φ-mixing process, we obtain the excess risk bound O(log T / T + φ(τ)), where φ(τ) is the mixing coefficient capturing the non-i.i.d. nature of the data. Our key technique combines the QGC with martingale concentration inequalities. Our results indicate that strong convexity is not necessary to achieve the sharp O(log T / T) convergence rate in online learning. We verify our theory on both synthetic and real-world data.
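To make the setting concrete, here is a minimal Python sketch of projected online gradient descent on a problem that satisfies a common form of the QGC, F(w) − F* ≥ (μ/2)·dist(w, W*)², without being strongly convex. Everything in it is an illustrative assumption rather than the paper's algorithm or experiments. The hypothetical data model x = (a, a, b) with a, b ~ N(0, 1) and y = xᵀ(1, 1, −1) + noise makes E[xxᵀ] singular, so the population risk has a whole line of minimizers W* = {w : w₁ + w₂ = 2, w₃ = −1}, yet it grows quadratically away from that line with μ = 1. The step size η_t = 1/(μt), the projection onto a ball of radius 10, and returning the average of the iterates (a standard online-to-batch conversion) are borrowed from the usual strongly convex analysis; the paper's exact choices may differ.

```python
import math
import random


def project_to_ball(w, radius):
    """Euclidean projection onto {w : ||w|| <= radius} (assumed bounded domain)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm <= radius:
        return w
    return [wi * radius / norm for wi in w]


def sample(rng, w_true=(1.0, 1.0, -1.0), noise=0.1):
    """Hypothetical i.i.d. data model: x = (a, a, b), so E[x x^T] has rank 2."""
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = [a, a, b]
    y = sum(xi * wi for xi, wi in zip(x, w_true)) + rng.gauss(0.0, noise)
    return x, y


def excess_risk(w):
    """Closed-form F(w) - F* for the data model above, with F(w) = 0.5 E[(x^T w - y)^2].

    It is zero on the whole line w1 + w2 = 2, w3 = -1, so F is not strongly convex,
    but it is at least 0.5 * dist(w, W*)^2, i.e. the QGC holds with mu = 1.
    """
    s = w[0] + w[1] - 2.0
    d3 = w[2] + 1.0
    return 0.5 * (s * s + d3 * d3)


def online_gradient_descent(T, mu=1.0, radius=10.0, seed=0):
    """Projected OGD on the squared loss with the assumed step size 1 / (mu * t).

    Returns the last iterate and the average of w_1, ..., w_T.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    total = [0.0, 0.0, 0.0]
    for t in range(1, T + 1):
        total = [ti + wi for ti, wi in zip(total, w)]  # accumulate w_t before the update
        x, y = sample(rng)
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        eta = 1.0 / (mu * t)
        # The gradient of 0.5 * (x^T w - y)^2 with respect to w is residual * x.
        w = project_to_ball([wi - eta * residual * xi for wi, xi in zip(w, x)], radius)
    avg = [ti / T for ti in total]
    return w, avg


if __name__ == "__main__":
    for T in (1_000, 10_000):
        last, avg = online_gradient_descent(T)
        print(T, excess_risk(last), excess_risk(avg))
```

Because every stochastic gradient (xᵀw − y)x has equal first and second coordinates, the iterates never move along the flat direction (1, −1, 0) and head toward the particular minimizer (1, 1, −1). Strong convexity would rule out such a flat direction, whereas the QGC only requires the risk to grow away from the minimizer set W*.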