Convergence Analysis of Fractional Gradient Descent

Ashwani Aggarwal
2024-06-04
Abstract: Fractional derivatives are a well-studied generalization of integer order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Convergence analysis of fractional gradient descent is currently limited both in the methods analyzed and the settings analyzed. This paper aims to fill in these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, novel bounds will be established bridging fractional and integer derivatives. Then, these bounds will be applied to the aforementioned settings to prove linear convergence for smooth and strongly convex functions and $O(1/T)$ convergence for smooth and convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness - Hölder smoothness - that is more natural for fractional derivatives. Finally, empirical results will be presented on the potential speed up of fractional gradient descent over standard gradient descent as well as some preliminary theoretical results explaining this speed up.
Optimization and Control, Machine Learning, Numerical Analysis
What problem does this paper attempt to address?
This paper analyzes the convergence properties of fractional gradient descent in several optimization settings. Specifically:

1. **Relationship between fractional-order and integer-order derivatives**:
   - The paper establishes new bounds relating fractional-order derivatives to integer-order derivatives. These bounds are the tool that the later convergence proofs build on.
2. **Linear convergence rate for smooth and strongly convex functions**:
   - For smooth and strongly convex functions, the paper proves that fractional gradient descent achieves linear convergence and gives a detailed analysis of the rate. This extends the work of Shin et al. (2021), which was limited to quadratic functions.
3. **$O(1/T)$ convergence rate for smooth and convex functions**:
   - For smooth and convex functions, the paper shows that fractional gradient descent achieves an $O(1/T)$ convergence rate, matching the rate of standard gradient descent.
4. **$O(1/T)$ convergence rate for smooth but non-convex functions**:
   - For smooth but non-convex functions, the paper proves an $O(1/T)$ convergence rate using Hölder smoothness, an extended notion of smoothness that is more natural for fractional derivatives.
5. **Experimental results**:
   - The paper presents empirical results on the potential speed-up of fractional gradient descent over standard gradient descent, together with preliminary theoretical results that explain this speed-up.

Taken together, these results aim to fill gaps in the existing theoretical analysis of fractional gradient descent and to indicate where the method may be useful in practice.
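For orientation, the baseline rates that the fractional results are compared against can be checked numerically. The following is a minimal sketch of standard (integer-order) gradient descent only, not the paper's fractional method. The test function, the constants `MU` and `L`, and the helper `gradient_descent` are hypothetical choices made for illustration. It checks the textbook bounds for step size $1/L$: the linear rate $f(x_t) - f^* \le (1 - \mu/L)^t (f(x_0) - f^*)$ for smooth, strongly convex functions, and the $O(1/T)$ rate $f(x_T) - f^* \le L\|x_0 - x^*\|^2 / (2T)$ for smooth, convex functions.

```python
# Baseline sketch: standard gradient descent on a hypothetical quadratic.
# This is NOT the fractional method analyzed in the paper.

MU, L = 1.0, 10.0  # assumed strong-convexity and smoothness constants


def f(x):
    """Hypothetical test function f(x) = 0.5 * (MU * x1^2 + L * x2^2), minimized at 0."""
    return 0.5 * (MU * x[0] ** 2 + L * x[1] ** 2)


def grad_f(x):
    """Gradient of the test function above."""
    return [MU * x[0], L * x[1]]


def gradient_descent(grad, x0, step, num_steps):
    """Run plain gradient descent and return the list of all iterates."""
    xs = [list(x0)]
    for _ in range(num_steps):
        g = grad(xs[-1])
        xs.append([xi - step * gi for xi, gi in zip(xs[-1], g)])
    return xs


x0 = [1.0, 1.0]
iterates = gradient_descent(grad_f, x0, step=1.0 / L, num_steps=50)

# The minimum value is f* = 0 at x* = (0, 0), so f(x_t) is the optimality gap.
gaps = [f(x) for x in iterates]
dist0_sq = sum(xi ** 2 for xi in x0)  # ||x0 - x*||^2

# Linear rate for smooth, strongly convex functions.
linear_bound_holds = all(
    gaps[t] <= (1 - MU / L) ** t * gaps[0] + 1e-12 for t in range(len(gaps))
)

# O(1/T) rate for smooth, convex functions.
sublinear_bound_holds = all(
    gaps[t] <= L * dist0_sq / (2 * t) + 1e-12 for t in range(1, len(gaps))
)

print("linear bound holds:", linear_bound_holds)
print("O(1/T) bound holds:", sublinear_bound_holds)
```

A quadratic is used because its minimizer and constants are known exactly, which makes both bounds easy to verify. Replacing `grad_f` with a fractional-gradient update would require the paper's specific definitions, which are not given in this summary.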