Adaptive Optimization Algorithms for Machine Learning

Slavomír Hanzely
2023-11-17
Abstract: Machine learning assumes a pivotal role in our data-driven world. The increasing scale of models and datasets necessitates quick and reliable algorithms for model training. This dissertation investigates adaptivity in machine learning optimizers. The ensuing chapters are dedicated to various facets of adaptivity, including: 1. personalization and user-specific models via personalized loss, 2. provable post-training model adaptations via meta-learning, 3. learning unknown hyperparameters in real time via hyperparameter variance reduction, 4. fast O(1/k^2) global convergence of second-order methods via a stepsized Newton method, regardless of the initialization and the choice of basis, 5. fast and scalable second-order methods via low-dimensional updates. This thesis contributes novel insights, introduces new algorithms with improved convergence guarantees, and improves analyses of popular practical algorithms.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This thesis addresses adaptivity in machine learning optimization, covering the following aspects:

1. **Personalized and user-specific models**: Obtain user-specific models through personalized loss functions (Chapters 2 and 3). The question is how to adapt a model to different users or data distributions so that it is more accurate and more widely applicable.
2. **Provable post-training model adaptation**: Adapt models after training through meta-learning (Chapter 3). The goal is a model that can be adjusted quickly to new data or tasks once training is done.
3. **Real-time learning of unknown hyperparameters**: Learn unknown hyperparameters during training through hyperparameter variance reduction (Chapter 4). This addresses hyperparameter selection while training and avoids time-consuming grid search.
4. **Fast global convergence of second-order methods**: Propose a stepsized Newton method with a global convergence rate of \(O(1/k^2)\), regardless of the initialization and the choice of basis (Chapter 5). This improves the efficiency and robustness of second-order optimization methods.
5. **Fast and scalable second-order methods**: Obtain fast and scalable second-order methods through low-dimensional updates (Chapter 6). This makes second-order methods more efficient on large-scale problems.

### Specific problem descriptions

- **Personalized and user-specific models**:
  - **Problem**: How can the model be adapted to different users or data distributions to improve its accuracy and applicability?
  - **Solution**: Introducing personalized loss functions lets the model be optimized for each user or data distribution. For example, in federated learning, each user's model can be personalized by training on that user's local data.
- **Provable post-training model adaptation**:
  - **Problem**: How can the model be adjusted quickly to new data or tasks after training?
  - **Solution**: With meta-learning techniques, the model can be adapted quickly to new tasks after training, without retraining the entire model.
- **Real-time learning of unknown hyperparameters**:
  - **Problem**: How can appropriate hyperparameters be selected automatically during training, avoiding time-consuming grid search?
  - **Solution**: Hyperparameter variance reduction techniques learn the unknown hyperparameters in real time, which improves training efficiency.
- **Fast global convergence of second-order methods**:
  - **Problem**: How can second-order optimization methods be made more efficient and robust, with fast global convergence?
  - **Solution**: Propose a stepsized Newton method with a global convergence rate of \(O(1/k^2)\), which guarantees fast convergence even from a poor initialization.
- **Fast and scalable second-order methods**:
  - **Problem**: How can second-order methods remain efficient on large-scale problems?
  - **Solution**: Low-dimensional update techniques make second-order methods more efficient on large-scale problems.

In conclusion, by studying and improving adaptive optimization algorithms, this thesis aims to improve the training efficiency, accuracy, and robustness of machine learning models, especially for large-scale data and complex tasks.
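
To make the stepsized Newton idea more concrete, here is a minimal Python sketch of a damped Newton iteration. The summary above does not give the stepsize rule, so the rule used here, \(\alpha = (-1 + \sqrt{1 + 2G})/G\) with \(G = L_{est}\sqrt{g^\top H^{-1} g}\), is an assumption chosen for illustration and should not be read as the thesis's exact algorithm. The constant `L_est`, the function names, and the test objective are likewise hypothetical.

```python
import numpy as np


def stepsize(G):
    """Assumed damping rule: lies in (0, 1) and tends to 1 as G -> 0."""
    return (-1.0 + np.sqrt(1.0 + 2.0 * G)) / G


def stepsized_newton(grad, hess, x0, L_est=1.0, iters=50, tol=1e-10):
    """Damped Newton iteration x <- x - alpha * H^{-1} g (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g, H = grad(x), hess(x)
        d = np.linalg.solve(H, g)  # Newton direction H^{-1} g
        # L_est (hypothetical smoothness estimate) times the gradient norm
        # measured in the local metric induced by the Hessian.
        G = L_est * np.sqrt(max(float(g @ d), 0.0))
        if G < tol:  # gradient is numerically zero: stop
            break
        x = x - stepsize(G) * d
    return x


# Hypothetical test objective: f(x) = sum_i 2*cosh(x_i), minimized at x = 0.
grad = lambda x: 2.0 * np.sinh(x)
hess = lambda x: np.diag(2.0 * np.cosh(x))

x_star = stepsized_newton(grad, hess, x0=[3.0, -2.0])
print(x_star)
```

Far from the minimizer the stepsize is well below 1, which damps the step. Near the minimizer it approaches 1, so the iteration behaves like the classical Newton method.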