Abstract: Machine learning assumes a pivotal role in our data-driven world. The increasing scale of models and datasets necessitates quick and reliable algorithms for model training. This dissertation investigates adaptivity in machine learning optimizers. The ensuing chapters are dedicated to various facets of adaptivity, including: 1. personalization and user-specific models via personalized loss, 2. provable post-training model adaptations via meta-learning, 3. learning unknown hyperparameters in real time via hyperparameter variance reduction, 4. fast O(1/k^2) global convergence of second-order methods via a stepsized Newton method, regardless of the initialization and choice of basis, 5. fast and scalable second-order methods via low-dimensional updates. This thesis contributes novel insights, introduces new algorithms with improved convergence guarantees, and improves analyses of popular practical algorithms.
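Item 4 is the most algorithmic claim in the abstract, so a small sketch may help show what a "stepsized" Newton method looks like. The step-size rule below is an assumption on our part, since the abstract does not state the dissertation's rule: alpha_k = 2 / (1 + sqrt(1 + 2 L g_k)), where g_k = sqrt(grad^T H^{-1} grad) is the gradient norm measured in the local Hessian geometry and `L` is a user-chosen constant. The step is short when the gradient is large and tends to 1 (a pure Newton step) near the solution. Because both g_k and the Newton direction are defined through the Hessian, the iteration is unchanged by a linear change of variables, which is one way to read "regardless of the choice of basis". The test function and all names are hypothetical.

```python
import numpy as np


def damped_newton(grad, hess, x0, L=2.0, iters=50):
    """Newton's method with an assumed gradient-dependent step size.

    Step: x <- x - alpha * H^{-1} g, with
        alpha = 2 / (1 + sqrt(1 + 2 * L * gnorm)),
    where gnorm = sqrt(g^T H^{-1} g) is the local (Hessian-weighted) gradient norm.
    This equals (sqrt(1 + 2*L*gnorm) - 1) / (L * gnorm) but avoids cancellation
    and division by zero; alpha -> 1 (a pure Newton step) as gnorm -> 0.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        direction = np.linalg.solve(hess(x), g)          # Newton direction H^{-1} g
        gnorm = np.sqrt(max(float(g @ direction), 0.0))  # local dual norm of g
        alpha = 2.0 / (1.0 + np.sqrt(1.0 + 2.0 * L * gnorm))
        x = x - alpha * direction
    return x


def pure_newton(grad, hess, x0, iters=50):
    """Undamped Newton iteration, for comparison."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x


# Hypothetical test problem: f(x) = sum(log cosh(x_i)) + (lam / 2) * ||x||^2.
lam = 0.1


def grad(x):
    return np.tanh(x) + lam * x


def hess(x):
    return np.diag(1.0 / np.cosh(x) ** 2 + lam)


print(damped_newton(grad, hess, [10.0]))  # close to the minimizer x* = 0
print(pure_newton(grad, hess, [10.0]))    # still bouncing between roughly +10 and -10
```

On this one-dimensional example, a pure Newton step from x = 10 overshoots to about -10 and the iteration keeps bouncing between roughly ±10, while the damped iteration reaches the minimizer at 0 within about ten steps. The choice of `L` matters: too small a value gives steps close to pure Newton, too large a value gives needlessly short steps.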

What problem does this paper attempt to address?

This paper addresses adaptivity in machine-learning optimization, covering the following aspects:

1. **Personalized and user-specific models**: Building user-specific models through personalized loss functions (Chapters 2 and 3). This concerns how to adapt a model to different users or data distributions to improve its accuracy and applicability.
2. **Provable post-training model adaptation**: Adapting models after training through meta-learning (Chapter 3). The aim is to let a trained model adjust quickly to new data or tasks.
3. **Real-time learning of unknown hyperparameters**: Learning unknown hyperparameters during training through hyperparameter variance reduction (Chapter 4). This addresses the selection of suitable hyperparameters during training and avoids time-consuming grid search.
4. **Fast global convergence of second-order methods**: A stepsized Newton method with a global convergence rate of \(O(k^{-2})\), regardless of the initialization and the choice of basis (Chapter 5). This improves the efficiency and robustness of second-order optimization methods.
5. **Fast and scalable second-order methods**: Fast, scalable second-order methods based on low-dimensional updates (Chapter 6). This makes second-order methods more efficient on large-scale problems.

### Specific problem descriptions

- **Personalized and user-specific models**:
  - **Problem**: How can a model be adapted to different users or data distributions to improve its accuracy and applicability?
  - **Solution**: Personalized loss functions let the model be optimized for each user or data distribution. In federated learning, for example, each user's model can be personalized using that user's local data.
- **Provable post-training model adaptation**:
  - **Problem**: How can a model be adjusted quickly to new data or tasks after training?
  - **Solution**: Meta-learning lets a trained model adapt quickly to new tasks without retraining the entire model.
- **Real-time learning of unknown hyperparameters**:
  - **Problem**: How can suitable hyperparameters be selected automatically during training, without a time-consuming grid search?
  - **Solution**: Hyperparameter variance reduction learns unknown hyperparameters in real time, which improves training efficiency.
- **Fast global convergence of second-order methods**:
  - **Problem**: How can second-order optimization methods be made efficient and robust, with fast global convergence?
  - **Solution**: A stepsized Newton method with a global convergence rate of \(O(k^{-2})\) converges quickly even from a poor initialization.
- **Fast and scalable second-order methods**:
  - **Problem**: How can second-order methods stay efficient on large-scale problems?
  - **Solution**: Low-dimensional updates make second-order methods more efficient at scale.

In conclusion, by studying and improving adaptive optimization algorithms, this paper aims to improve the training efficiency, accuracy, and robustness of machine-learning models, especially for large-scale data and complex tasks.
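The personalized-loss idea in the first item above, including the federated-learning example, can be illustrated with a minimal sketch. The objective is an assumption: it is one common way to write a personalized loss, giving each user its own model and penalizing that model's distance from the average model with a weight `lam`. The text above does not specify the dissertation's exact formulation, and the quadratic local losses with per-user optima `c_i` are hypothetical stand-ins for each user's local data.

```python
import numpy as np


def personalized_gd(centers, lam, iters=200):
    """Gradient descent on an assumed personalized (mixture) objective

        F(X) = sum_i 0.5 * ||x_i - c_i||^2 + (lam / 2) * sum_i ||x_i - x_bar||^2,

    where row i of X is user i's model, c_i is a hypothetical optimum of
    user i's local loss, and x_bar is the average model across users.
    """
    C = np.asarray(centers, dtype=float)
    X = np.zeros_like(C)
    step = 1.0 / (1.0 + lam)  # 1 / (largest Hessian eigenvalue) for this quadratic
    for _ in range(iters):
        x_bar = X.mean(axis=0)
        # The penalty's gradient w.r.t. x_i is lam * (x_i - x_bar), because
        # the rows of X - x_bar sum to zero.
        grad = (X - C) + lam * (X - x_bar)
        X = X - step * grad
    return X


centers = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
print(personalized_gd(centers, lam=0.0))  # lam = 0: each user keeps its own optimum c_i
print(personalized_gd(centers, lam=1.0))  # each row pulled halfway toward the average (2/3, 4/3)
```

For this quadratic case the minimizer has the closed form x_i = (c_i + lam * c_bar) / (1 + lam), with c_bar the average of the c_i. Setting `lam = 0` recovers purely local models and letting `lam` grow recovers a single shared model, so `lam` is the knob that trades personalization against sharing.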

Adaptive Optimization Algorithms for Machine Learning

Adaptive Strategies in Non-convex Optimization

A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning

Adaptive Optimizer for Automated Hyperparameter Optimization Problem

Differentiable Self-Adaptive Learning Rate

Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

Dynamic Memory Based Adaptive Optimization

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Adaptive Gradient-Based Meta-Learning Methods

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Learning to optimize with convergence guarantees using nonlinear system theory

Efficient Adaptive Optimization via Subset-Norm and Subspace-Momentum: Fast, Memory-Reduced Training with Convergence Guarantees

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

Machine Learning Optimization Algorithms & Portfolio Allocation

Practical tradeoffs between memory, compute, and performance in learned optimizers

Improving Adaptive Online Learning Using Refined Discretization

Optimal Adaptive and Accelerated Stochastic Gradient Descent

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

Machine Learning Optimization Techniques: A Survey, Classification, Challenges, and Future Research Issues

Dual Adaptivity: A Universal Algorithm for Minimizing the Adaptive Regret of Convex Functions

Fast Adaptation with Kernel and Gradient based Meta Leaning