Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation

Li Ding, Masrour Zoghi, Guy Tennenholtz, Maryam Karimzadehgan
2023-12-14
Abstract: We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol. In each iteration of EV3, we explore various model parameter updates, assess them using pertinent evaluation methods, and then adapt the model based on the optimal updates and previous progress history. EV3 offers substantial flexibility without imposing stringent constraints like differentiability on the key objectives relevant to the tasks of interest, allowing for exploratory updates with intentionally-biased gradients and through a diversity of losses and optimizers. Additionally, the assessment phase provides reliable safety controls to ensure robust generalization, and can dynamically prioritize tasks in scenarios with multiple objectives. With inspiration drawn from evolutionary algorithms, meta-learning, and neural architecture search, we investigate an application of EV3 to knowledge distillation. Our experimental results illustrate EV3's capability to safely explore the modeling landscape, while hinting at its potential applicability across numerous domains due to its inherent flexibility and adaptability. Finally, we provide a JAX implementation of EV3, along with source code for experiments, available at: https://github.com/google-research/google-research/tree/master/ev3
Machine Learning, Artificial Intelligence, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper introduces a new meta-optimization framework, Ever Evolving Evaluator (EV3), to address several key problems in training machine learning models:

1. **Flexibility and Adaptability**: Existing optimization methods often impose strict constraints, such as requiring the objective function to be differentiable. Through its explore-assess-adapt protocol, EV3 trains models efficiently without these constraints: it can handle non-differentiable objectives and allows exploratory updates that use intentionally biased gradients.
2. **Multi-objective Optimization**: When a task has multiple objectives, deciding which evaluation metrics to prioritize, and when, is a challenge. The assessment phase of EV3 provides safety controls that support robust generalization, and it can dynamically adjust task priorities as training proceeds.
3. **Optimization of Knowledge Distillation**: The paper applies EV3 to knowledge distillation (KD), an effective method for model compression and acceleration. Traditional KD methods may rely on labeled data for training, which increases the risk of overfitting. EV3 mitigates this risk by using labeled data for validation in the assessment phase rather than for training, so that exploration and assessment are kept separate.
4. **Model Expansion Strategy**: When parameter updates alone no longer improve performance, EV3 can expand the model. Using network morphism, it increases model capacity while retaining the performance of the existing model, leaving room for further improvement.
5. **Co-training Strategy**: For knowledge distillation into smaller models, the paper proposes a co-training strategy called "EV3-Synergy". It improves the performance of small models by distilling knowledge from multiple models of different sizes, generating a diverse set of models, and then using these models for further training.

In summary, the design of the EV3 framework makes model training more flexible and adaptable, and it shows clear advantages in specific application scenarios such as knowledge distillation.
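The explore-assess-adapt protocol described above can be illustrated with a small sketch. This is not the authors' JAX implementation: it is a plain-Python toy under stated assumptions. The task (a 1-D threshold classifier), the helper names (`make_data`, `explore`, `ev3_sketch`), the use of several learning rates as the source of candidate updates, and the simple "accept only if validation accuracy improves" rule are all hypothetical choices made for illustration. The point it tries to show is that candidates are proposed with a differentiable surrogate loss, while acceptance is decided by a non-differentiable metric on held-out data.

```python
import random


def make_data(n, rng):
    # Hypothetical toy task: label is 1 when x > 0.2, else 0.
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 1 if x > 0.2 else 0))
    return data


def accuracy(w, data):
    # Non-differentiable metric, used only in the assess step.
    hits = sum((1 if w[0] * x + w[1] > 0.5 else 0) == y for x, y in data)
    return hits / len(data)


def surrogate_grad(w, batch):
    # Gradient of the mean squared error between the linear score and the label.
    g0 = g1 = 0.0
    for x, y in batch:
        err = (w[0] * x + w[1]) - y
        g0 += 2.0 * err * x
        g1 += 2.0 * err
    return g0 / len(batch), g1 / len(batch)


def explore(w, batch, learning_rates):
    # Explore: propose one candidate update per learning rate.
    g0, g1 = surrogate_grad(w, batch)
    return [(w[0] - lr * g0, w[1] - lr * g1) for lr in learning_rates]


def ev3_sketch(train, valid, steps=50, learning_rates=(0.01, 0.1, 0.5),
               batch_size=16, seed=0):
    rng = random.Random(seed)
    w = (0.0, 0.0)
    best = accuracy(w, valid)
    history = [best]
    for _ in range(steps):
        batch = rng.sample(train, batch_size)
        candidates = explore(w, batch, learning_rates)
        # Assess: score every candidate on held-out data.
        scored = [(accuracy(c, valid), c) for c in candidates]
        top_score, top = max(scored, key=lambda s: s[0])
        # Adapt: keep the best candidate only if it beats the best score so far
        # (an assumed, simplified acceptance rule).
        if top_score > best:
            w, best = top, top_score
        history.append(best)
    return w, history


if __name__ == "__main__":
    data_rng = random.Random(1)
    train = make_data(200, data_rng)
    valid = make_data(100, data_rng)
    w, history = ev3_sketch(train, valid)
    print("accepted validation accuracy:", history[0], "->", history[-1])
```

Because an update is accepted only when the held-out metric improves, the recorded validation accuracy never decreases. That is a crude stand-in for the safety controls the paper attributes to the assessment phase. Knowledge distillation, model expansion, and dynamic task prioritization are left out of this sketch.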