Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model

Sosuke Kobayashi, Shun Kiyono, Jun Suzuki, Kentaro Inui
DOI: https://doi.org/10.48550/arXiv.2205.11833
2022-05-24
Abstract: Ensembling is a popular method used to improve performance as a last resort. However, ensembling multiple models finetuned from a single pretrained model has been not very effective; this could be due to the lack of diversity among ensemble members. This paper proposes Multi-Ticket Ensemble, which finetunes different subnetworks of a single pretrained model and ensembles them. We empirically demonstrated that winning-ticket subnetworks produced more diverse predictions than dense networks, and their ensemble outperformed the standard ensemble on some tasks.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The main problem this paper addresses is how to construct an effective ensemble from a single pre-trained model. The standard approach fine-tunes the same pre-trained model several times, but the resulting models lack diversity, so the ensemble gains little. To address this, the paper proposes Multi-Ticket Ensemble: fine-tuning different subnetworks of the pre-trained model increases the diversity among ensemble members and thereby improves ensemble performance.

### Background and Challenges of the Main Problem

1. **Effectiveness of Ensemble Learning**:
   - Ensemble learning improves prediction performance by combining the outputs of multiple models.
   - An ideal ensemble needs enough diversity among its members that they provide complementary predictions on different data samples.
2. **Limitations of a Single Pre-trained Model**:
   - Because large-scale pre-training is extremely expensive, most researchers can only use a single pre-trained model released by a resource-rich organization.
   - Models fine-tuned from the same pre-trained model tend to be highly similar to one another, so ensembling them yields limited gains.
3. **Sources of Diversity**:
   - The paper points out that the usual sources of randomness in fine-tuning (such as random initialization of task-specific layers, dataset shuffling, and dropout) do not significantly increase diversity among the models.
   - New methods are therefore needed that increase diversity among the members while maintaining the accuracy of each one.

### Proposed Solution

The paper proposes the Multi-Ticket Ensemble method. Its core idea is to use iterative magnitude pruning to find "winning tickets" in the pre-trained model, that is, sparse subnetworks that still maintain high accuracy after fine-tuning.
The specific steps are as follows:

1. **Iterative Magnitude Pruning**:
   - Starting from the pre-trained model, gradually prune the parameters whose weights have the smallest absolute values to form different subnetworks.
   - After each pruning step, fine-tune the remaining subnetwork again, and repeat until a predetermined pruning ratio (for example, 30%) is reached.
2. **Diversity and Accuracy of Subnetworks**:
   - Different subnetworks use different sub-spaces of the pre-trained knowledge during fine-tuning, so they obtain different perspectives and diversity increases.
   - At the same time, according to the lottery ticket hypothesis, these subnetworks can still maintain relatively high accuracy after fine-tuning.

### Experimental Results

The experiments show that the subnetworks produced by Multi-Ticket Ensemble make more diverse predictions than conventional dense networks, and that their ensemble outperforms the standard ensemble on some tasks.

### Formula Representation

- Let \( f(x; \theta) \) denote the output of the model for input \( x \) with parameters \( \theta \).
- The output of the ensemble model is:

\[ f_M(x) = \frac{1}{|M|} \sum_{\theta \in M} f(x; \theta) \]

where \( M = \{\theta_1, \theta_2, \ldots, \theta_{|M|}\} \) is the set of parameters of the member models.

In this way, the paper addresses the problem of building an effective ensemble from a single pre-trained model: it increases the diversity of the members and improves ensemble performance on some tasks.
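
The pruning steps and the ensemble formula above can be illustrated with a small toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the "model" is a plain linear function over a flat weight list, `toy_finetune` is a hypothetical stand-in that only adds seed-dependent noise instead of training, and using a different seed per member is just one possible way to obtain different subnetworks.

```python
import random


def prune_smallest(weights, mask, n_prune):
    """Zero out the mask for the n_prune active weights with the smallest |w|."""
    active = [i for i, keep in enumerate(mask) if keep]
    drop = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in drop:
        new_mask[i] = 0
    return new_mask


def toy_finetune(weights, mask, seed):
    """Hypothetical placeholder for fine-tuning: perturbs active weights only."""
    rng = random.Random(seed)
    return [w + rng.gauss(0.0, 0.1) if keep else 0.0
            for w, keep in zip(weights, mask)]


def find_ticket(pretrained, seed, sparsity_percent=30, rounds=3):
    """Iterative magnitude pruning: fine-tune, prune a little, repeat."""
    n = len(pretrained)
    mask = [1] * n
    for r in range(1, rounds + 1):
        weights = toy_finetune(pretrained, mask, seed * 100 + r)
        # Cumulative number of weights that should be pruned after round r.
        target_pruned = n * sparsity_percent * r // (100 * rounds)
        already_pruned = n - sum(mask)
        mask = prune_smallest(weights, mask, target_pruned - already_pruned)
    return mask


def f(x, theta):
    """Toy model f(x; theta): a dot product (assumption for illustration)."""
    return sum(t * xi for t, xi in zip(theta, x))


def ensemble(x, members):
    """f_M(x): the average of f(x; theta) over all member parameters theta."""
    return sum(f(x, theta) for theta in members) / len(members)


rng = random.Random(0)
pretrained = [rng.gauss(0.0, 1.0) for _ in range(20)]

masks, members = [], []
for seed in (1, 2, 3):
    mask = find_ticket(pretrained, seed)
    masks.append(mask)
    # Fine-tune the pruned subnetwork starting from the pre-trained weights.
    members.append(toy_finetune(pretrained, mask, seed))

x = [1.0] * 20
print("kept weights per member:", [sum(m) for m in masks])
print("ensemble output:", ensemble(x, members))
```

In a real setting, `toy_finetune` would be replaced by actual task fine-tuning and `f` by the pre-trained network; the sketch only shows how the pruning schedule and the averaging in \( f_M(x) \) fit together.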