Training-free Neural Architectural Search on Transformer Via Evaluating Expressivity and Trainability

Yi Fan, Yu-Bin Yang
DOI: https://doi.org/10.1109/icme57554.2024.10688064
2024-01-01
Abstract: Recently, training-free Neural Architecture Search (NAS) methods have proven to be effective and efficient in searching for Convolutional Neural Network (CNN) architectures. However, when it comes to Transformer-based models, training-free NAS is still in its early stages, with limited methods available. Existing approaches either focus solely on language models, neglecting visual models, or overlook the core component of Transformers, the attention map. In this paper, we propose a novel training-free NAS method specifically designed for Transformer-based visual models. Our method utilizes the distance between attention maps of different samples to measure expressivity, and the difference in attention maps before and after one iteration to measure trainability. Experimental results demonstrate a strong correlation between the top-1 classification accuracy on ImageNet-1K and the proxy metric used in our method. The search results achieve a top-1 classification accuracy of 82.9% on AutoFormer and 80.8% on PIT.
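The two quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which distance is used, how per-sample values are aggregated, or how the two scores are combined into one proxy. The Frobenius norm, the mean aggregation, and the function names `frobenius_distance`, `expressivity_score`, and `trainability_score` are all assumptions made for illustration. Attention maps are passed in as plain nested lists, so the sketch is independent of any particular model or framework.

```python
import math
from itertools import combinations


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two equally sized matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def expressivity_score(attn_maps):
    """Mean pairwise distance between attention maps of different samples.

    Assumption: larger distances mean the architecture responds more
    distinctly to different inputs.
    """
    pairs = list(combinations(attn_maps, 2))
    if not pairs:
        raise ValueError("need attention maps from at least two samples")
    return sum(frobenius_distance(a, b) for a, b in pairs) / len(pairs)


def trainability_score(maps_before, maps_after):
    """Mean per-sample distance between attention maps before and after one update.

    Assumption: the caller has already run one training iteration and
    recomputed the attention maps on the same samples.
    """
    if not maps_before or len(maps_before) != len(maps_after):
        raise ValueError("need matching, non-empty lists of attention maps")
    total = sum(frobenius_distance(b, a) for b, a in zip(maps_before, maps_after))
    return total / len(maps_before)


# Toy 2x2 attention maps (each row sums to 1) for two samples.
before = [
    [[0.5, 0.5], [0.5, 0.5]],  # sample 1: uniform attention
    [[1.0, 0.0], [0.0, 1.0]],  # sample 2: each token attends to itself
]
# Hypothetical maps after one iteration: sample 2 has become uniform.
after = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]

expressivity = expressivity_score(before)
trainability = trainability_score(before, after)
print(expressivity, trainability)
```

In a real search, the maps would come from the attention layers of each candidate architecture at initialization, and the two scores would be combined into a single ranking proxy in whatever way the paper specifies.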