Two-stage Neural Architecture Optimization with Separated Training and Search
Longze He, Boyu Hou, Junwei Dong, Liang Feng
DOI: https://doi.org/10.1109/ijcnn54540.2023.10191955
2023-01-01
Abstract: Neural architecture search (NAS) has been a popular research topic for designing deep neural networks (DNNs) automatically, and it can significantly improve the efficiency of designing neural architectures for given learning tasks. Recently, instead of conducting architecture search in the original neural architecture space, many NAS approaches have been proposed that learn continuous representations of neural architectures for architecture search or estimation. In particular, Neural Architecture Optimization (NAO) is a representative method that encodes neural architectures as continuous representations with an auto-encoder and then performs gradient-based continuous optimization in the encoded space. However, because NAO considers only the top-ranked architectures when learning the continuous representation, it may fail to construct a satisfactory continuous optimization space that contains the desired high-quality neural architectures. Taking this cue, in this paper we propose a two-stage NAO (TNAO) that learns a more complete continuous representation of neural architectures and thus provides a better optimization space for NAS. Specifically, with a pipeline that separates the training and search stages, we first build the training set by random sampling from the entire neural architecture search space, with the aim of collecting well-distributed neural architectures for training. Moreover, to exploit architectural semantic information effectively with limited data, we propose an improved Transformer auto-encoder for learning the continuous representation, supervised by ranking information about neural architecture performance. Lastly, for more effective optimization of neural architectures, we adopt a population-based swarm intelligence algorithm, i.e., competitive swarm optimization (CSO), with a newly designed remapping scoring scheme.
To evaluate the efficiency of the proposed TNAO, comprehensive experiments are conducted on two common search spaces, i.e., NAS-Bench-101 and NAS-Bench-201. On NAS-Bench-101, TNAO discovers an architecture ranked in the top 0.02% by performance, and on NAS-Bench-201 it obtains the best architecture for the CIFAR-10 dataset.
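The abstract names competitive swarm optimization (CSO) as the search algorithm but gives no further detail. The sketch below shows the standard CSO update (Cheng and Jin, 2015): particles are paired at random, the winner of each pair is left unchanged, and only the loser learns from the winner and from the swarm mean. It is not TNAO's implementation. The learned Transformer latent space, the ranking-supervised score, and the paper's remapping scoring scheme are all replaced by an arbitrary objective `f`, and the function name `cso_minimize` and the parameter values (`phi=0.1`, bounds `[-5, 5]`, population 20) are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def cso_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    pop_size: int = 20,
    iters: int = 200,
    phi: float = 0.1,
    lo: float = -5.0,
    hi: float = 5.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f with a basic competitive swarm optimizer (CSO).

    Hypothetical sketch: f stands in for a latent-space score; the
    remapping scoring scheme described in the abstract is NOT modeled.
    """
    if pop_size % 2 != 0:
        raise ValueError("pop_size must be even for pairwise competitions")
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    vs = [[0.0] * dim for _ in range(pop_size)]
    best_x = list(min(xs, key=f))  # copy, since particles are mutated in place
    best_f = f(best_x)
    for _ in range(iters):
        # Swarm mean position acts as a weak social attractor for losers.
        mean = [sum(x[d] for x in xs) / pop_size for d in range(dim)]
        order = list(range(pop_size))
        rng.shuffle(order)
        for i in range(0, pop_size, 2):
            a, b = order[i], order[i + 1]
            # The particle with the lower objective wins and is left unchanged.
            win, lose = (a, b) if f(xs[a]) <= f(xs[b]) else (b, a)
            for d in range(dim):
                r1, r2, r3 = rng.random(), rng.random(), rng.random()
                vs[lose][d] = (r1 * vs[lose][d]
                               + r2 * (xs[win][d] - xs[lose][d])
                               + phi * r3 * (mean[d] - xs[lose][d]))
                # Move the loser and clip it back into the search box.
                xs[lose][d] = min(hi, max(lo, xs[lose][d] + vs[lose][d]))
            f_lose = f(xs[lose])
            if f_lose < best_f:
                best_x, best_f = list(xs[lose]), f_lose
    return best_x, best_f


if __name__ == "__main__":
    # Toy objective: the sphere function, minimized at the origin.
    def sphere(x: List[float]) -> float:
        return sum(v * v for v in x)

    x, fx = cso_minimize(sphere, dim=3)
    print(x, fx)
```

Because only half of the swarm moves in each round, CSO needs no personal-best or global-best memory, which is part of why it is attractive for population-based search. To maximize a predicted performance score instead, one would pass `lambda z: -score(z)`, where `score` is a hypothetical predictor over the learned continuous representation.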