Neural Architecture Selection as a Nash Equilibrium With Batch Entanglement

Qian Li, Chao Xue, Mingming Li, Chun-Guang Li, Chao Ma, Xiaokang Yang
DOI: https://doi.org/10.1109/TNNLS.2023.3283239
Abstract: Modeling the architecture search process on a supernet and applying a differentiable method to find the importance of architecture are among the leading tools for differentiable neural architecture search (DARTS). One fundamental problem in DARTS is how to discretize or select a single-path architecture from the pretrained one-shot architecture. Previous approaches mainly exploit heuristic or progressive search methods for discretization and selection, which are inefficient and easily trapped in local optima. To address these issues, we formulate the task of finding a proper single-path architecture as an architecture game among the edges and operations with the strategies "keep" and "drop", and show that the optimal one-shot architecture is a Nash equilibrium of the architecture game. Then, we propose a novel and effective approach for discretizing and selecting a proper single-path architecture, which extracts the single-path architecture associated with the maximal coefficient of the Nash equilibrium for the strategy "keep" in the architecture game. To further improve efficiency, we employ a mechanism of entangled Gaussian representation of mini-batches, inspired by the classic Parrondo's paradox. If some mini-batches form uncompetitive strategies, the entanglement of mini-batches ensures that the games are combined and thus turn into strong ones. We conduct extensive experiments on benchmark datasets and demonstrate that our approach is significantly faster than the state-of-the-art progressive discretizing methods while maintaining competitive performance with higher maximum accuracy.
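The selection step described in the abstract (keeping, per edge, the operation with the maximal "keep" coefficient) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function name `select_single_path`, the edge and operation names, and the coefficient values are all hypothetical, and the abstract does not say how the equilibrium coefficients are computed, so they are simply taken as given here.

```python
from typing import Dict


def select_single_path(keep_coeff: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each edge, keep the operation whose 'keep' coefficient is largest.

    keep_coeff maps an edge name to a dict of {operation name: coefficient}.
    The coefficients are assumed to come from a Nash equilibrium of the
    architecture game; computing them is outside the scope of this sketch.
    """
    return {edge: max(ops, key=ops.get) for edge, ops in keep_coeff.items()}


# Hypothetical coefficients for a two-edge cell (illustrative values only).
keep_coeff = {
    "edge_0_1": {"sep_conv_3x3": 0.62, "skip_connect": 0.25, "max_pool_3x3": 0.13},
    "edge_0_2": {"sep_conv_3x3": 0.31, "skip_connect": 0.48, "max_pool_3x3": 0.21},
}

selected = select_single_path(keep_coeff)
print(selected)
```

A per-edge argmax is the simplest reading of "maximal coefficient"; the paper itself may apply additional constraints, such as limiting the number of edges kept per node, that this sketch omits.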