Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li
Abstract: Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it outperforms a strong baseline significantly on the PTB and WikiText data, with a new state-of-the-art on PTB. Moreover, the learned architectures show good transferability to other systems. E.g., they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.
What problem does this paper attempt to address?
This paper attempts to solve two main problems in Neural Architecture Search (NAS):
1. **Expanding the search space**: Most existing NAS systems only learn the internal architecture of a recurrent or convolutional cell and ignore how cells are connected to one another. Restricting the search in this way limits the range of architectures that can be found, and hence the model performance that can be reached. To address this, the authors propose an extended search space (ESS) in which both the intra-cell architecture and the inter-cell architecture are learned, so that the search can explore a much wider range of network structures.
2. **Jointly learning intra-cell and inter-cell architectures**: To obtain better search results, the authors design a joint learning method that performs intra-cell and inter-cell NAS simultaneously. Because the two parts are searched together, each can adapt to the other, and the network is optimized as a whole rather than one part at a time.
Specifically, the main contributions of the paper include:
- Proposing a general framework for learning intra-cell and inter-cell architectures in RNNs.
- Implementing this framework in a differentiable architecture search system and significantly outperforming a strong baseline on language modeling (PTB and WikiText), with a new state-of-the-art result on PTB.
- Demonstrating that the learned architectures transfer well to other systems, improving state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and the CoNLL chunking task.
### Summary of mathematical formulas
1. **RNN cell output formula**:
\[
h_t=\pi(\hat{h}_{t-1},\hat{x}_t)
\]
where \(\pi(\cdot)\) is the cell function, \(\hat{h}_{t-1}\) is the representation vector at the previous time step, and \(\hat{x}_t\) is the representation vector of the current input.
2. **Node state formula**:
\[
s_i=\sum_{j < i}\sum_k\theta_{i,j}^k\cdot o_{i,j}^k(s_j\cdot W_j)
\]
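where \(W_j\) is the parameter matrix of the linear transformation applied to \(s_j\), and \(\theta_{i,j}^k\) is the weight of the candidate operation \(o_{i,j}^k(\cdot)\) on the edge from node \(j\) to node \(i\). The weights are obtained by softmax normalization of the architecture parameters \(w_{i,j}^k\):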
\[
\theta_{i,j}^k=\frac{\exp(w_{i,j}^k)}{\sum_{k'}\exp(w_{i,j}^{k'})}
\]
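The computation of a single node state can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the candidate operation set (tanh, ReLU, sigmoid, identity) is borrowed from typical DARTS-style recurrent cells and is an assumption here, the matrices \(W_j\) are assumed square, and the helper names `softmax` and `node_state` are hypothetical.

```python
import numpy as np

# Candidate operations o^k(.). This particular set is an assumption for
# illustration (typical of DARTS-style recurrent cells), not taken from the paper.
OPS = [
    np.tanh,                                # tanh
    lambda z: np.maximum(z, 0.0),           # ReLU
    lambda z: 1.0 / (1.0 + np.exp(-z)),     # sigmoid
    lambda z: z,                            # identity
]


def softmax(w):
    """theta^k = exp(w^k) / sum_k' exp(w^k'), shifted by max(w) for stability."""
    e = np.exp(w - np.max(w))
    return e / e.sum()


def node_state(prev_states, weights, arch_weights):
    """Compute s_i = sum_{j<i} sum_k theta_{i,j}^k * o^k(s_j W_j).

    prev_states:  list of predecessor states s_j, each of shape (d,)
    weights:      list of matrices W_j, each of shape (d, d) (assumed square)
    arch_weights: array of shape (len(prev_states), len(OPS)) holding the
                  unnormalised architecture parameters w_{i,j}^k
    """
    s_i = np.zeros(prev_states[0].shape)
    for j, (s_j, W_j) in enumerate(zip(prev_states, weights)):
        theta = softmax(arch_weights[j])    # mixing weights over the K ops
        z = s_j @ W_j                       # linear transformation s_j . W_j
        s_i += sum(t * op(z) for t, op in zip(theta, OPS))
    return s_i


rng = np.random.default_rng(0)
d = 4
prev = [rng.standard_normal(d) for _ in range(2)]        # s_0, s_1
Ws = [rng.standard_normal((d, d)) for _ in range(2)]     # W_0, W_1
w = np.zeros((2, len(OPS)))      # equal parameters: every op gets theta = 0.25
print(node_state(prev, Ws, w).shape)                    # (4,)
```

Because every candidate operation contributes in proportion to its softmax weight, the node state is differentiable with respect to \(w_{i,j}^k\), which is what allows the architecture itself to be searched by gradient descent.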
3. **Final output formula**:
\[
s_n=\frac{1}{n - 1}\sum_{i = 1}^{n - 1}s_i
\]
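The averaging step can be illustrated with a hypothetical helper `cell_output`. It assumes the node states are stored in a list indexed from \(s_0\), so that \(s_0\) is skipped exactly as the sum in the formula starts at \(i = 1\); the numbers are made up for illustration.

```python
import numpy as np


def cell_output(states):
    """Return s_n = (1 / (n - 1)) * sum_{i=1}^{n-1} s_i.

    states: list [s_0, s_1, ..., s_{n-1}] of equal-length vectors.
    s_0 is not part of the average, matching the sum starting at i = 1.
    """
    intermediate = np.stack(states[1:])   # shape (n - 1, d)
    return intermediate.mean(axis=0)      # element-wise average over nodes


states = [np.array([9.0, 9.0]),           # s_0 (excluded)
          np.array([1.0, 2.0]),           # s_1
          np.array([3.0, 4.0]),           # s_2
          np.array([5.0, 6.0])]           # s_3
print(cell_output(states))                # [3. 4.]
```

Here \(n = 4\), so the three intermediate states are summed to \((9, 12)\) and divided by \(n - 1 = 3\).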
4. **Hadamard product formula**:
\[
F(\alpha;\beta)=s_\alpha\odot s_\beta
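\]
where \(\odot\) denotes the element-wise (Hadamard) product of the two node states \(s_\alpha\) and \(s_\beta\).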
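A small worked example, with made-up values chosen only for illustration, shows that the Hadamard product multiplies matching entries and keeps the vector dimension unchanged:
\[
s_\alpha=\begin{pmatrix}1\\2\\3\end{pmatrix},\quad s_\beta=\begin{pmatrix}4\\5\\6\end{pmatrix}\quad\Longrightarrow\quad F(\alpha;\beta)=s_\alpha\odot s_\beta=\begin{pmatrix}1\cdot 4\\2\cdot 5\\3\cdot 6\end{pmatrix}=\begin{pmatrix}4\\10\\18\end{pmatrix}
\]
Unlike a dot product, which would collapse the same two vectors to the scalar \(32\), the result is again a vector of the same size as its inputs.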
Together, these formulas describe how architectures are searched in the extended search space and how the intra-cell and inter-cell connections are optimized through the joint learning method.