Learning Architectures from an Extended Search Space for Language Modeling

Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li
DOI: https://doi.org/10.48550/arXiv.2005.02593
2020-06-05
Abstract: Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it outperforms a strong baseline significantly on the PTB and WikiText data, with a new state-of-the-art on PTB. Moreover, the learned architectures show good transferability to other systems. E.g., they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
This paper addresses two main problems in neural architecture search (NAS):

1. **Expanding the search space**: Most existing NAS systems learn only the internal architecture of a recurrent or convolutional cell and ignore how the cells are connected to one another. This restriction becomes a bottleneck for model performance. To address it, the authors propose an Extended Search Space (ESS) that learns the intra-cell and inter-cell architectures together, so the search can explore a wider range of network structures.
2. **Jointly learning intra-cell and inter-cell architectures**: To obtain better search results, the authors design a joint learning method that performs intra-cell and inter-cell NAS simultaneously. The two architectures can therefore influence each other, and the network structure is optimized as a whole.

Specifically, the main contributions of the paper are:

- Proposing a general framework for learning intra-cell and inter-cell architectures in RNNs.
- Implementing this framework in a differentiable architecture search system and achieving results significantly better than the baseline model on the language modeling task.
- Demonstrating that the learned architectures transfer well to other systems, with new best results on the named entity recognition (NER) and chunking tasks.

### Summary of mathematical formulas

1. **RNN cell output formula**:
   \[ h_t=\pi(\hat{h}_{t-1},\hat{x}_t) \]
   where \(\pi(\cdot)\) is the cell function, \(\hat{h}_{t-1}\) is the representation vector from the previous time step, and \(\hat{x}_t\) is the representation vector of the current input.
2. **Node state formula**:
   \[ s_i=\sum_{j<i}\sum_k\theta_{i,j}^k\cdot o_{i,j}^k(s_j\cdot W_j) \]
   where \(W_j\) is the parameter matrix of the linear transformation and \(\theta_{i,j}^k\) is the weight of the operation \(o_{i,j}^k(\cdot)\), obtained by softmax normalization:
   \[ \theta_{i,j}^k=\frac{\exp(w_{i,j}^k)}{\sum_{k'}\exp(w_{i,j}^{k'})} \]
3. **Final output formula**:
   \[ s_n=\frac{1}{n-1}\sum_{i=1}^{n-1}s_i \]
   that is, the output state \(s_n\) is the average of the \(n-1\) preceding node states.
4. **Hadamard product formula**:
   \[ F(\alpha;\beta)=s_\alpha\odot s_\beta \]
   the element-wise product of the two node states \(s_\alpha\) and \(s_\beta\).

These formulas show how architecture search is performed in the extended search space and how the joint learning method optimizes the intra-cell and inter-cell connections.
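
The node state, final output, and Hadamard product formulas above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate operation set `OPS` (tanh, sigmoid, ReLU, identity), the helper names, and the toy dimensions are all hypothetical choices made for the example, and the vectors are ordinary Python lists rather than tensors.

```python
import math

def softmax(weights):
    """Turn raw architecture weights w_{i,j}^k into theta_{i,j}^k."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(vec, mat):
    """Row vector times matrix, i.e. s_j * W_j."""
    return [sum(v * mat[r][c] for r, v in enumerate(vec))
            for c in range(len(mat[0]))]

# Assumed candidate operations o^k (illustrative, not taken from the source).
OPS = [
    lambda v: [math.tanh(x) for x in v],
    lambda v: [1.0 / (1.0 + math.exp(-x)) for x in v],  # sigmoid
    lambda v: [max(0.0, x) for x in v],                 # ReLU
    lambda v: list(v),                                  # identity
]

def node_state(prev_states, W, arch_weights):
    """s_i = sum_{j<i} sum_k theta_{i,j}^k * o^k(s_j * W_j).

    prev_states[j] is s_j, W[j] is W_j, and arch_weights[j] holds the raw
    weights w_{i,j}^k, one per operation in OPS.
    """
    s_i = [0.0] * len(W[0][0])
    for s_j, W_j, w_ij in zip(prev_states, W, arch_weights):
        theta = softmax(w_ij)
        projected = matvec(s_j, W_j)
        for th, op in zip(theta, OPS):
            s_i = [acc + th * out for acc, out in zip(s_i, op(projected))]
    return s_i

def cell_output(states):
    """s_n = average of the n-1 preceding node states."""
    count = len(states)
    return [sum(s[d] for s in states) / count for d in range(len(states[0]))]

def hadamard(s_alpha, s_beta):
    """F(alpha; beta) = s_alpha (element-wise product) s_beta."""
    return [a * b for a, b in zip(s_alpha, s_beta)]

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    s1 = [0.5, -0.5]
    # Equal raw weights give every operation the same share of the mixture.
    s2 = node_state([s1], [identity], [[0.0, 0.0, 0.0, 0.0]])
    print(cell_output([s1, s2]))
    print(hadamard(s1, s2))
```

During search, the raw weights in `arch_weights` would be trained by gradient descent alongside the model parameters; the sketch only shows the forward computation for fixed weights.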