Abstract: Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it outperforms a strong baseline significantly on the PTB and WikiText data, with a new state-of-the-art on PTB. Moreover, the learned architectures show good transferability to other systems. E.g., they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.

What problem does this paper attempt to address?

This paper addresses two main problems in neural architecture search (NAS):

1. **Expanding the search space**: Most existing NAS systems only learn the internal architecture of a recurrent or convolutional cell and ignore how the cells are connected to each other. This limitation leads to a bottleneck in model performance. To address it, the authors propose an extended search space (ESS) that learns the intra-cell architecture and the inter-cell architecture together, so the search can explore a wider range of network structures.
2. **Jointly learning intra-cell and inter-cell architectures**: To obtain better search results, the authors design a joint learning method that performs NAS within and between cells simultaneously. The intra-cell and inter-cell architectures can then influence each other, so the network structure is optimized as a whole.

The main contributions of the paper are:

- Proposing a general framework for learning intra-cell and inter-cell architectures in RNNs.
- Implementing this framework in a differentiable architecture search system and achieving results significantly better than the baseline model on the language modeling task.
- Demonstrating that the learned architectures transfer well, with new state-of-the-art results on the named entity recognition (NER) and chunking tasks.

### Summary of mathematical formulas

1. **RNN cell output formula**:
   \[ h_t=\pi(\hat{h}_{t - 1},\hat{x}_t) \]
   where \(\pi(\cdot)\) is the cell function, \(\hat{h}_{t - 1}\) is the representation vector at the previous time step, and \(\hat{x}_t\) is the representation vector of the current input.
2. **Node state formula**:
   \[ s_i=\sum_{j < i}\sum_k\theta_{i,j}^k\cdot o_{i,j}^k(s_j\cdot W_j) \]
   where \(W_j\) is the parameter matrix of the linear transformation and \(\theta_{i,j}^k\) is the weight of the operation \(o_{i,j}^k(\cdot)\), obtained by softmax normalization:
   \[ \theta_{i,j}^k=\frac{\exp(w_{i,j}^k)}{\sum_{k'}\exp(w_{i,j}^{k'})} \]
3. **Final output formula**, the average of the intermediate node states:
   \[ s_n=\frac{1}{n - 1}\sum_{i = 1}^{n - 1}s_i \]
4. **Hadamard product formula**, the element-wise product of two states \(s_\alpha\) and \(s_\beta\):
   \[ F(\alpha;\beta)=s_\alpha\odot s_\beta \]

These formulas show how neural architecture search is performed in the extended search space and how the intra-cell and inter-cell connections are optimized through the joint learning method.
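
The node state, final output, and Hadamard product formulas above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the candidate operation set `OPS`, the helper names (`softmax`, `matvec`, `node_state`, `cell_output`, `hadamard`), and the toy dimensions are all assumptions chosen for illustration, and the real system learns \(w_{i,j}^k\) and \(W_j\) by gradient descent rather than fixing them by hand.

```python
import math

def softmax(weights):
    """theta^k = exp(w^k) / sum_k' exp(w^k'), shifted by the max for stability."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(vec, mat):
    """Row vector times matrix, i.e. s_j . W_j."""
    return [sum(v * row[c] for v, row in zip(vec, mat))
            for c in range(len(mat[0]))]

# Hypothetical candidate operations o^k (assumed here; the paper's set may differ).
OPS = [
    lambda v: [math.tanh(x) for x in v],
    lambda v: [0.5 * (1.0 + math.tanh(0.5 * x)) for x in v],  # sigmoid
    lambda v: [max(0.0, x) for x in v],                       # ReLU
    lambda v: list(v),                                        # identity
]

def node_state(prev_states, mats, arch_weights):
    """s_i = sum_{j<i} sum_k theta_{i,j}^k * o^k(s_j . W_j)."""
    s_i = [0.0] * len(mats[0][0])
    for s_j, w_j, w_ij in zip(prev_states, mats, arch_weights):
        theta = softmax(w_ij)          # mixing weights for edge (i, j)
        z = matvec(s_j, w_j)           # linear transformation of s_j
        for th, op in zip(theta, OPS):
            s_i = [a + th * b for a, b in zip(s_i, op(z))]
    return s_i

def cell_output(intermediate_states):
    """s_n = (1 / (n - 1)) * sum_{i=1}^{n-1} s_i."""
    count = len(intermediate_states)
    return [sum(col) / count for col in zip(*intermediate_states)]

def hadamard(s_alpha, s_beta):
    """F(alpha; beta) = s_alpha (element-wise product) s_beta."""
    return [a * b for a, b in zip(s_alpha, s_beta)]

# Toy run: input state s_0 plus three intermediate nodes, uniform mixing weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
states = [[0.5, -0.5]]
for i in range(1, 4):
    states.append(node_state(states, [identity] * i, [[0.0] * len(OPS)] * i))
s_n = cell_output(states[1:])  # average over s_1..s_3 only
```

With all architecture weights set to zero, every operation on an edge receives the same mixing weight, which is the usual starting point of a differentiable search. After training, the operation with the largest \(\theta_{i,j}^k\) on each edge is typically the one retained in the discrete architecture.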

Learning Architectures from an Extended Search Space for Language Modeling
