Abstract: Graph neural networks (GNNs) have been successfully applied to graph-structured data. For a given scenario, identifying a suitable GNN architecture usually requires rich human expertise and many laborious trials. This is because the performance of a GNN architecture is significantly affected by the choice of graph convolution components, such as the aggregate function and the hidden dimension. Neural architecture search (NAS) has shown its potential in discovering effective deep architectures for learning tasks in image and language modeling. However, existing NAS algorithms cannot be directly applied to the GNN search problem. First, the search space of GNNs differs from the ones in existing NAS work. Second, the representation learning capacity of a GNN architecture changes markedly with slight architecture modifications, which hurts the search efficiency of traditional search methods. Third, widely used NAS techniques such as parameter sharing might become unstable in GNNs.
To bridge the gap, we propose the automated graph neural networks (AGNN) framework, which aims to find an optimal GNN architecture within a predefined search space. A reinforcement learning based controller is designed to greedily validate architectures via small steps. AGNN has a novel parameter sharing strategy that enables homogeneous architectures to share parameters, based on a carefully designed homogeneity definition. Experiments on real-world benchmark datasets demonstrate that the GNN architecture identified by AGNN achieves the best performance compared with existing handcrafted models and traditional search methods.
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How to automatically find the optimal graph neural network (GNN) architecture for graph-structured data**. Specifically, the paper aims to address the following challenges:
1. **Differences in the search space**:
- Compared with traditional convolutional neural networks (CNNs), the search space of GNNs is more complex. For example, in GNNs, the message-passing mechanism involves operations such as aggregation, combination, and activation, rather than just the selection of convolution kernel sizes.
2. **The impact of architecture modification on the representation learning ability**:
- The performance of a GNN architecture is very sensitive to minor modifications. For example, changing the aggregation function (say, from max pooling to summation) may significantly affect classification performance. This makes it difficult for traditional controllers to learn which modifications have a positive impact on performance.
3. **Instability of parameter sharing**:
- In GNNs, directly applying traditional parameter-sharing techniques (such as sharing weights between different architectures) may make training unstable. Different architectures may have different weight shapes and output statistics, so parameters shared between them can cause output explosion.
To solve these problems, the paper proposes a framework named **Automated Graph Neural Network (AGNN)**, and its main contributions include:
- **Defining the neural architecture search problem applicable to GNNs** and designing a more efficient controller to explore the search space of GNN architectures.
- **Proposing a conservative exploration strategy** to efficiently search for new architectures by gradually modifying certain components in the existing best architecture.
- **Introducing a restricted parameter-sharing strategy** to ensure that parameters can only be shared between homogeneous architectures, thereby improving training stability.
With these methods, AGNN discovers GNN architectures that outperform hand-designed models and other search methods on multiple benchmark datasets.
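
The two strategies above can be illustrated with a minimal sketch. It is not the paper's implementation: AGNN uses a reinforcement learning controller, whereas this sketch swaps in uniform random choices to keep the example short. The search space `SEARCH_SPACE`, the scoring callback `evaluate`, and the homogeneity test in `can_share_parameters` are all assumptions made for illustration; the summary does not give the paper's exact homogeneity definition.

```python
import random

# Hypothetical search space; component names and options are illustrative only.
SEARCH_SPACE = {
    "aggregate": ["sum", "mean", "max"],
    "activation": ["relu", "tanh", "elu"],
    "hidden_dim": [16, 32, 64],
}

def mutate_one_component(arch, rng):
    """Return a copy of `arch` that differs in exactly one component."""
    component = rng.choice(sorted(SEARCH_SPACE))
    options = [o for o in SEARCH_SPACE[component] if o != arch[component]]
    child = dict(arch)
    child[component] = rng.choice(options)
    return child

def conservative_search(evaluate, start, steps, seed=0):
    """Small-step greedy search: keep a child only if it scores higher.

    Random mutation stands in for the learned controller (an assumption).
    """
    rng = random.Random(seed)
    best, best_score = dict(start), evaluate(start)
    for _ in range(steps):
        child = mutate_one_component(best, rng)
        score = evaluate(child)
        if score > best_score:
            best, best_score = child, score
    return best, best_score

def can_share_parameters(a, b):
    """Assumed homogeneity test: same weight shape and same activation."""
    return a["hidden_dim"] == b["hidden_dim"] and a["activation"] == b["activation"]
```

In this sketch, each step changes a single component of the current best architecture, so a change in score can be attributed to that one modification. Weights would be reused only when `can_share_parameters` returns `True`, which mirrors the idea of restricting sharing to architectures with matching weight shapes and output statistics.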
### Formula Summary
1. **Optimization objective**:
\[
f^*=\arg\max_{f\in F}M(f(\theta^*), D_{\text{valid}})
\]
\[
\theta^*=\arg\min_\theta L(f(\theta), D_{\text{train}})
\]
where \( f^* \) is the optimal GNN architecture, \( \theta^* \) is the optimal parameter of this architecture, \( M \) is an evaluation metric (such as F1-score or accuracy), and \( L \) is a loss function.
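
The nested structure of this objective (an inner fit on \( D_{\text{train}} \), an outer selection on \( D_{\text{valid}} \)) can be sketched with a toy problem. Everything here is an assumption for illustration: the "architecture" is just the exponent `p` of a one-parameter model \( \theta x^p \), the loss is squared error with a closed-form minimizer, and the metric is negative mean squared error. A real GNN search would replace `fit` with gradient-based training.

```python
def fit(p, data):
    """Inner problem: theta* = argmin_theta sum (theta * x**p - y)**2 (closed form)."""
    num = sum((x ** p) * y for x, y in data)
    den = sum((x ** p) ** 2 for x, y in data)
    return num / den

def metric(p, theta, data):
    """Validation metric M: negative mean squared error (higher is better)."""
    return -sum((theta * x ** p - y) ** 2 for x, y in data) / len(data)

def search(candidates, d_train, d_valid):
    """Outer problem: f* = argmax_f M(f(theta*), D_valid)."""
    best = None
    for p in candidates:
        theta = fit(p, d_train)           # theta* for this candidate
        score = metric(p, theta, d_valid)  # M(f(theta*), D_valid)
        if best is None or score > best[2]:
            best = (p, theta, score)
    return best

# Toy data generated from y = 2 * x**2, so the candidate p = 2 should win.
d_train = [(x, 2.0 * x ** 2) for x in (1.0, 2.0, 3.0)]
d_valid = [(x, 2.0 * x ** 2) for x in (1.5, 2.5)]
p_star, theta_star, score = search([1, 2, 3], d_train, d_valid)
```

Exhaustive enumeration is only feasible here because there are three candidates; the point of a learned controller is to avoid training every architecture in \( F \).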
2. **Definition of graph convolutional layer**:
\[
h_i^{(k)}=\text{AGGREGATE}(\{a_{ij}^{(k)}W^{(k)}x_j^{(k - 1)}:j\in N(i)\})
\]
\[
x_i^{(k)}=\text{ACT}(\text{COMBINE}(W^{(k)}x_i^{(k - 1)}, h_i^{(k)}))
\]
where \( h_i^{(k)} \) is the intermediate embedding of node \( i \) at the \( k \)-th layer, \( x_i^{(k)} \) is the final embedding, \( N(i) \) is the set of neighbors of node \( i \), \( W^{(k)} \) is a trainable matrix, \( a_{ij}^{(k)} \) is an attention coefficient, and \(\text{AGGREGATE}\), \(\text{COMBINE}\) and \(\text{ACT}\) are the aggregation, combination and activation functions respectively.
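
A single layer following these two equations can be sketched in plain Python. The choices below are assumptions, not the paper's configuration: \(\text{COMBINE}\) is taken to be element-wise addition, \(\text{ACT}\) is ReLU, the attention coefficients are constant, and the helper names (`matvec`, `aggregate`, `gnn_layer`) are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def aggregate(msgs, how, dim):
    """AGGREGATE over a list of message vectors; zeros if there are none."""
    if not msgs:
        return [0.0] * dim
    cols = list(zip(*msgs))
    if how == "sum":
        return [sum(c) for c in cols]
    if how == "mean":
        return [sum(c) / len(msgs) for c in cols]
    if how == "max":
        return [max(c) for c in cols]
    raise ValueError(f"unknown aggregator: {how}")

def relu(v):
    return [max(0.0, c) for c in v]

def gnn_layer(x, neighbors, W, att, how="sum", act=relu):
    """Compute x_i^(k) for every node i from the previous embeddings x."""
    out = {}
    for i in x:
        # Messages a_ij * W x_j from each neighbor j in N(i).
        msgs = [[att[(i, j)] * c for c in matvec(W, x[j])] for j in neighbors[i]]
        h_i = aggregate(msgs, how, len(W))
        self_term = matvec(W, x[i])
        # COMBINE is assumed to be element-wise addition.
        combined = [s + h for s, h in zip(self_term, h_i)]
        out[i] = act(combined)
    return out

# Path graph 0 - 1 - 2 with identity weights and unit attention.
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = [[1.0, 0.0], [0.0, 1.0]]
att = {(i, j): 1.0 for i in neighbors for j in neighbors[i]}
out_sum = gnn_layer(x, neighbors, W, att, how="sum")
out_max = gnn_layer(x, neighbors, W, att, how="max")
```

On node 1, which has two neighbors, the sum aggregator gives `[2.0, 2.0]` while the max aggregator gives `[1.0, 2.0]`. Swapping one component changes the embedding even on this tiny graph, which is the sensitivity the summary describes.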
3. **Reinforcement learning update rule**:
\[
\nabla_{\theta_c}J(\theta_c)=\sum_{t = 1}^n\mathbb{E}\left[(R_c - b_c)\nabla_{\theta_c}\log P(a_t|a_{t - 1};\theta_c)\right]
\]
where