Efficient and effective training of language and graph neural network models

Vassilis N. Ioannidis,Xiang Song,Da Zheng,Houyu Zhang,Jun Ma,Yi Xu,Belinda Zeng,Trishul Chilimbi,George Karypis
DOI: https://doi.org/10.48550/arXiv.2206.10781
2022-06-22
Abstract:Can we combine heterogenous graph structure with text to learn high-quality semantic and behavioural representations? Graph neural networks (GNN)s encode numerical node attributes and graph structure to achieve impressive performance in a variety of supervised learning tasks. Current GNN approaches are challenged by textual features, which typically need to be encoded to a numerical vector before provided to the GNN that may incur some information loss. In this paper, we put forth an efficient and effective framework termed language model GNN (LM-GNN) to jointly train large-scale language models and graph neural networks. The effectiveness in our framework is achieved by applying stage-wise fine-tuning of the BERT model first with heterogenous graph information and then with a GNN model. Several system and design optimizations are proposed to enable scalable and efficient training. LM-GNN accommodates node and edge classification as well as link prediction tasks. We evaluate the LM-GNN framework in different datasets performance and showcase the effectiveness of the proposed approach. LM-GNN provides competitive results in an Amazon query-purchase-product application.
Machine Learning,Computation and Language
What problem does this paper attempt to address?
The main problem that this paper attempts to solve is: how to combine heterogeneous graph structures and text information to learn high - quality semantic and behavioral representations. Specifically, existing graph neural networks (GNNs) face challenges when dealing with text features, because text usually needs to be encoded into numerical vectors first, which may lead to information loss. To this end, the authors propose an efficient and effective framework - Language Model Graph Neural Network (LM - GNN) for jointly training large - scale language models and graph neural networks. ### Main Problems 1. **Efficient Encoding of Text Features**: - When existing GNN methods deal with text features, they usually need to encode the text into numerical vectors, which may result in information loss. - The paper proposes a phased fine - tuning framework (LM - GNN). By first pre - training the BERT model with graph structure and then jointly fine - tuning it with the GNN model, this problem can be overcome. 2. **Efficient Training of Large - Scale Graph Data**: - Training of large - scale graph data involves challenges in efficiency and effectiveness, especially when jointly training large - scale language models and graph neural networks. - The paper proposes a series of system and design optimizations to achieve scalable and efficient training, including techniques such as caching BERT embeddings, joint negative sampling, and distributed GNN training. ### Solutions 1. **Phased Fine - Tuning Framework**: - **Semantic Encoder**: Use the pre - trained BERT model as a semantic encoder to encode the text information of nodes into embedding vectors. - **Graph Encoder**: Use the modified RGCN (Relational Graph Convolutional Network) as a graph encoder to capture graph structure information. - **Supervision Method**: Supervise the training of the model through structural prediction tasks (such as link prediction) and node prediction tasks (such as node classification). 2. **System and Design Optimizations**: - **Caching BERT Embeddings**: Cache the text embeddings of some nodes during the training process to reduce computational overhead. - **Joint Negative Sampling**: By sharing negative sample nodes, reduce the number of nodes in each mini - batch and accelerate training. - **Distributed GNN Training**: Utilize the distributed GNN training framework to achieve efficient training of large - scale graph data. ### Experimental Results - **Node Classification Task**: On public datasets (such as arXiv, product datasets, Yelp), the LM - GNN framework performs excellently in the node classification task, especially when using the graph - aware BERT model and the phased fine - tuning strategy. - **Link Prediction Task**: In the link prediction task, the LM - GNN framework also achieves very good performance, especially after pre - training with the graph - aware BERT model. ### Conclusion Through the proposed LM - GNN framework, the paper effectively solves the challenges of combining heterogeneous graph structures and text information and improves the performance of graph neural networks on multiple tasks. In particular, the phased fine - tuning strategy and a series of system optimization techniques enable this framework to achieve efficient and effective training on large - scale graph data.