Heterogeneous PPI Network Representation Learning for Protein Complex Identification

Zhou Peixuan, Zhang Yijia, Chen Fei, Pang Kuo, Lu Mingyu
DOI: https://doi.org/10.1007/978-3-031-23198-8_20
2023-01-01
Abstract: Protein complexes are critical units for studying a cell system. Accurately identifying protein complexes has long been a focus of research. Most existing methods are based on the topological structure of the Protein-Protein Interaction (PPI) network and introduce some biological information to analyze the correlation between proteins in order to identify protein complexes. However, these methods only construct a homogeneous network of biological information and protein nodes. Most of them ignore that different types of nodes have different importance for protein complex identification. Therefore, there is an urgent need for a method that integrates different types of biological information. This paper proposes GHAE, a new protein complex identification method based on heterogeneous network representation learning. First, GHAE combines Gene Ontology (GO) attribute information and PPI data to construct a heterogeneous PPI network. Second, based on the constructed network, a heterogeneous representation learning method is used to obtain the vector representation of protein nodes. Finally, a complex identification method based on the heterogeneous network is proposed to identify protein complexes. Extensive experiments show that the method achieves state-of-the-art performance in most cases.
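The first step in the abstract, combining GO attribute information with PPI data into one heterogeneous network, can be illustrated with a minimal sketch. This is not the authors' GHAE implementation: the function name `build_heterogeneous_ppi`, the node type labels, and the toy proteins and GO identifiers below are all assumptions made for illustration. It only shows what "heterogeneous" means here: two node types (proteins and GO terms) and two edge types (protein-protein and protein-GO) in a single graph.

```python
from collections import defaultdict


def build_heterogeneous_ppi(ppi_edges, go_annotations):
    """Return (node_types, adjacency) for a graph with protein and GO-term nodes.

    Hypothetical helper, not taken from the paper.
    """
    node_types = {}
    adjacency = defaultdict(set)

    # Protein-protein edges taken from the PPI data (undirected).
    for a, b in ppi_edges:
        node_types[a] = "protein"
        node_types[b] = "protein"
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Protein-GO edges taken from the annotation data (undirected).
    for protein, terms in go_annotations.items():
        node_types[protein] = "protein"
        for term in terms:
            node_types[term] = "go_term"
            adjacency[protein].add(term)
            adjacency[term].add(protein)

    return node_types, dict(adjacency)


# Toy data, invented for illustration only; the GO identifiers are placeholders.
ppi_edges = [("P1", "P2"), ("P2", "P3")]
go_annotations = {"P1": ["GO:0000001"], "P3": ["GO:0000001", "GO:0000002"]}
node_types, adjacency = build_heterogeneous_ppi(ppi_edges, go_annotations)
```

In this toy graph, P1 and P3 do not interact directly but share a GO term node, which is the kind of extra connectivity a heterogeneous representation learning step could exploit when embedding protein nodes.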