Contrastive Self-Supervised Representation Learning for Protein Complexes Identification.
Peixuan Zhou, Yijia Zhang, Fei Chen, Mingyu Lu, Wen Qu, Hongfei Lin, Xiaoxia Liu
DOI: https://doi.org/10.1109/BIBM55620.2022.9995094
2022-01-01
Abstract: The identification of protein complexes helps explain the principles of cellular organization and the mechanisms of biological evolution. In recent years, researchers have proposed numerous computational methods that identify protein complexes from protein-protein interaction (PPI) networks. Most of these methods rely on the topological structure of the PPI network. However, that topological structure is very complicated, and the applicability of advanced representation learning methods to it has not been studied in depth. This paper proposes a contrastive self-supervised representation learning method to identify protein complexes. Our method uses a mix-hop aggregator based on a graph neural network (GNN) to capture high-order interactions in the PPI network and leverages a contrastive self-supervised objective to train the model without introducing protein labels. We then obtain a vector representation for each protein and construct a weighted PPI network based on the similarity of these representations. Finally, we apply clustering aggregation to the weighted PPI network to identify protein complexes. To assess our method, we use three PPI networks, DIP, Krogan14k and Biogrid, as datasets. Compared with the competing methods COACH, CMC, MCODE, ClusterONE, GANE and COAN, experimental results show that our method outperforms both classic and state-of-the-art methods.
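The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract states only that edge weights come from "vector representation similarity"; the use of cosine similarity here is an assumption, as are the function name `weight_edges`, the protein identifiers, and the toy embedding values, none of which come from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def weight_edges(embeddings, edges):
    """Assign each PPI edge a weight equal to the similarity of its endpoints.

    Cosine similarity is an assumed choice; the abstract does not name the measure.
    """
    return {(u, v): cosine(embeddings[u], embeddings[v]) for u, v in edges}

# Hypothetical learned embeddings for three proteins (illustrative values only).
embeddings = {
    "P1": [1.0, 0.0],
    "P2": [1.0, 0.0],
    "P3": [0.0, 1.0],
}
# Unweighted PPI edges.
edges = [("P1", "P2"), ("P2", "P3")]

weights = weight_edges(embeddings, edges)
```

In this toy case the edge between the two proteins with identical embeddings receives the highest weight, so a clustering step run on the weighted network would be more inclined to group them together.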