Efficient and accurate prediction of transmembrane topology from amino acid sequence only

Qing Wang, Chongming Ni, Zhen Li, Xiufeng Li, Renmin Han, Feng Zhao, Jinbo Xu, Xin Gao, Sheng Wang
DOI: https://doi.org/10.1101/627307
2019-01-01
bioRxiv
Abstract:
Motivation. Fast and accurate identification of transmembrane (TM) topology is well suited to annotating the whole membrane proteome and is, in turn, the initial step in predicting the structure and function of membrane proteins. However, existing methods that use only the amino acid sequence (pureseq) suffer from low prediction accuracy, whereas methods that exploit sequence profiles or consensus need too much computing time.
Method. This article employs a deep learning framework, DeepCNF, to predict TM topology from the amino acid sequence only. Compared with previous pureseq approaches based on Hidden Markov Models (HMM) or Dynamic Bayesian Networks (DBN), DeepCNF can accommodate far more context information through a hierarchical deep neural network while simultaneously modeling the interdependency between adjacent topology labels.
Result. Experimental results show that our TM prediction method, PureseqTM, not only outperforms existing pureseq methods but also matches or even surpasses the profile/consensus methods. On the 39 newly released membrane proteins, our approach successfully identifies the correct TM segments and boundaries for at least 3 cases in which the other approaches failed to do so. When applied to the entire human proteome, our method can identify incorrect UniProt annotations of TM regions and discover membrane-related proteins that are not manually curated as membrane proteins.
Availability. http://pureseqtm.predmp.com/