Protein Remote Homology Detection Based On Binary Profiles

Qiwen Dong, Lei Lin, Xiaolong Wang
DOI: https://doi.org/10.1007/978-3-540-71233-6_17
2007-01-01
Abstract: Remote homology detection is a key element of protein structure and function analysis in computational and experimental biology. This paper presents a simple representation of protein sequences that uses the evolutionary information of profiles for efficient remote homology detection. The frequency profiles are calculated directly from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles using a probability threshold. Such binary profiles constitute a new building block for protein sequences. The protein sequences are mapped into high-dimensional vectors by the number of occurrences of each binary profile. The resulting vectors are then used to train support vector machine classifiers, which in turn classify the test protein sequences. The method is further improved by applying an efficient feature extraction algorithm from natural language processing, namely the latent semantic analysis model. Testing on the SCOP 1.53 database shows that the method based on binary profiles outperforms those based on many other basic building blocks, including N-grams, patterns and motifs. The ROC50 score is 0.698, which is higher than other methods by nearly 10 percent.
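The binary-profile representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy frequency values, the threshold of 0.2, and the strict `>` comparison are all assumptions chosen for illustration, and the PSI-BLAST, support vector machine and latent semantic analysis steps are omitted.

```python
from collections import Counter

# The 20 standard amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column(**freqs):
    """Hypothetical helper: build one frequency-profile column (20 values)."""
    return [freqs.get(aa, 0.0) for aa in AMINO_ACIDS]


def binarize_profile(freq_profile, threshold):
    """Turn each column into a binary profile: 1 where the frequency exceeds
    the threshold, 0 otherwise. The strict '>' is an assumption."""
    return [tuple(int(f > threshold) for f in col) for col in freq_profile]


def occurrence_vector(binary_profile, vocabulary):
    """Map a sequence to a vector of occurrence counts, one per binary profile
    in the vocabulary."""
    counts = Counter(binary_profile)
    return [counts[bp] for bp in vocabulary]


# Toy frequency profile for a three-residue sequence (made-up numbers).
freq_profile = [
    column(A=0.6, G=0.3, S=0.1),
    column(L=0.5, I=0.5),
    column(A=0.7, G=0.25, S=0.05),
]

binary = binarize_profile(freq_profile, threshold=0.2)  # threshold is illustrative
vocabulary = sorted(set(binary))  # distinct binary profiles seen in the data
vector = occurrence_vector(binary, vocabulary)
print(vector)
```

In this toy case the first and third positions collapse to the same binary profile (A and G set), so the sequence is described by two distinct building blocks. In the method as described, vectors of this kind, built over the binary profiles found in the whole data set, would be the input to the support vector machine.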