SMI-BLAST: a Novel Supervised Search Framework Based on PSI-BLAST for Protein Remote Homology Detection.

Xiaopeng Jin, Qing Liao, Hang Wei, Jun Zhang, Bin Liu
DOI: https://doi.org/10.1093/bioinformatics/btaa772
IF: 5.8
2020-01-01
Bioinformatics
Abstract: Motivation: PSI-BLAST is one of the most important and widely used iterative search tools for protein sequence search, and an accurate Position-Specific Scoring Matrix (PSSM) is key to its performance. However, PSSMs that contain non-homologous information clearly reduce the performance of PSI-BLAST for protein remote homology detection. Results: To study this problem further, we summarize three types of Incorrectly Selected Homology (ISH) errors in PSSMs. A new search tool, Supervised-Manner-based Iterative BLAST (SMI-BLAST), is proposed based on PSI-BLAST to solve these errors. SMI-BLAST clearly outperforms PSI-BLAST on the Structural Classification of Proteins-extended (SCOPe) dataset. On the ISH error subsets of the SCOPe dataset, SMI-BLAST detects 1.6- to 2.87-fold more remote homologous sequences than PSI-BLAST and outperforms it by 35.66% in terms of ROC1 scores. Furthermore, applying this framework to JackHMMER, DELTA-BLAST and PSI-BLASTexB further improves their performance.