Prediction of Disease-Associated Nssnps by Integrating Multi-Scale ResNet Models with Deep Feature Fusion.
Fang Ge,Ying Zhang,Jian Xu,Arif Muhammad,Jiangning Song,Dong-Jun Yu
DOI: https://doi.org/10.1093/bib/bbab530
IF: 9.5
2022-01-01
Briefings in Bioinformatics
Abstract:More than 6000 human diseases have been recorded to be caused by non-synonymous single nucleotide polymorphisms (nsSNPs). Rapid and accurate prediction of pathogenic nsSNPs can improve our understanding of the principle and design of new drugs, which remains an unresolved challenge. In the present work, a new computational approach, termed MSRes-MutP, is proposed based on ResNet blocks with multi-scale kernel size to predict disease-associated nsSNPs. By feeding the serial concatenation of the extracted four types of features, the performance of MSRes-MutP does not obviously improve. To address this, a second model FFMSRes-MutP is developed, which utilizes deep feature fusion strategy and multi-scale 2D-ResNet and 1D-ResNet blocks to extract relevant two-dimensional features and physicochemical properties. FFMSRes-MutP with the concatenated features achieves a better performance than that with individual features. The performance of FFMSRes-MutP is benchmarked on five different datasets. It achieves the Matthew’s correlation coefficient (MCC) of 0.593 and 0.618 on the PredictSNP and MMP datasets, which are 0.101 and 0.210 higher than that of the existing best method PredictSNP1. When tested on the HumDiv and HumVar datasets, it achieves MCC of 0.9605 and 0.9507, and area under curve (AUC) of 0.9796 and 0.9748, which are 0.1747 and 0.2669, 0.0853 and 0.1335, respectively, higher than the existing best methods PolyPhen-2 and FATHMM (weighted). In addition, on blind test using a third-party dataset, FFMSRes-MutP performs as the second-best predictor (with MCC and AUC of 0.5215 and 0.7633, respectively), when compared with the other four predictors. Extensive benchmarking experiments demonstrate that FFMSRes-MutP achieves effective feature fusion and can be explored as a useful approach for predicting disease-associated nsSNPs. The webserver is freely available at http://csbio.njust.edu.cn/bioinf/ffmsresmutp/ for academic use.