AFFIPred: AlphaFold2 Structure-based Functional Impact Prediction of Missense Variations

Mustafa Samet Pir, Emel Timucin
DOI: https://doi.org/10.1101/2024.05.13.593840
2024-05-15
Abstract: Structural information holds immense potential for predicting the pathogenicity of missense variations, although structure-based pathogenicity classifiers remain limited compared to their sequence-based counterparts because of the well-known gap between sequence and structure data. Leveraging the highly accurate protein structure prediction method AlphaFold2 (AF2), we introduce AFFIPred, an ensemble machine learning classifier that combines established sequence-based characteristics with AF2-based structural characteristics to predict the pathogenicity of missense variants. In assessments on multiple unseen datasets, AFFIPred performed on par with state-of-the-art predictors such as AlphaMissense and Rhapsody. We also showed two advantages of AF2 structures. First, because they are full-length and represent unbound states, using them yields more precise solvent-accessible surface area (SASA) calculations than using experimental structures. Second, in line with the completeness of the AF2 structures, their use provides a more comprehensive view of the structural characteristics of missense variation datasets by capturing all variants. AFFIPred maintains high accuracy without the well-known limitations of structure-based pathogenicity classifiers, paving the way for the development of more sophisticated structure-based methods without PDB dependence. AFFIPred predictions for over 210 million variations of the human proteome are accessible at https://affipred.timucinlab.com/.
Bioinformatics