Succinct Amyloid and Non-Amyloid Patterns in Hexapeptides

Laszlo Keresztes, Evelin Szogi, Balint Varga, Viktor Farkas, Andras Perczel, Vince Grolmusz
DOI: https://doi.org/10.48550/arXiv.2202.14031
2022-03-01
Abstract: Hexapeptides are widely applied as a model system for studying the amyloid-forming properties of polypeptides, including proteins. Recently, large experimental databases with amyloidogenic labels have become publicly available. Using these datasets for training and testing, one may build artificial intelligence (AI)-based classifiers for predicting the amyloid state of peptides. In our previous work (Biomolecules, 11(4), 500 (2021)) we described the Support Vector Machine (SVM)-based Budapest Amyloid Predictor (\url{https://pitgroup.org/bap}). Here we apply the Budapest Amyloid Predictor to discover numerous amyloidogenic and non-amyloidogenic hexapeptide patterns, with accuracy between 80\% and 84\%, as surprising and succinct novel rules for further understanding the amyloid state of peptides. For example, we show that, whatever residues are chosen independently at the positions marked by ``x'', the patterns CxFLWx, FxFLFx, and xxIVIV are predicted to be amyloidogenic, while the patterns PxDxxx, xxKxEx, and xxPQxx are predicted to be non-amyloidogenic. We note that each amyloidogenic pattern with two x's (e.g., CxFLWx) succinctly describes $20^2=400$ hexapeptides, while each non-amyloidogenic pattern with four x's (e.g., PxDxxx) describes $20^4=160,000$ hexapeptides. To our knowledge, no similar applications of artificial intelligence tools or succinct amyloid patterns have been described before the present work.
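The pattern counts quoted above can be checked with a minimal sketch. It assumes that ``x'' stands for any of the 20 standard amino acids in one-letter code; the helper names (count_matches, expand, matches) are hypothetical and are not part of the Budapest Amyloid Predictor. The sketch only enumerates the hexapeptides a pattern describes and does not reproduce the SVM prediction itself.

```python
from itertools import product

# Assumption: "x" is a wildcard over the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_matches(pattern: str) -> int:
    """Number of peptides described by a pattern: 20 ** (number of x's)."""
    return len(AMINO_ACIDS) ** pattern.count("x")


def expand(pattern: str):
    """Yield every concrete peptide described by the pattern."""
    # Each wildcard position ranges over all residues; fixed positions have one choice.
    slots = [AMINO_ACIDS if c == "x" else c for c in pattern]
    for combo in product(*slots):
        yield "".join(combo)


def matches(pattern: str, peptide: str) -> bool:
    """True if the peptide agrees with the pattern at every fixed position."""
    return len(pattern) == len(peptide) and all(
        p == "x" or p == s for p, s in zip(pattern, peptide)
    )


if __name__ == "__main__":
    print(count_matches("CxFLWx"))  # two wildcards: 20**2
    print(count_matches("PxDxxx"))  # four wildcards: 20**4
    print(matches("xxIVIV", "AAIVIV"))
```

Counting by exponentiation avoids enumerating all 160,000 peptides of a four-wildcard pattern, while expand remains available when the concrete sequences are needed, for instance to pass each one to a classifier.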
Biomolecules