Navigating Homogeneous Paths through Amyloidogenic and Non-Amyloidogenic Hexapeptides

Laszlo Keresztes, Evelin Szogi, Balint Varga, Viktor Farkas, Andras Perczel, Vince Grolmusz
2023-09-07
Abstract: Hexapeptides are increasingly used as model systems for studying the amyloidogenicity of oligo- and polypeptides. From the twenty proteinogenic amino acid residues, 20^6 = 64 million different hexapeptides can be constructed. Today's experimental amyloid databases annotate only a fraction of them. Several computational predictors with good accuracy exist for labeling all possible hexapeptides as "amyloidogenic" or "non-amyloidogenic". It may be of interest to define and study a simple graph whose nodes are the 64 million hexapeptides, with two hexapeptides connected by an edge if they differ in only a single residue. For example, in this graph HIKKLM is connected to AIKKLM, HIKKNM, and HIKKLC, but not to VVKKLM or HIKNPM, since each of these differs from HIKKLM in two positions. In the present contribution, we consider our previously published artificial intelligence-based tool, the Budapest Amyloid Predictor (BAP for short), and demonstrate a remarkable property of this predictor on the graph defined above. We show that for any two hexapeptides predicted to be "amyloidogenic" by BAP, there exists an easily constructible path of length at most 6 in which consecutive hexapeptides are neighbors and every hexapeptide is predicted to be "amyloidogenic" by BAP. For example, the predicted amyloidogenic hexapeptides ILVWIW and FWLCYL can be connected through the length-6 path ILVWIW-IWVWIW-IWVCIW-IWVCIL-FWVCIL-FWLCIL-FWLCYL, in which neighbors differ in exactly one residue and all hexapeptides are predicted to be amyloidogenic by BAP. The analogous statement holds for non-amyloidogenic hexapeptides. Note that this property of the Budapest Amyloid Predictor (https://pitgroup.org/bap) is not specific to BAP; it holds for any linear Support Vector Machine (SVM)-based predictor.
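The graph and the path property can be illustrated with a minimal Python sketch; this is not the authors' code. `are_neighbors` encodes the edge rule (exactly one differing residue) and can check the ILVWIW-to-FWLCYL example path. To show why a linear SVM admits such paths, the sketch assumes the predictor is a linear SVM on a position-specific one-hot encoding, so its decision value is a bias plus one weight per (position, residue) pair. The weights `WEIGHTS` and `BIAS` are random placeholders, not BAP's trained parameters, and `monotone_path` is a hypothetical construction that may differ from the one in the paper: it changes the differing positions in order of score gain, largest first for an amyloidogenic pair and smallest first for a non-amyloidogenic pair.

```python
import random

# The 20 proteinogenic amino acids (one-letter codes); 20**6 = 64,000,000 hexapeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hamming(p, q):
    """Number of positions at which two equal-length peptides differ."""
    return sum(a != b for a, b in zip(p, q))


def are_neighbors(p, q):
    """Edge test in the hexapeptide graph: exactly one residue differs."""
    return len(p) == len(q) == 6 and hamming(p, q) == 1


def is_valid_path(path):
    """True if consecutive hexapeptides on the path are graph neighbors."""
    return all(are_neighbors(a, b) for a, b in zip(path, path[1:]))


# HYPOTHETICAL linear model: one weight per (position, residue) plus a bias,
# i.e. a linear SVM on a one-hot encoding. Random placeholder values,
# NOT the trained BAP parameters.
rng = random.Random(0)
WEIGHTS = [{aa: rng.uniform(-1.0, 1.0) for aa in AMINO_ACIDS} for _ in range(6)]
BIAS = 0.0


def score(p):
    """Decision value; > 0 means 'amyloidogenic' under this toy model."""
    return BIAS + sum(WEIGHTS[i][aa] for i, aa in enumerate(p))


def monotone_path(x, y):
    """Walk from x to y, changing one differing position per step.

    x and y are assumed to be in the same predicted class. For a positive
    pair, positions with the largest score gain are changed first: the running
    gain first rises (non-negative gains) and then falls (negative gains), so
    no intermediate score drops below min(score(x), score(y)). For a negative
    pair the order is reversed, so no score rises above max(score(x), score(y)).
    """
    positive = score(x) > 0

    def gain(i):
        return WEIGHTS[i][y[i]] - WEIGHTS[i][x[i]]

    positions = sorted((i for i in range(6) if x[i] != y[i]), key=gain, reverse=positive)
    path, current = [x], list(x)
    for i in positions:
        current[i] = y[i]
        path.append("".join(current))
    return path


if __name__ == "__main__":
    example = ["ILVWIW", "IWVWIW", "IWVCIW", "IWVCIL", "FWVCIL", "FWLCIL", "FWLCYL"]
    print(is_valid_path(example), len(example) - 1)  # True 6
    print(are_neighbors("HIKKLM", "AIKKLM"), are_neighbors("HIKKLM", "VVKKLM"))  # True False

    # Path between two arbitrary peptides under the toy weights.
    path = monotone_path("ILVWIW", "FWLCYL")
    print(path, [round(score(p), 3) for p in path])
```

Because each step fixes one position where the two endpoints differ, the path length equals their Hamming distance and is therefore at most 6; the ordering by score gain is what keeps every intermediate hexapeptide on the same side of the decision boundary.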
Biomolecules, Molecular Networks