Prediction of Aggregation Prone Regions in Proteins Using Deep Neural Networks and Their Suppression by Computational Design
Vojtech Cima,Antonin Kunka,Ekaterina Grakova,Joan Planas-Iglesias,Martin Havlasek,Madhumalar Subramanian,Michal Beloch,Martin Marek,Katerina Slaninova,Jiri Damborsky,Zbynek Prokop,David Bednar,Jan Martinovic
DOI: https://doi.org/10.1101/2024.03.06.583680
2024-03-11
Abstract:Protein aggregation is a hallmark of multiple neurodegenerative diseases and a great hindrance in recombinant protein production, handling, and storage. Identification of aggregation prone residues or regions (APRs) in proteins and their suppression by mutations is a powerful and straightforward strategy for improving protein solubility and yield, which significantly increases their application potential. Towards this, we developed a deep neural network based predictor that generates residue level aggregation profile for one or several input protein sequences. The model was trained on a set of hexapeptides with experimentally characterised aggregation propensities and validated on two independent sets of data including hexapeptides and full-length proteins with annotated APRs. In both cases, the model matched, or outperformed the state-of-the-art algorithms. Its performance was further verified using a set of 34 hexapeptides identified in model haloalkane dehalogenase LinB and seven proteins from AmyPro database. The experimental data from Thioflavin T fluorescence and transmission electron microscopy matched the predictions in 79% of the cases, and revealed inaccuracies in the database annotations. Finally, the utility of the algorithm was demonstrated by identifying APRs in a model enzyme (LinB) and designing aggregation-suppressing mutations in the exposed regions. The designed variants showed reduced aggregation propensity, increased solubility and improved yield, with up to a 100% enhancement compared to the wild type for the best one.
Bioinformatics