Viral Immunogenicity Prediction by Machine Learning Methods

Nikolet Doneva, Ivan Dimitrov
DOI: https://doi.org/10.3390/ijms25052949
IF: 5.6
2024-03-04
International Journal of Molecular Sciences
Abstract: Since viruses are one of the main causes of infectious illnesses, prophylaxis is essential for efficient disease control. Vaccines play a pivotal role in mitigating the transmission of various viral infections and fortifying our defenses against them. The initial step in modern vaccine design and development involves the identification of potential vaccine targets through computational techniques. Here, using datasets of 1588 known viral immunogens and 468 viral non-immunogens, we apply machine learning algorithms to develop models for the prediction of protective immunogens of viral origin. The datasets are split into training and test sets in a 4:1 ratio. The protein structures are encoded by E-descriptors and transformed into uniform vectors by the auto- and cross-covariance methods. The most relevant descriptors are selected by the gain/ratio technique. The models generated by Random Forest, Multilayer Perceptron, and XGBoost algorithms demonstrate superior predictive performance on the test sets, surpassing predictions made by VaxiJen 2.0, an established gold standard in viral immunogenicity prediction. The key attributes determining immunogenicity in viral proteins are specific fingerprints in hydrophobicity and steric properties.
Biochemistry & Molecular Biology; Chemistry, Multidisciplinary
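The auto- and cross-covariance (ACC) step in the abstract is what turns proteins of different lengths into vectors of one fixed length. The Python sketch below shows one common ACC formulation, as it is usually written for VaxiJen-style models: for descriptors j and k and a given lag, it averages E_j(i) × E_k(i + lag) over the positions where i + lag still falls inside the sequence. It assumes each residue is already mapped to d numeric descriptors. The function name `acc_transform`, the maximum lag, and the two-descriptor toy table are hypothetical illustrations. They are not the paper's code, and they are not the published E-descriptor values.

```python
from itertools import product


def acc_transform(sequence, descriptors, max_lag):
    """Auto- and cross-covariance (ACC) transform of one protein sequence.

    sequence:    amino-acid string
    descriptors: dict mapping each residue letter to a list of d numbers
                 (for example, the five E-descriptors of that amino acid)
    max_lag:     largest lag L to compute (1 <= L < len(sequence))

    Returns d * d * max_lag values. For every lag and descriptor pair (j, k),
    the value is the mean of E_j(i) * E_k(i + lag) over the n - lag positions
    i where i + lag is inside the sequence. Pairs with j == k are
    auto-covariance terms and pairs with j != k are cross-covariance terms.
    The output length depends only on d and max_lag, not on sequence length.
    """
    n = len(sequence)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(sequence)")
    rows = [descriptors[aa] for aa in sequence]  # n x d descriptor matrix
    d = len(rows[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(d), repeat=2):
            total = sum(rows[i][j] * rows[i + lag][k] for i in range(n - lag))
            features.append(total / (n - lag))
    return features


# Toy two-descriptor table: made-up numbers for illustration only,
# NOT the published E-descriptor values.
TOY_DESCRIPTORS = {"A": [1.0, 0.0], "G": [0.0, 1.0], "K": [-1.0, 2.0]}

short = acc_transform("AGKA", TOY_DESCRIPTORS, max_lag=2)
longer = acc_transform("AGKAGKGA", TOY_DESCRIPTORS, max_lag=2)
print(len(short), len(longer))  # 8 8
```

Under this formulation, five E-descriptors and a maximum lag L give 25·L values per protein. This summary does not state which lag the authors used, or whether they centered the descriptors before the transform.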
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper aims to improve the prediction of viral immunogenicity with machine learning. Specifically, the researchers used datasets of known viral immunogens and non-immunogens and applied various machine learning algorithms, such as Random Forest, Multilayer Perceptron, and XGBoost, to build models that predict protective immunogens of viral origin. On the test sets, these models outperform the existing VaxiJen 2.0 tool.

### Summary of Background and Objectives

- **Background**:
  - Viruses are one of the main causes of infectious diseases, so prophylaxis is crucial for effective disease control.
  - Vaccines play a key role in limiting the spread of viral infections and strengthening the body's defenses against them.
  - In modern vaccine design, identifying potential vaccine targets with computational techniques is the critical first step.
  - VaxiJen 2.0 is a widely used tool for predicting protective antigens of different origins, but its dataset is relatively outdated and needs updating and improvement.
- **Objectives**:
  - To improve the prediction of viral immunogenicity using updated datasets and more advanced machine learning algorithms.
  - To identify the key attributes that determine the immunogenicity of viral proteins.

Through this work, the researchers aim to develop more accurate and reliable prediction models and thereby accelerate vaccine design and development.
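The objective of identifying key attributes relates to the selection step named in the abstract, where the most relevant descriptors are chosen by gain ratio. As a rough illustration, the Python sketch below scores numeric features by gain ratio and keeps the top-ranked ones. Gain ratio is the information gain divided by the split information, which is the entropy of the partition itself. The sketch assumes a simple median split to discretize each feature. The paper's actual discretization, thresholds, and number of retained descriptors are not given here. `gain_ratio` and `rank_features` are hypothetical helper names, and the data are toy values.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one numeric feature after a median split into two bins.

    gain ratio = information gain / split information, where split
    information is the entropy of the bin sizes themselves.
    """
    cut = median(values)
    bins = [v > cut for v in values]  # simplifying assumption: median split
    n = len(labels)
    conditional = 0.0
    split_info = 0.0
    for b in set(bins):
        subset = [y for y, in_bin in zip(labels, bins) if in_bin == b]
        w = len(subset) / n
        conditional += w * entropy(subset)
        split_info -= w * math.log2(w)
    if split_info == 0.0:
        return 0.0  # constant after binning: the feature carries no information
    return (entropy(labels) - conditional) / split_info


def rank_features(rows, labels, top_k):
    """Indices of the top_k feature columns, highest gain ratio first."""
    n_features = len(rows[0])
    scores = [gain_ratio([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


# Toy data: column 0 separates the classes, column 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.0], [0.8, 1.0]]
labels = [0, 0, 1, 1]  # 1 = immunogen, 0 = non-immunogen (toy labels)
print(rank_features(rows, labels, top_k=1))  # [0]
```

Plain information gain tends to favor features that fragment the data into many small groups. Gain ratio is usually preferred because dividing by split information penalizes that fragmentation.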