Transfer Learning Models for Bacterial Strain Dissemination Biomarkers using Weighted Non-Parallel Proximal Support Vector Machines

Ugochukwu O. Ugwu, Richard A. Slayden, Michael Kirby
DOI: https://doi.org/10.1101/2024.10.11.617744
2024-10-14
Abstract: This paper develops optimization and Machine Learning (ML) algorithms to analyze gene expression datasets from the lungs and spleen of mice infected intranasally with two bacterial strains of Francisella tularensis: Schu4 and the Live Vaccine Strain (LVS). We propose and utilize Weighted l1-norm Generalized Eigenvalue-type Problems (l1-WGEPs) to determine a small set of host biomarkers that report Schu4 and LVS infection of the lungs and dissemination to the spleen. The optimal solutions of the l1-WGEPs determine the direction onto which the datasets are projected for dimensionality reduction, and the resulting projection scores are ranked for gene selection. The top k-ranked projection scores correspond to the k most informative biomarker features. The top k features selected from the lung data are used to train ML models, with uninfected controls and Schu4 or LVS samples as the classes. The trained models are then validated on the spleen data, which incorporates transfer learning. Baseline ML algorithms (ANN, XGBoost, AdaBoost, AdaGrad, KNN, SVM, Naive Bayes, Random Forest, Logistic Regression, and Decision Tree) are compared with our Weighted l1-norm Non-Parallel Proximal Support Vector Machine (l1-WNPSVM), which is based on two non-parallel separating hyperplanes. We report the average balanced accuracy scores of the methods over multiple folds. Gene ontology analysis is performed on the most significant genes in both tissues to reveal biomarkers of disease and to examine relevant metabolic pathways for host-directed therapeutics development and treatment performance.
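The pipeline described in the abstract (solve a generalized eigenvalue-type problem, rank features from the resulting direction, train on lung data, validate on spleen data) can be illustrated with a minimal sketch. This is not the authors' l1-WGEP or l1-WNPSVM: it substitutes an unweighted l2-norm generalized eigenvalue problem, assumes that genes are ranked by the absolute value of their coefficient in the direction vector, and uses a nearest-centroid classifier as a stand-in model. All function names, the synthetic data, and the parameters (`delta`, `shift`, `k = 3`) are hypothetical.

```python
import numpy as np

def gep_direction(A, B, delta=1e-3):
    """l2 stand-in: plane (w, b) minimizing ||A w - b||^2 / ||B w - b||^2."""
    # Append a -1 column so the last coefficient acts as the plane offset b.
    A1 = np.hstack([A, -np.ones((A.shape[0], 1))])
    B1 = np.hstack([B, -np.ones((B.shape[0], 1))])
    d = A1.shape[1]
    G = A1.T @ A1 + delta * np.eye(d)  # numerator matrix, Tikhonov-regularized
    H = B1.T @ B1 + delta * np.eye(d)  # denominator matrix
    # Reduce G z = lambda H z to a symmetric eigenproblem via Cholesky H = L L^T.
    L = np.linalg.cholesky(H)
    Linv = np.linalg.inv(L)
    _, vecs = np.linalg.eigh(Linv @ G @ Linv.T)  # eigenvalues in ascending order
    z = Linv.T @ vecs[:, 0]                      # eigenvector of the smallest one
    w = z[:-1]                                   # drop the offset term
    return w / np.linalg.norm(w)

def top_k_features(w, k):
    """Indices of the k coefficients with largest magnitude (assumed ranking rule)."""
    return np.argsort(-np.abs(w))[:k]

def nearest_centroid_fit(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall."""
    return float(np.mean([np.mean(y_pred[y_true == c] == c)
                          for c in np.unique(y_true)]))

# Synthetic stand-in data: 20 "genes", only the first 3 respond to infection.
rng = np.random.default_rng(0)

def make_tissue(n, shift):
    controls = rng.normal(size=(n, 20))
    infected = rng.normal(size=(n, 20))
    infected[:, :3] += shift
    return np.vstack([controls, infected]), np.repeat([0, 1], n)

X_lung, y_lung = make_tissue(300, 4.0)    # source tissue (training)
X_spleen, y_spleen = make_tissue(100, 3.0)  # target tissue (validation only)

w = gep_direction(X_lung[y_lung == 0], X_lung[y_lung == 1])
top = top_k_features(w, 3)
model = nearest_centroid_fit(X_lung[:, top], y_lung)
score = balanced_accuracy(y_spleen, nearest_centroid_predict(model, X_spleen[:, top]))
```

The key design point mirrored here is that feature selection and model fitting both use only the lung data; the spleen data is touched only at evaluation time, which is what makes the validation a transfer test.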
Bioinformatics