GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies
Lin Lin, Rachel L. Spreng, Kelly E. Seaton, S. Moses Dennison, Lindsay C. Dahora, Daniel J. Schuster, Sheetal Sawant, Peter B. Gilbert, Youyi Fong, Neville Kisalu, Andrew J. Pollard, Georgia D. Tomaras, Jia Li
DOI: https://doi.org/10.1371/journal.pcbi.1012581
2024-11-18
PLoS Computational Biology
Abstract: Despite significant progress in vaccine research, the level of protection provided by vaccination can vary significantly across individuals. As a result, understanding immunologic variation across individuals in response to vaccination is important for developing next-generation efficacious vaccines. Accurate outcome prediction and identification of predictive biomarkers would represent a significant step towards this goal. Moreover, in early phase vaccine clinical trials, small datasets are prevalent, raising the need and challenge of building a robust and explainable prediction model that can reveal heterogeneity in small datasets. We propose a new model named Generative Mixture of Logistic Regression (GeM-LR), which combines characteristics of both a generative and a discriminative model. In addition, we propose a set of model selection strategies to enhance the robustness and interpretability of the model. GeM-LR extends a linear classifier to a non-linear classifier without losing interpretability and empowers the notion of predictive clustering for characterizing data heterogeneity in connection with the outcome variable. We demonstrate the strengths and utility of GeM-LR by applying it to data from several studies. GeM-LR achieves better prediction results than other popular methods while providing interpretations at different levels.

Author summary: Vaccines have proven to be a powerful tool in preventing infectious diseases, yet their effectiveness can vary significantly from person to person. This variability underscores the need for a better understanding of how individuals' immune systems respond to vaccination, which is essential for achieving successful immunization across a broader population. In our study, we introduce a new model called the Generative Mixture of Logistic Regression (GeM-LR) to predict vaccine effectiveness in different individuals. This model is particularly beneficial when only small datasets are available, a common challenge in vaccine research, whereas many advanced machine learning methods require large training datasets. GeM-LR integrates mixture modeling and logistic regression to provide more accurate predictions while also offering insights into the factors that contribute to varying vaccine responses among individuals. We demonstrate that our model outperforms other standard methods in terms of accuracy and enhances the understanding of data for future research. Our innovative approach holds great promise for improving vaccine development. By revealing hidden patterns within small datasets, GeM-LR aims to make vaccine research more efficient and impactful.
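The text above describes GeM-LR only at a high level, as a combination of mixture modeling and logistic regression. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a single feature, Gaussian mixture components, and hand-picked, hypothetical parameter values; the function names and dictionary keys are illustrative only. Model fitting and the paper's model selection strategies are omitted. Each component carries its own logistic regression, and the prediction is the average of the per-component probabilities weighted by the posterior probability that the sample belongs to each component.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sigmoid(t):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def cluster_posteriors(x, components):
    """Posterior probability of each mixture component given x (Bayes' rule)."""
    weighted = [c["weight"] * gaussian_pdf(x, c["mean"], c["std"]) for c in components]
    total = sum(weighted)
    return [w / total for w in weighted]


def predict_proba(x, components):
    """P(y = 1 | x): per-component logistic predictions, weighted by the posteriors."""
    posteriors = cluster_posteriors(x, components)
    return sum(
        p * sigmoid(c["intercept"] + c["slope"] * x)
        for p, c in zip(posteriors, components)
    )


# Hypothetical parameters, chosen by hand for illustration (not estimated from data).
components = [
    {"weight": 0.5, "mean": -2.0, "std": 1.0, "intercept": 0.0, "slope": 1.5},
    {"weight": 0.5, "mean": 2.0, "std": 1.0, "intercept": 3.0, "slope": -1.5},
]

if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, cluster_posteriors(x, components), predict_proba(x, components))
```

With these assumed parameters the predicted probability is higher at x = 0 than at either x = -2 or x = 2, so the combined classifier is non-linear in x even though each component is an ordinary logistic regression whose intercept and slope can be read directly. The posteriors also assign each sample to a cluster, which is the sense in which such a model can describe heterogeneity in connection with the outcome.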
biochemical research methods, mathematical & computational biology