Construct prognostic models of multiple myeloma with pathway information incorporated

Shuo Wang, ShanJin Wang, Wei Pan, YuYang Yi, Junyan Lu
DOI: https://doi.org/10.1371/journal.pcbi.1012444
2024-09-11
PLoS Computational Biology
Abstract: Multiple myeloma (MM) is a hematological disease characterized by aberrant clonal expansion of cancerous plasma cells in the bone marrow. Treatment effects in MM vary between patients, which highlights the importance of prognostic models for informed therapeutic decision-making. Most previous models were constructed at the gene level, ignoring the fact that pathway dysfunction is closely associated with disease development and progression. The present study considered two strategies that incorporate pathway information into predictive models: the pathway score method and group lasso using pathway information. The former simply converts gene expression into sample-wise pathway scores for model fitting. We considered three methods for pathway score calculation (ssGSEA, GSVA, and z-scores) and 14 data sources providing pathway information. We applied these methods to microarray data for MM (GSE136324) and obtained a candidate model with the best prediction performance in internal validation. The candidate model was further compared with the gene-based model and previously published models in two external datasets. We also investigated the effects of missing values on prediction. The results showed that group lasso incorporating VAX pathway information (Vax(grp)) was more competitive in prediction than the gene model in both internal and external validation. Immune information, including VAX pathways, seemed to be more predictive for MM. Vax(grp) also outperformed the previously published models. Moreover, the new model was more resistant to missing values: with our missing data imputation method, a small proportion of missing values (<5%) did not noticeably degrade its prediction accuracy. In a nutshell, pathway-based models (using group lasso) were competitive alternatives to gene-based models for MM. These models were documented in an R package (https://github.com/ShuoStat/MMMs), where a missing data imputation method was also integrated to facilitate future validation.
Traditionally, prognostic models were mainly constructed at the gene level, ignoring the role of pathway functions in disease development and progression. Motivated by this, we advocated guiding model building with well-established prior knowledge (pathway information). We investigated several approaches that could incorporate pathway information in model building. The results showed that pathway-based models exhibit superior predictive capabilities compared to their gene-based counterparts. Furthermore, these pathway-based models were more robust to missing values. The proposed models also outperformed previously published gene-based models. Beyond their prediction performance, the pathway-based models can directly reveal the association between pathway functions and survival outcomes, demonstrating their advantage in model interpretability. This enhanced interpretability not only deepens our understanding of disease mechanisms but also facilitates informed decision-making in clinical and research settings. We urge that more attention be given to developing modeling methods that incorporate prior knowledge.
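As a rough illustration of the pathway score idea mentioned in the abstract (converting gene expression into sample-wise pathway scores), the following Python sketch computes a combined z-score per pathway: each gene is standardized across samples, and the z-scores of a pathway's member genes are summed and divided by the square root of the number of genes. This is only a minimal sketch under stated assumptions, not the authors' implementation (their models are distributed as an R package). The function name `pathway_zscores`, the gene names, and the toy pathways are hypothetical.

```python
import numpy as np


def pathway_zscores(expr, gene_names, pathways):
    """Combined z-score per pathway and sample (illustrative sketch only).

    expr       : genes x samples expression matrix
    gene_names : list of gene identifiers, one per row of expr
    pathways   : dict mapping pathway name -> list of member genes
    Returns a dict mapping pathway name -> 1-D array of per-sample scores.
    """
    expr = np.asarray(expr, dtype=float)

    # Standardize each gene across samples.
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes get z = 0 instead of dividing by zero
    z = (expr - mean) / sd

    row_of = {gene: i for i, gene in enumerate(gene_names)}
    scores = {}
    for name, genes in pathways.items():
        # Keep only pathway genes that were actually measured.
        rows = [row_of[g] for g in genes if g in row_of]
        if not rows:
            continue  # pathway has no measured genes; skip it
        scores[name] = z[rows].sum(axis=0) / np.sqrt(len(rows))
    return scores


# Toy data: 3 genes x 4 samples, with hypothetical gene and pathway names.
toy_expr = [[1, 2, 3, 4],
            [2, 4, 6, 8],
            [5, 5, 5, 5]]
toy_scores = pathway_zscores(
    toy_expr,
    ["GENE_A", "GENE_B", "GENE_C"],
    {"PATHWAY_1": ["GENE_A", "GENE_B"], "PATHWAY_2": ["GENE_C", "GENE_X"]},
)
```

The resulting per-sample scores could then serve as covariates in a survival model in place of individual genes. The group lasso strategy instead keeps gene-level covariates and uses pathway membership to define the penalty groups.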
biochemical research methods, mathematical & computational biology