Screening of genes co-associated with osteoporosis and chronic HBV infection based on bioinformatics analysis and machine learning

Jia Yang,Weiguang Yang,Yue Hu,Linjian Tong,Rui Liu,Lice Liu,Bei Jiang,Zhiming Sun
DOI: https://doi.org/10.3389/fimmu.2024.1472354
2024-09-16
Abstract:Objective: To identify HBV-related genes (HRGs) implicated in osteoporosis (OP) pathogenesis and develop a diagnostic model for early OP detection in chronic HBV infection (CBI) patients. Methods: Five public sequencing datasets were collected from the GEO database. Gene differential expression and LASSO analyses identified genes linked to OP and CBI. Machine learning algorithms (random forests, support vector machines, and gradient boosting machines) further filtered these genes. The best diagnostic model was chosen based on accuracy and Kappa values. A nomogram model based on HRGs was constructed and assessed for reliability. OP patients were divided into two chronic HBV-related clusters using non-negative matrix factorization. Differential gene expression analysis, Gene Ontology, and KEGG enrichment analyses explored the roles of these genes in OP progression, using ssGSEA and GSVA. Differences in immune cell infiltration between clusters and the correlation between HRGs and immune cells were examined using ssGSEA and the Pearson method. Results: Differential gene expression analysis of CBI and combined OP dataset identified 822 and 776 differentially expressed genes, respectively, with 43 genes intersecting. Following LASSO analysis and various machine learning recursive feature elimination algorithms, 16 HRGs were identified. The support vector machine emerged as the best predictive model based on accuracy and Kappa values, with AUC values of 0.92, 0.83, 0.74, and 0.7 for the training set, validation set, GSE7429, and GSE7158, respectively. The nomogram model exhibited AUC values of 0.91, 0.79, and 0.68 in the training set, GSE7429, and GSE7158, respectively. Non-negative matrix factorization divided OP patients into two clusters, revealing statistically significant differences in 11 types of immune cell infiltration between clusters. Finally, intersecting the HRGs obtained from LASSO analysis with the HRGs identified three genes. Conclusion: This study successfully identified HRGs and developed an efficient diagnostic model based on HRGs, demonstrating high accuracy and strong predictive performance across multiple datasets. This research not only offers new insights into the complex relationship between OP and CBI but also establishes a foundation for the development of early diagnostic and personalized treatment strategies for chronic HBV-related OP.
What problem does this paper attempt to address?