Machine learning-based gene expression biomarkers to distinguish Zika and Dengue virus infections: implications for diagnosis

Ayesha Zeba, Aruna Rajalingam, Kanagaraj Sekar, Anjali Ganjiwale
DOI: https://doi.org/10.1007/s13337-024-00885-8
VirusDisease
Abstract: Zika virus (ZIKV) and Dengue virus (DENV) infections cause severe disease in humans and are a significant socio-economic burden worldwide. These flavivirus infections are difficult to diagnose serologically because of antigenic overlap. Phylogenetic analysis shows that ZIKV clusters with DENVs at a higher node of the phylogenetic tree, with significant genomic and structural similarity. Our study aims to identify gene biomarkers for the classification of Dengue and Zika viral infections using machine learning algorithms and bioinformatics analysis. The gene expression count matrix for the single-cell RNA sequencing dataset GSE110496 was analyzed using four binary classifiers: Logistic Regression, Support Vector Machines, Random Forest, and Decision Trees. The GSE110496 dataset represents a unique study of the transcriptional and translational dynamics of DENV and ZIKV infections in human hepatoma (Huh7) cells at 4-, 12-, 24-, and 48-h time points. Of these, the 24-h time point was analyzed in this study, at the optimal threshold of viral molecules. Feature selection was performed using two different approaches: gene ranking with a Random Forest Classifier (RFC) and Recursive Feature Elimination (RFE). Of the two, RFE showed higher accuracy and precision. A classification accuracy of 89.4% and a precision of 90% were obtained using the 10 selected gene features. SCY1 Like Pseudokinase 3 (SCYL3), Chromosome 1 Open Reading Frame 112 (C1orf112), Complement factor H (CFH), Heme-binding protein 1 (HEBP1), Cadherin 1 (CDH1), Nibrin (NBN), Histone deacetylase 5 (HDAC5), nuclear receptor subfamily 0, group B, member 2 (NR0B2), Annexin A9 (ANXA9), and Alcohol dehydrogenase 6 (ADH6) are the gene biomarkers proposed in this study. Functional analysis of the reported biomarkers was performed using KEGG and GO with the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt).
The relationship of the selected biomarkers with DENV and ZIKV infections was analyzed using a gene-gene interaction network, which showed important interactions for viral entry, replication, translation, and metabolic pathways. Based on this machine learning analysis, these biomarkers are potential diagnostic markers for DENV and ZIKV infections and need further experimental validation. Supplementary information: The online version contains supplementary material available at 10.1007/s13337-024-00885-8.
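The abstract names Recursive Feature Elimination with a binary classifier but gives no implementation details, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It assumes a logistic-regression base model trained by plain gradient descent, ranks features by absolute weight, and runs on synthetic data; the gene names (`GENE_A` to `GENE_F`), the label coding (0 and 1 standing in for the two infections), the sample sizes, and all hyperparameters are hypothetical. In practice a library implementation such as scikit-learn's `RFE` would normally be used; the standard-library version here just keeps the example self-contained.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=150, lr=0.1):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) >= 0.5 else 0
            for row in X]

def rfe(X, y, feature_names, n_keep):
    """Refit repeatedly, dropping the feature with the smallest |weight| each round."""
    active = list(range(len(feature_names)))
    while len(active) > n_keep:
        w, _ = train_logreg([[row[j] for j in active] for row in X], y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [feature_names[j] for j in active]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0

# Synthetic stand-in for a cells-by-genes expression matrix (NOT the GSE110496 data).
# Only GENE_A and GENE_B carry class signal; the other genes are pure noise.
rng = random.Random(0)
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]

def make_cell(label):
    shift = 1.5 if label == 1 else -1.5
    return [rng.gauss(shift, 1.0) if j < 2 else rng.gauss(0.0, 1.0)
            for j in range(len(genes))]

labels = [i % 2 for i in range(160)]
cells = [make_cell(label) for label in labels]
X_train, y_train = cells[:120], labels[:120]
X_test, y_test = cells[120:], labels[120:]

# Select features on the training split only, then evaluate on held-out cells.
selected = rfe(X_train, y_train, genes, n_keep=2)
cols = [genes.index(g) for g in selected]
w, b = train_logreg([[row[j] for j in cols] for row in X_train], y_train)
y_pred = predict([[row[j] for j in cols] for row in X_test], w, b)
acc = accuracy(y_test, y_pred)
prec = precision(y_test, y_pred)
print(selected, round(acc, 3), round(prec, 3))
```

Selecting features on the training split and scoring on held-out cells avoids leaking test information into the gene ranking; whether the study used a comparable split is not stated in the abstract.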