Ensemble Learning for Higher Diagnostic Precision in Schizophrenia Using Peripheral Blood Gene Expression Profile

DOI: https://doi.org/10.2147/ndt.s449135
IF: 2.989
2024-05-03
Neuropsychiatric Disease and Treatment
Authors: Vipul Vilas Wagh (1), Tanvi Kottat (1), Suchita Agrawal (2), Shruti Purohit (2), Tejaswini Arun Pachpor (3, 4), Leelavati Narlikar (5), Vasudeo Paralikar (2), Satyajeet Pramod Khare (1)

(1) Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, MH, India
(2) Psychiatry Unit, KEM Hospital Research Centre, Pune, MH, India
(3) Department of Biosciences and Technology, School of Science and Environment Studies, Dr. Vishwanath Karad MIT World Peace University, Pune, MH, India
(4) Department of Biotechnology, MES Abasaheb Garware College, Pune, MH, India
(5) Department of Data Science, Indian Institute of Science Education and Research, Pune, MH, India

Correspondence: Vasudeo Paralikar; Satyajeet Pramod Khare

Abstract:

Introduction: Stigma contributes a significant part of the burden of schizophrenia (SCZ); reducing false positives in diagnosis would therefore be liberating for individuals with SCZ and desirable for clinicians. The stigmatization associated with schizophrenia underscores the need for high-precision diagnosis. In this study, we present an ensemble learning-based approach for high-precision diagnosis of SCZ using peripheral blood gene expression profiles.

Methodology: Two machine learning (ML) models, support vector machines (SVM) and prediction analysis for microarrays (PAM), were developed using differentially expressed genes (DEGs) as features. The SCZ samples were classified using a voting ensemble classifier of SVM and PAM. Further, microarray-based learning was used to classify RNA sequencing (RNA-Seq) samples from our case-control study (Pune-SCZ) to assess cross-platform compatibility.

Results: Ensemble learning using the ML models resulted in a significantly higher precision of 80.41% (SD: 0.04) compared with the individual models (SVM-radial: 71.69%, SD: 0.04; PAM: 77.20%, SD: 0.02). Classification of the RNA sequencing samples from our case-control study (Pune-SCZ) yielded a moderate precision (59.92%, SD: 0.05).
The feature genes used for model building were enriched for biological processes such as response to stress, regulation of the immune system, and metabolism of organic nitrogen compounds. The network analysis identified RBX1, CUL4B, DDB1, PRPF19, and COPS4 as hub genes.

Conclusion: In summary, this study developed robust models for higher diagnostic precision in psychiatric disorders. Future efforts will be directed towards multi-omic integration and developing "explainable" diagnostic models.

Keywords: Schizophrenia, peripheral blood, gene expression, machine learning, ensemble learning

Schizophrenia (SCZ) is a complex neuropsychiatric disorder characterized by a disruption in thinking and sense of self. The death rate is twice as high among individuals with schizophrenia, with cardiovascular diseases and suicide as the leading causes of death [1–3]. The Global Burden of Disease 2019 study estimates that almost 24 million people are affected by SCZ globally, which indicates its universal presence irrespective of cultural differences [4].

Significant issues in treating psychiatric disorders are delayed diagnosis and limited certainty of the diagnosis itself. The current diagnostic procedure for SCZ is based on psychiatric evaluation, making it clinician-dependent. Diagnosis of SCZ under the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) requires symptoms to persist for six months or more [5]. The delay in treatment accounts for a higher number of years lived with disability associated with SCZ [6]. Recent studies suggest that treatment outcomes can be improved if the time elapsed before treatment is reduced [7]. Thus, a sensitive and specific blood test could strengthen and hasten the current diagnostic process for SCZ.

Cellular alterations, such as gene expression changes associated with the disorder, have been proposed as potential biomarkers.
A previous study provides substantial evidence for using peripheral blood gene expression profiles for biomarker discovery [8]. The recent use of machine learning (ML) tools has accelerated the biomarker discovery process for psychiatric disorders [9–11]. ML models employ statistical methods to learn from data to achieve specific objectives. Support vector machines (SVM) and nearest shrunken centroids (NSC) are popular supervised learning algorithms used in genomics, particularly in transcriptomics [12]. ML tools have already provided gene expression markers with higher diagnostic potential [9,10,13].

-Abstract Truncated-
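
The link between a voting ensemble and higher precision, as reported in the Results, can be illustrated with a minimal sketch. It assumes a unanimous voting rule (a sample is called SCZ only when every model agrees). The abstract does not state the voting rule the authors used, so this is an illustration, not their method. The label encoding, the function names, and the toy predictions are all hypothetical and invented for this example.

```python
from typing import List, Sequence

# Hypothetical label encoding, not taken from the paper.
SCZ, CONTROL = 1, 0


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Precision = TP / (TP + FP), with SCZ as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == SCZ and t == SCZ)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == SCZ and t == CONTROL)
    return tp / (tp + fp) if (tp + fp) else 0.0


def unanimous_vote(*model_preds: Sequence[int]) -> List[int]:
    """Call a sample SCZ only if every model calls it SCZ (assumed rule)."""
    return [
        SCZ if all(p == SCZ for p in votes) else CONTROL
        for votes in zip(*model_preds)
    ]


# Toy labels and per-model predictions, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
svm_preds = [1, 1, 1, 0, 1, 0, 0, 0]  # false positive at index 4
pam_preds = [1, 1, 0, 1, 0, 1, 0, 0]  # false positive at index 5

ensemble_preds = unanimous_vote(svm_preds, pam_preds)

print("SVM precision:     ", precision(y_true, svm_preds))
print("PAM precision:     ", precision(y_true, pam_preds))
print("Ensemble precision:", precision(y_true, ensemble_preds))
```

In this toy case the two models make their false-positive calls on different samples, so requiring agreement removes both. The cost is sensitivity: a true case that either model misses is no longer called SCZ, which is the usual trade-off when an ensemble is tuned for precision.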
psychiatry,clinical neurology