Abstract: Respiratory diseases are among the major health problems that place a burden on hospitals. Diagnosing infection and rapidly predicting severity without time-consuming clinical tests could help prevent the spread and progression of disease, especially in countries whose health systems have limited capacity. Personalized medicine studies that combine statistics and computer technologies could help address this need. Beyond individual studies, competitions are also held, such as the Dialogue for Reverse Engineering Assessment and Methods (DREAM) challenges, run by a community-driven organization whose mission is to advance research in biology, bioinformatics, and biomedicine. One of these competitions was the Respiratory Viral DREAM Challenge, which aimed to develop early predictive biomarkers for respiratory virus infections. These efforts are promising; however, the prediction performance of the computational methods developed for detecting respiratory diseases still has room for improvement. In this study, we focused on improving the prediction of infection and symptom severity in individuals exposed to various respiratory viruses, using gene expression data collected before and after exposure. The input data was the publicly available Gene Expression Omnibus dataset GSE73072, which contains samples exposed to four respiratory viruses: H1N1, H3N2, human rhinovirus (HRV), and respiratory syncytial virus (RSV). Various preprocessing methods and machine learning algorithms were implemented and compared to achieve the best prediction performance.
The experimental results showed that the proposed approaches achieved an area under the precision-recall curve (AUPRC) of 0.9746 for infection (i.e., shedding) prediction (SC-1), an AUPRC of 0.9182 for symptom class prediction (SC-2), and a Pearson correlation of 0.6733 for symptom score prediction (SC-3), outperforming the best leaderboard scores of the Respiratory Viral DREAM Challenge (improvements of 4.48% for SC-1, 13.68% for SC-2, and 13.98% for SC-3). Additionally, over-representation analysis (ORA), a statistical method for determining whether genes belonging to pre-defined sets such as pathways appear in a selected gene list more often than expected by chance, was applied to the most significant genes selected by feature selection methods. The results show that pathways associated with the 'adaptive immune system' and 'immune disease' are strongly linked to pre-infection and symptom development. These findings contribute to our knowledge of predicting respiratory infections and are expected to facilitate future studies that focus on predicting not only infections but also the associated symptoms.
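To make the ORA step concrete, the following is a minimal sketch, assuming the common formulation of ORA as a one-sided hypergeometric test per pathway followed by Benjamini-Hochberg correction across pathways. The abstract does not state which test, correction, gene universe, or pathway database the authors used, so the function names (`ora_pvalue`, `over_representation`, `benjamini_hochberg`) and the toy gene identifiers `G0`..`G99` are hypothetical and for illustration only.

```python
from math import comb


def ora_pvalue(n_universe: int, n_pathway: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X counts how many of the n_selected genes fall into a pathway of
    n_pathway genes when genes are drawn without replacement from a
    universe of n_universe genes.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, x) * comb(n_universe - n_pathway, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    # Exact integer counts; Python's int / int division is correctly rounded.
    return tail / total


def over_representation(selected: set[str], pathways: dict[str, set[str]],
                        universe: set[str]) -> dict[str, float]:
    """Raw ORA p-value for each pathway, restricted to the gene universe."""
    selected = selected & universe
    pvalues = {}
    for name, members in pathways.items():
        members = members & universe
        overlap = len(selected & members)
        pvalues[name] = ora_pvalue(len(universe), len(members), len(selected), overlap)
    return pvalues


def benjamini_hochberg(pvalues: dict[str, float]) -> dict[str, float]:
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


# Toy example with hypothetical gene identifiers G0..G99.
universe = {f"G{i}" for i in range(100)}
pathways = {
    "pathway_A": {f"G{i}" for i in range(10)},      # shares 5 genes with the selection
    "pathway_B": {f"G{i}" for i in range(20, 30)},  # shares no genes with the selection
}
selected = {f"G{i}" for i in range(5)} | {f"G{i}" for i in range(50, 55)}

pvals = over_representation(selected, pathways, universe)
adjusted = benjamini_hochberg(pvals)
for name in pathways:
    print(name, f"p={pvals[name]:.2e}", f"BH-adjusted={adjusted[name]:.2e}")
```

In this toy run, `pathway_A` contains 5 of the 10 selected genes although it covers only 10% of the universe, so its p-value is small, whereas `pathway_B` shares no genes and gets p = 1. In practice, the gene universe would typically be the set of genes measured on the array, and a library such as SciPy's `hypergeom` could replace the hand-written tail sum.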

Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity

Predictive signature of murine and human host response to typical and atypical pneumonia

Analysis of gene expression dynamics and differential expression in viral infections using generalized linear models and quasi-likelihood methods

Detecting respiratory viral RNA using expanded genetic alphabets and self-avoiding DNA

Protein and transcriptional biomarker profiling may inform treatment strategies in lower respiratory tract infections by indicating bacterial-viral differentiation

Accurate Virus Identification with Interpretable Raman Signatures by Machine Learning

A novel metric-based approach of scoring early host immune response from oro-nasopharyngeal swabs predicts COVID-19 outcome

Rapid, Sample-to-Answer Host Gene Expression Test to Diagnose Viral Infection

Metagenomic profiling of nasopharyngeal samples from adults with acute respiratory infection

Unravelling the acute respiratory infection landscape: virus type, viral load, health status and coinfection do matter

Reverse transcription polymerase chain reaction and electrospray ionization mass spectrometry for identifying acute viral upper respiratory tract infections

Analytical and clinical validation of a novel, laboratory-developed, modular multiplex-PCR panel for fully automated high-throughput detection of 16 respiratory viruses

Decoding viral and host microRNA signatures in airway-derived biosamples: Insights for biomarker discovery in viral respiratory infections

Identification of Gene Signatures Associated with COVID-19 across Children, Adolescents, and Adults in the Nasopharynx and Peripheral Blood by Using a Machine Learning Approach