Abstract

Background: Advances in unbiased metagenomic next-generation sequencing (mNGS) technologies have enabled the study of microbial and host genetic material (DNA and RNA) in one test. In this study, we aimed to develop machine learning-based differential diagnostic models (MLBDDMs) using the metagenomic and human transcriptomic data generated by an affordable bronchoalveolar lavage fluid (BALF) mNGS assay, and to investigate their clinical utility for early differential diagnosis of lung cancer and pulmonary infection in patients with pulmonary diseases.

Methods: We recruited 775 patients with respiratory disease, including 160 with pathologically diagnosed lung cancer and 615 with clinically diagnosed pulmonary infection (131 tuberculosis, 172 fungal pneumonia, and 312 bacterial pneumonia). An affordable mNGS assay was performed on BALF samples collected from these patients on admission. Using the generated mNGS data, we compared microbial diversity and host gene expression between lung cancer patients and pulmonary infection patients. The BALF mNGS datasets of the lung cancer group and each infection group were then randomly divided into a training dataset and a validation dataset at a ratio of approximately 3:1 to develop optimal MLBDDMs for distinguishing lung cancer from various pulmonary infections.

Results: Comparing the BALF mNGS data of the lung cancer group (n = 160) and the pulmonary infection group (n = 615), we found that the infection group had higher microbial diversity than the lung cancer group (P-value < 0.05). Respiratory colonizing microorganisms (e.g., Corynebacterium propinquum and Bacteroides uniformis) and pathogens (Mycobacterium tuberculosis and Cryptococcus neoformans) were identified as differential microbes (adjusted P-value < 0.05, LDA score > 2). From the BALF gene expression data, we detected 175 genes that were differentially expressed between the lung cancer and pulmonary infection groups (false discovery rate, FDR < 0.05) and enriched in the NOD-like receptor signaling pathway and the chemokine signaling pathway. Cell composition analysis revealed that M1 macrophage levels were higher in the pulmonary infection group (P-value < 0.001), whereas activated mast cell and activated dendritic cell (DC) levels were higher in the lung cancer group (P-value < 0.001 and P-value = 0.016, respectively). We integrated the metagenomic data (microbial composition and human copy number variation) and transcriptomic data (host differentially expressed genes and cell composition) generated by the BALF mNGS assay with eleven machine learning classifiers to establish diagnostic models for distinguishing lung cancer from pulmonary infection, which we named the LC/PI model. A Random Forest diagnostic model (the RF-LC/PI model) performed best, with a sensitivity of 86.7% and a specificity of 87.8% in distinguishing lung cancer from pulmonary infection (area under the receiver operating characteristic curve [AUC] = 0.838 in the training dataset; AUC = 0.79 in a held-out validation dataset). Following the same approach, we developed three further diagnostic models for distinguishing lung cancer from tuberculosis (LC/TB model), lung cancer from fungal pneumonia (LC/FP model), and lung cancer from bacterial pneumonia (LC/BP model). The AUCs of these three models were 0.91, 0.88, and 0.91, respectively, indicating high differential diagnostic accuracy.

Conclusions: We have established MLBDDMs using BALF metagenomic and metatranscriptomic data and achieved high accuracy in differentiating lung cancer from pulmonary infections, which could promote early diagnosis of pulmonary diseases and benefit more patients with one test.
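The evaluation quantities named in the abstract (an approximately 3:1 training/validation split, sensitivity, specificity, and AUC) can be illustrated with a minimal sketch. It uses only the Python standard library and the cohort sizes reported above. The function names, the per-class rounding rule, the random seed, and the toy scores are assumptions made for illustration; the abstract does not state how the split was implemented or which software was used.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices into training and validation sets, class by class.

    Hypothetical helper: a 0.75 fraction per class approximates the 3:1 ratio
    mentioned in the abstract.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return train, valid


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = lung cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort sizes from the abstract: 160 lung cancer (1), 615 pulmonary infection (0).
labels = [1] * 160 + [0] * 615
train_idx, valid_idx = stratified_split(labels)

# Toy scores (invented, not study data) to exercise the metric functions.
toy_truth = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_pred)
toy_auc = auc(toy_truth, toy_scores)
```

Splitting within each class keeps the lung cancer to infection proportion similar in both datasets, which matters here because the two groups differ in size by nearly a factor of four.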
