Development of fecal microbial diagnostic marker sets of colorectal cancer using natural language processing method

Houcong Liu, Changpu Song, Jidong Wang, Zhufang Chen, Xiaohong Zhang, Hekai Zhou, Linhong Yao, Dan Chen, Wenhao Gu, Rui-Kun Huang, Bing-Kun Huang, Bo-Wei Han, Jihui Du
DOI: https://doi.org/10.1177/03936155231210881
2023-12-21
The International Journal of Biological Markers
Abstract:
Background: Cancer screening and early detection greatly increase the chances of successful treatment. However, most cancer types lack effective early screening biomarkers. In recent years, natural language processing (NLP)-based text-mining methods have proven effective in searching the scientific literature and identifying promising associations between potential biomarkers and disease, but few are widely used.
Methods: In this study, we used an NLP-enabled text-mining system, MarkerGenie, to identify potential stool bacterial markers for early detection and screening of colorectal cancer. After filtering markers based on the text-mining results, we validated the bacterial markers using multiplex digital droplet polymerase chain reaction (ddPCR). Classifiers were built from the ddPCR results, and sensitivity, specificity, and area under the curve (AUC) were used to evaluate their performance.
Results: Seven of the 14 bacterial markers showed significantly increased abundance in the stools of colorectal cancer patients. A five-bacteria classifier for colorectal cancer diagnosis was built and achieved an AUC of 0.852, with a sensitivity of 0.692 and a specificity of 0.935. When combined with the fecal immunochemical test (FIT), our classifier achieved an AUC of 0.959 and increased the sensitivity of FIT (0.929 vs. 0.872) at a specificity of 0.900.
Conclusions: Our study provides a valuable case example of the use of NLP-based marker mining for biomarker identification.
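For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from classifier scores. It is an illustration only: the labels, scores, and the 0.5 threshold are invented toy values, not data from the study, and the abstract does not state how the authors' classifier was built or thresholded.

```python
# Toy illustration of sensitivity, specificity, and AUC.
# All values below are invented for the example; they are NOT study data.

def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when score >= threshold means 'positive'."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1  # case correctly flagged
            else:
                fn += 1  # case missed
        else:
            if predicted_positive:
                fp += 1  # control wrongly flagged
            else:
                tn += 1  # control correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the ROC area).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = cancer case, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"sensitivity={sens}, specificity={spec}, AUC={auc(labels, scores)}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports sensitivity at a fixed specificity when comparing against FIT.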
oncology, biotechnology & applied microbiology