A Novel Tool for the Accurate and Affordable Early Diagnosis of Pancreatic Cancer via Machine Learning and Bioinformatics

Siya Goel, Clark Gedney, Jean Honorio
DOI: https://doi.org/10.48550/arXiv.2012.06990
2020-12-13
Abstract: Pancreatic cancer (PC) is the fourth leading cause of cancer death in the United States, with a five-year survival rate of 10%. Late diagnosis, associated with the asymptomatic nature of the early stages and the location of the cancer with respect to the pancreas, makes current widely accepted screening methods unavailable. Prior studies have achieved low (70-75%) diagnostic accuracy, possibly because 80% of PC cases are associated with diabetes, leading to misdiagnosis. To address the problems of frequent late diagnosis and misdiagnosis, we developed an accessible, accurate and affordable diagnostic tool for PC by analyzing the expression of nineteen genes in PC and diabetes. First, machine learning algorithms were trained on four groups of subjects, defined by the occurrence of PC and diabetes. The models were evaluated on 400 PC subjects at varying stages to ensure validity. Naive Bayes, Neural Network and K-Nearest Neighbors models achieved the highest testing accuracy, around 92.6%. Second, the biological implications of the nineteen genes were investigated using bioinformatics tools. These genes were found to be significantly involved in regulating the cytoplasm, cytoskeleton and nuclear receptor activity in the pancreas, specifically in acinar and ductal cells. Our novel tool is the first in the literature to achieve a PC diagnostic accuracy above 90%, and it has the potential to significantly improve the detection of PC against a background of diabetes and to increase the five-year survival rate.
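The classification setup described in the abstract (nineteen gene-expression features, four subject groups) can be illustrated with a minimal sketch. This is not the authors' pipeline: the group label names, the synthetic Gaussian expression values, the 75/25 split, and k = 5 are all assumptions made for illustration, and a plain k-nearest-neighbors vote stands in for the several models the abstract names.

```python
import math
import random
from collections import Counter

N_GENES = 19  # number of genes stated in the abstract
# Hypothetical names for the four groups (PC and/or diabetes present or absent).
GROUPS = ["neither", "diabetes_only", "pc_only", "pc_and_diabetes"]


def make_synthetic_subjects(n_per_group, rng):
    """Generate made-up expression profiles; this is NOT real patient data."""
    samples = []
    for label_index, label in enumerate(GROUPS):
        for _ in range(n_per_group):
            # Each group gets its own mean expression level (an assumption).
            profile = [rng.gauss(2.0 * label_index, 1.0) for _ in range(N_GENES)]
            samples.append((profile, label))
    return samples


def knn_predict(train, profile, k=5):
    """Label a profile by majority vote among its k nearest training profiles."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], profile))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(train, test, k=5):
    """Fraction of held-out subjects whose group is predicted correctly."""
    correct = sum(knn_predict(train, p, k) == label for p, label in test)
    return correct / len(test)


rng = random.Random(0)
subjects = make_synthetic_subjects(40, rng)
rng.shuffle(subjects)
split = int(0.75 * len(subjects))  # 75% train, 25% test (arbitrary choice)
train_set, test_set = subjects[:split], subjects[split:]
accuracy = evaluate(train_set, test_set)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only how well separated the synthetic groups are and says nothing about the figures reported in the abstract.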
Quantitative Methods