A Naive Bayes Algorithm for Tissue Origin Diagnosis (TOD-Bayes) of Synchronous Multifocal Tumors in the Hepatobiliary and Pancreatic System

Weiqin Jiang, Yifei Shen, Yongfeng Ding, Chuyu Ye, Yi Zheng, Peng Zhao, Lulu Liu, Zhou Tong, Linfu Zhou, Shuo Sun, Xingchen Zhang, Lisong Teng, Michael P. Timko, Longjiang Fan, Weijia Fang
DOI: https://doi.org/10.1002/ijc.31054
2018-01-01
International Journal of Cancer
Abstract: Synchronous multifocal tumors are common in the hepatobiliary and pancreatic system, but because of similarities in their histological features, oncologists have difficulty identifying their precise tissue clonal origin through routine histopathological methods. To address this problem and assist in more precise diagnosis, we developed a computational approach for tissue origin diagnosis based on a naive Bayes algorithm (TOD-Bayes) using widely available RNA-Seq data. Large tissue-specific RNA-Seq data sets were first obtained from The Cancer Genome Atlas (TCGA), and approximately 1,000 feature genes were used to train and validate the TOD-Bayes algorithm. The accuracy of the model was >95% based on tenfold cross-validation on the TCGA data. A total of 18 clinical cancer samples (including six negative controls) with definitive tissue origin were subsequently used for external validation, and 17 of the 18 samples were classified correctly in our study (94.4%). Furthermore, we included as case studies seven tumor samples, taken from two individuals who suffered from synchronous multifocal tumors across tissues, where efforts to make a definitive primary cancer diagnosis by traditional diagnostic methods had failed. Using our TOD-Bayes analysis, the two clinical test cases were successfully diagnosed as pancreatic cancer (PC) and cholangiocarcinoma (CC), respectively, in agreement with their clinical outcomes. Based on our findings, we believe that the TOD-Bayes algorithm is a powerful novel methodology to accurately identify the tissue origin of synchronous multifocal tumors of unknown primary cancers using RNA-Seq data, and an important step toward more precision-based medicine in cancer diagnosis and treatment.
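As a rough illustration of the kind of workflow the abstract describes (a naive Bayes classifier over gene-expression features, assessed by tenfold cross-validation), the following Python sketch may help. It is not the TOD-Bayes implementation: the Gaussian likelihood, the synthetic expression values, the three tissue labels, and the function names `fit_gaussian_nb`, `predict`, `cross_validate`, and `make_synthetic` are all assumptions made for this example, and it uses 20 made-up genes rather than the roughly 1,000 feature genes mentioned above.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-gene mean/variance for each tissue label."""
    model = {}
    n = len(y)
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # Small constant keeps the variance strictly positive (an assumption,
        # not something taken from the paper).
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
            for col, m in zip(cols, means)
        ]
        model[label] = (math.log(len(rows) / n), means, variances)
    return model


def predict(model, x):
    """Return the label with the highest log posterior under the naive
    (conditionally independent genes) assumption."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def cross_validate(X, y, k=10, seed=0):
    """k-fold cross-validation; returns overall accuracy across held-out folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(y)


def make_synthetic(n_per_class=40, n_genes=20, seed=1):
    """Synthetic stand-in for expression data: each hypothetical tissue has
    elevated expression on its own subset of genes."""
    rng = random.Random(seed)
    tissues = ["liver", "bile_duct", "pancreas"]  # hypothetical labels
    X, y = [], []
    for k, tissue in enumerate(tissues):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 if g % 3 == k else 0.0, 1.0)
                      for g in range(n_genes)])
            y.append(tissue)
    return X, y


X, y = make_synthetic()
accuracy = cross_validate(X, y, k=10)
print(f"10-fold CV accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only how well separated the synthetic classes are and says nothing about the performance reported in the abstract. Working in log space avoids numerical underflow when many per-gene likelihoods are multiplied, which would matter with on the order of 1,000 features.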