Detection of Nasopharyngeal Carcinoma Using Routine Medical Tests Via Machine Learning
Qi Liu, Jinyang Du, Yuge Li, Guiyuan Peng, Yong Zhong, Ruxu Du
DOI: https://doi.org/10.1145/3524086.3524102
2022-01-01
Abstract: Nasopharyngeal carcinoma (NPC) is one of the most common cancers in South China and Southeast Asia. Clinical data have shown that early detection is essential for improving treatment effectiveness and survival rate. Unfortunately, the early symptoms of NPC closely resemble those of mild diseases such as rhinitis, so early cases are often missed. Currently, the most common method for detecting NPC is based on Epstein-Barr Virus (EBV) antibodies; however, EBV antibody testing is usually not included in annual routine medical tests. This paper presents a study on the detection of NPC from routine medical tests using three machine learning methods: Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Machine learning can extract valuable but hidden information from complex medical test data. First, we use a dataset containing 523 NPC patients (first diagnosed, before any medical treatment) and 600 healthy people as controls. The data consist of five categories of information: demographic features (gender, age), EBV antibodies (VCA-IgA, EA-IgA), blood test indices, liver function test indices, and urine sediment test indices. Our evaluation criteria are accuracy, sensitivity, specificity, Youden index, and Area Under the receiver operating characteristic Curve (AUC). The results show that RF outperforms both SVM and ANN. Using only the EBV antibody data, the accuracy, sensitivity, and specificity are 90.4%, 89.8%, and 90.8%, respectively, which is comparable to results in the existing literature. Using only the routine medical test data, the accuracy, sensitivity, and specificity are 95.0%, 93.3%, and 96.5%, respectively. Using both, the accuracy, specificity, and sensitivity are 96.9%, 96.9%, and 96.8%, respectively. Second, we use another dataset containing 100 NPC patients and 100 healthy people for validation, using RF with only the routine medical test data. The prediction accuracy, sensitivity, and specificity are 93.1%, 88.1%, and 98.2%, respectively. This demonstrates that NPC can be effectively detected from routine medical test data via machine learning. The new method is expected to offer several benefits, including ease of implementation, improved detection accuracy, and reduced testing cost.
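The evaluation criteria named in the abstract (accuracy, sensitivity, specificity, Youden index, AUC) can be made concrete with a short sketch. The Python below is an illustrative helper, not the authors' code. It assumes binary labels with 1 = NPC and 0 = healthy control, hard predictions obtained by thresholding a classifier's scores at a caller-chosen cutoff, and the rank-based (Mann-Whitney) estimate of AUC. The function names and the toy data are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating label 1 as NPC and 0 as a healthy control."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Youden index from hard predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # fraction of NPC patients correctly flagged
    specificity = tn / (tn + fp)  # fraction of healthy controls correctly cleared
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,  # J = sens + spec - 1
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random NPC case scores higher than a
    random control, with ties counted as 1/2 (O(n_pos * n_neg) pairwise check)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and classifier scores, made up for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    print(binary_metrics(y_true, y_pred))
    print(auc(y_true, scores))
```

On the toy data, thresholding at 0.5 gives 3 true positives, 1 false negative, 3 true negatives and 1 false positive, so accuracy, sensitivity and specificity are all 0.75, the Youden index is 0.5, and the AUC is 0.875. The pairwise AUC loop is quadratic but cheap at the scale of the datasets described here (about 1,100 subjects); a library routine such as scikit-learn's `roc_auc_score` computes the same quantity.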