The Application of Machine Learning in Cervical Cancer Prediction

Qihui Yin
DOI: https://doi.org/10.1145/3468891.3468894
2021-04-23
Abstract: Cervical cancer is a malignant cancer that typically occurs in women over the age of 30. Although it sounds dangerous, cervical cancer can be readily prevented through regular screening tests. Unfortunately, screening can be costly, inefficient, and subjective because hospital resources are limited and patient numbers are large. To address these deficiencies, we designed a machine-learning approach that can process large amounts of data at once with higher accuracy. It predicts the probability that a person has cervical cancer from variables such as age and habits. The data can be collected easily through surveys that patients fill out, which makes the model more objective than doctors' diagnoses. To build the model, we used the Cervical Cancer (Risk Factors) data set from the UCI Machine Learning Repository. After obtaining the data, we first conducted a descriptive statistical analysis to investigate the distribution of the features and the relationships between the independent variables and the probability of cervical cancer. We then applied logistic regression, decision tree, random forest, and AdaBoost models to build a prediction model. Because the outcome classes are imbalanced, we also included a weighted version of each model.
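The abstract does not say how the weighted versions are constructed. As a minimal sketch only, the snippet below assumes one common choice: weighting each class inversely to its frequency and applying those weights to the log loss of a logistic regression. The function names, the synthetic one-feature data, and the training settings are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Predicted probability of the positive class for one sample x."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, class_weight=None, lr=0.1, epochs=1000):
    """Batch gradient descent on the (optionally class-weighted) log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    sample_w = [1.0 if class_weight is None else class_weight[t] for t in y]
    total = sum(sample_w)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, ti, si in zip(X, y, sample_w):
            # Weighted residual: errors on the rare class count for more.
            err = si * (predict_proba((w, b), xi) - ti)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def make_toy_data(seed=0, n_neg=190, n_pos=10):
    """Synthetic, imbalanced one-feature data (not the UCI data set)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0)] for _ in range(n_neg)]
    X += [[rng.gauss(1.5, 1.0)] for _ in range(n_pos)]
    y = [0] * n_neg + [1] * n_pos
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    weights = balanced_class_weights(y)
    plain = fit_logistic(X, y)
    weighted = fit_logistic(X, y, class_weight=weights)
    print("class weights:", weights)
    print("P(positive | x=1.5), unweighted:", predict_proba(plain, [1.5]))
    print("P(positive | x=1.5), weighted:  ", predict_proba(weighted, [1.5]))
```

On imbalanced data like this, the weighted fit assigns a higher positive-class probability to the same input than the unweighted fit, which is the usual motivation for weighting when positive cases are rare. The same class weights could in principle be passed as per-sample weights to tree-based or boosted models.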