Research on Fine-Tuning CNN for Cancer Diagnosis with Gene Expression Data

Zhen Liu, Ruoyu Wang, Jin Yang, Wenbin Zhang
DOI: https://doi.org/10.1145/3529836.3529844
2022-01-01
Abstract: Convolutional neural networks (CNNs) have been used for cancer type prediction with gene expression data. However, their success is impeded by the lack of large labeled gene expression datasets. In addition, class imbalance leads the model to neglect the minority class. To handle the small sample size problem, a CNN is fine-tuned to transfer the knowledge of a pre-trained model to cancer type prediction. A model is first trained on the dataset of one cancer type; this pre-trained model is then fine-tuned on the training set of a new cancer type, and the fine-tuned model can be used to identify the new cancer type. The SMOTE resampling method is used to handle the class imbalance problem. We carried out experiments on TCGA datasets with 1D-CNN and 2D-CNN models. The fine-tuned 1D-CNN obtains 97.5% accuracy, 98.6% F-score on the cancer type, and 78.1% F-score on the normal type on average, and the fine-tuned 2D-CNN obtains 97.4% accuracy, 98.5% F-score on the cancer type, and 77.4% F-score on the normal type on average. Using a fine-tuned CNN with SMOTE, the accuracy, the F-score on the cancer type, and the F-score on the normal type increase by about 1.5%, 0.5%, and 21.5% on average, respectively.
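The SMOTE resampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy expression values are all hypothetical, and the sketch assumes the minority (normal) class is oversampled until it matches the size of the cancer class. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples."""
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, sorted by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 "normal" samples against 8 "cancer" samples, 4 genes each.
normal = [[0.1, 0.2, 0.0, 0.3], [0.2, 0.1, 0.1, 0.4], [0.0, 0.3, 0.2, 0.2]]
cancer_count = 8
balanced_normal = normal + smote(normal, cancer_count - len(normal), k=2)
print(len(balanced_normal))  # prints 8
```

In a pipeline like the one the abstract describes, resampling of this kind would typically be applied only to the training split, so that the test set keeps its original class distribution.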