Comparative Study of Cancer Classification by Analysis of RNA-seq Gene Expression Levels
Rhea Paul,Sarada Jayan,A. A,A. Thakur,Nidhin Prabhakar Tv,Ramanan R
DOI: https://doi.org/10.1109/ICCCNT54827.2022.9984600
2022-10-03
Abstract: Cancer classification is an active area of research in the medical domain; genetic inheritance plays a significant role in causing this condition. Certain similarities in DNA sequence can be prevalent in individuals who have cancer. The main aim of this paper is to put forth efficient cancer classification techniques that provide steady and substantial accuracy. An unconventional factor, RNA-seq gene expression levels, is considered here, as opposed to familiar factors such as physical features or the results of various imaging techniques. The classification is conducted by analyzing RNA-seq gene expression levels measured on the Illumina HiSeq sequencing platform. The paper presents a comparative study of several machine learning algorithms: Decision Tree, Random Forest, SVM, KNN, Naïve Bayes, and Multinomial Regression. This method significantly reduces computational complexity compared with the deep neural network approaches used with various imaging techniques. Experimental observations suggest that SVM provides higher accuracy, a higher cross-validation score, nearly ideal AUC-ROC curves, and better performance in terms of the time required to fit the model and subsequently predict the cancer type.
Medicine,Biology,Computer Science
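
The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn is available, uses a synthetic matrix from `make_classification` as a stand-in for RNA-seq expression data, and picks hyperparameters (linear SVM kernel, 5 neighbours, 100 trees, 5-fold cross-validation) purely for illustration. The helper name `compare_models` is hypothetical. AUC-ROC, which the abstract also reports, is left out to keep the sketch short.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an expression matrix (samples x genes).
# Assumption: this is NOT the dataset used in the paper.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=30,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The six model families named in the abstract; settings are illustrative.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
    "Multinomial Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}


def compare_models(models, X_train, y_train, X_test, y_test, cv=5):
    """Return CV score, test accuracy, and fit+predict time per model."""
    results = {}
    for name, model in models.items():
        # Mean k-fold cross-validation accuracy on the training split.
        cv_score = cross_val_score(model, X_train, y_train, cv=cv).mean()
        # Time a full fit followed by scoring on the held-out split.
        start = time.perf_counter()
        model.fit(X_train, y_train)
        test_accuracy = model.score(X_test, y_test)
        elapsed = time.perf_counter() - start
        results[name] = {
            "cv_score": cv_score,
            "test_accuracy": test_accuracy,
            "fit_predict_seconds": elapsed,
        }
    return results


results = compare_models(models, X_train, y_train, X_test, y_test)
for name, r in results.items():
    print(f"{name:24s} cv={r['cv_score']:.3f} "
          f"acc={r['test_accuracy']:.3f} time={r['fit_predict_seconds']:.3f}s")
```

On synthetic data the ranking of the models need not match the paper's findings; the sketch only shows how accuracy, cross-validation score, and timing can be collected side by side for each classifier.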