Identification of Gene Expression in Different Stages of Breast Cancer with Machine Learning

Ali Abidalkareem, Ali K. Ibrahim, Moaed Abd, Oneeb Rehman, Hanqi Zhuang
DOI: https://doi.org/10.3390/cancers16101864
2024-05-15
Cancers
Abstract: Determining the tumor origin in humans is vital in clinical applications of molecular diagnostics. Metastatic cancer is usually a very aggressive disease with limited diagnostic procedures, despite the fact that many protocols have been evaluated for their effectiveness in prognostication. Research has shown that dysregulation in miRNAs (a class of non-coding, regulatory RNAs) is remarkably involved in oncogenic conditions. This research paper aims to develop a machine learning model that processes an array of miRNAs in 1097 metastatic tissue samples from patients who suffered from various stages of breast cancer. The suggested machine learning model is fed with miRNA quantitative read count data taken from The Cancer Genome Atlas Data Repository. Two main feature-selection techniques have been used, mainly Neighborhood Component Analysis and Minimum Redundancy Maximum Relevance, to identify the most discriminant and relevant miRNAs for their up-regulated and down-regulated states. These miRNAs are then validated as biological identifiers for each of the four cancer stages in breast tumors. Both machine learning algorithms yield performance scores that are significantly higher than the traditional fold-change approach, particularly in earlier stages of cancer, with Neighborhood Component Analysis and Minimum Redundancy Maximum Relevance achieving accuracy scores of up to 0.983 and 0.931, respectively, compared to 0.920 for the FC method. This study underscores the potential of advanced feature-selection methods in enhancing the accuracy of cancer stage identification, paving the way for improved diagnostic and therapeutic strategies in oncology.
What problem does this paper attempt to address?
This paper uses machine learning to identify gene expression signatures, specifically microRNA (miRNA) expression, that distinguish the stages of breast cancer. The model processes miRNA data from 1,097 metastatic tissue samples to find the miRNAs that best discriminate among the four stages of breast cancer. These miRNAs are then validated as biomarkers for each stage.

### Research Background and Problems

- **Status of Breast Cancer**: According to World Health Organization data, breast cancer affects 2.1 million women every year. It is one of the leading causes of cancer-related deaths among women, especially in developed countries such as the United States and the United Kingdom.
- **Complexity of Etiology**: Risk factors for breast cancer are diverse and include reproductive age, age at menopause, use of contraceptives, and hormone therapy. Genetic mutations, such as those in the BRCA1 and BRCA2 genes, are also important factors.
- **Role of miRNA**: miRNAs are a class of non-coding RNAs that play an important role in tumorigenesis and tumor development. Abnormal miRNA expression is considered an important marker of breast cancer.
- **Limitations of Existing Diagnostic Methods**: Although many diagnostic protocols exist, metastatic breast cancer is usually a very aggressive disease with limited diagnostic options, so more accurate diagnostic methods are needed.

### Research Objectives

- **Identify Key miRNAs**: Use machine learning methods, in particular Neighborhood Component Analysis (NCA) and Minimum Redundancy Maximum Relevance (MRMR), to identify miRNAs that are significantly differentially expressed across the stages of breast cancer.
- **Improve Diagnostic Accuracy**: Evaluate the accuracy of NCA and MRMR against the traditional fold-change method, particularly in the earlier cancer stages.
- **Provide New Diagnostic Tools**: Identify stage-specific miRNA markers that can support new tools and strategies for clinical diagnosis and treatment.

### Methods

- **Data Source**: miRNA quantitative read count data for 1,097 metastatic tissue samples, obtained from The Cancer Genome Atlas (TCGA) data repository.
- **Feature Selection**: Two feature-selection techniques, NCA and MRMR, are used to identify the most discriminative miRNAs.
- **Classification Algorithm**: A Support Vector Machine (SVM) classifies samples into the four cancer stages based on the selected features.
- **Performance Evaluation**: The performance of NCA and MRMR is compared with that of the traditional fold-change method.

### Results

- **High Accuracy**: NCA and MRMR yield significantly higher accuracy than the traditional fold-change method, particularly in the earlier cancer stages. NCA reaches an accuracy of up to 0.983 and MRMR up to 0.931, compared with 0.920 for the fold-change method.
- **Potential Applications**: The results indicate that advanced feature-selection methods can improve the accuracy of cancer-stage identification, opening a path to better diagnostic and therapeutic strategies.

### Conclusions

This research demonstrates the potential of NCA and MRMR for identifying miRNA markers of the different stages of breast cancer, providing new tools for clinical diagnosis and treatment. These methods improve diagnostic accuracy and lay a foundation for personalized medicine and precision treatment.
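
### Illustrative Sketch: Fold-Change Ranking vs. Redundancy-Aware Selection

To make the comparison between the fold-change baseline and redundancy-aware feature selection more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small toy dataset with hypothetical feature names (`mir-a` to `mir-d`) and only two stages. It also uses absolute Pearson correlation as a simple stand-in for the relevance and redundancy measures in a greedy MRMR-style selection; the paper's actual MRMR criterion, the NCA step, and the SVM classifier are not reproduced here.

```python
import math


def log2_fold_change(reference, group, pseudocount=1.0):
    """log2 ratio of mean read counts: group over reference."""
    mean_ref = sum(reference) / len(reference)
    mean_grp = sum(group) / len(group)
    # The pseudocount avoids division by zero for unexpressed features.
    return math.log2((mean_grp + pseudocount) / (mean_ref + pseudocount))


def rank_by_fold_change(counts, labels, stage, top_k):
    """Top-k features by |log2 fold change| of one stage vs. all other samples."""
    scores = {}
    for name, values in counts.items():
        in_stage = [v for v, s in zip(values, labels) if s == stage]
        others = [v for v, s in zip(values, labels) if s != stage]
        scores[name] = log2_fold_change(others, in_stage)
    ranked = sorted(scores, key=lambda n: abs(scores[n]), reverse=True)
    return [(n, scores[n]) for n in ranked[:top_k]]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(counts, target, top_k):
    """Greedy MRMR-style selection (assumption: correlation-based scores).

    Each step picks the feature maximizing
    |corr(feature, target)| - mean |corr(feature, already selected)|.
    """
    relevance = {n: abs(pearson(v, target)) for n, v in counts.items()}
    selected = []
    remaining = list(counts)
    while remaining and len(selected) < top_k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(counts[name], counts[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy read counts: six samples, three per stage.
labels = ["I", "I", "I", "II", "II", "II"]
target = [0, 0, 0, 1, 1, 1]  # numeric encoding of the stage labels
counts = {
    "mir-a": [10, 12, 11, 40, 44, 42],  # up-regulated in stage II
    "mir-b": [11, 13, 12, 41, 45, 43],  # near copy of mir-a
    "mir-c": [50, 48, 52, 20, 25, 21],  # down-regulated in stage II
    "mir-d": [30, 31, 29, 30, 32, 28],  # no stage effect
}

top_fc = rank_by_fold_change(counts, labels, stage="II", top_k=2)
selected = mrmr_select(counts, target, top_k=2)
print("fold-change top 2:", [name for name, _ in top_fc])
print("MRMR-style top 2: ", selected)
```

On this toy data, the fold-change ranking returns the two near-duplicate up-regulated features, while the redundancy-aware selection keeps one up-regulated and one down-regulated feature. This is one plausible reason a method that penalizes redundancy can hand a downstream classifier a more informative feature set, though the toy example says nothing about the accuracy figures reported in the paper.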