Improving Cancer Hallmark Classification with BERT-based Deep Learning Approach

Sultan Zavrak, Seyhmus Yilmaz

2023-06-07

Abstract: This paper presents a novel approach to accurately classify the hallmarks of cancer, which is a crucial task in cancer research. Our proposed method utilizes the Bidirectional Encoder Representations from Transformers (BERT) architecture, which has shown exceptional performance in various downstream applications. By applying transfer learning, we fine-tuned the pre-trained BERT model on a small corpus of biomedical text documents related to cancer. The outcomes of our experimental investigations demonstrate that our approach attains a noteworthy accuracy of 94.45%, surpassing almost all prior findings with a substantial increase of at least 8.04% as reported in the literature. These findings highlight the effectiveness of our proposed model in accurately classifying and comprehending text documents for cancer research, thus contributing significantly to the field. As cancer remains one of the top ten leading causes of death globally, our approach holds great promise in advancing cancer research and improving patient outcomes.

Computation and Language

What problem does this paper attempt to address?

This paper aims to improve the accuracy of cancer hallmark classification using a deep learning method based on BERT (Bidirectional Encoder Representations from Transformers). Specifically, the authors address the following problems:

1. **Challenges in cancer hallmark classification**:
   - Cancer hallmarks are the biological characteristics and behaviors that distinguish cancer cells from normal cells. They are crucial for cancer research and treatment, but automatically identifying and classifying them in biomedical literature remains difficult.
   - Traditional methods usually rely on manual feature engineering, which is time-consuming and does not guarantee effective features.
2. **Insufficient data**:
   - With only a limited amount of training text, the question is how to use pre-trained models and transfer learning effectively to improve classification performance.
3. **Limitations of existing methods**:
   - Existing deep learning approaches to cancer hallmark classification, such as Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM), have limitations on complex text data, especially in multi-label classification tasks.
   - Traditional machine learning methods (such as SVM and Naive Bayes) require many manually designed features, resulting in high computational costs and unsatisfactory results.
4. **Improving classification accuracy**:
   - By fine-tuning BERT, a powerful pre-trained language model, through transfer learning, the authors aim for a significant performance improvement on the cancer hallmark classification task.

The experimental results show that the proposed model outperforms existing methods on multiple evaluation metrics. It reaches an accuracy of 94.45%, reported as at least 8.04% higher than almost all prior results.

### Summary

The core objective of the paper is to use advanced natural language processing (NLP) techniques, specifically a BERT-based deep learning method, to improve the automatic classification of cancer hallmarks. The authors hope this provides a faster and more accurate text classification tool for cancer research, thereby supporting progress in the field and better treatment outcomes for patients.
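The summary mentions multi-label classification but does not say how model outputs become label decisions or how accuracy is computed. The following is a minimal, generic sketch of one common setup, not the authors' implementation: one logit per label, an independent sigmoid with a 0.5 threshold, binary cross-entropy as the training loss, and exact-match accuracy as a strict metric. The label names, logits, and threshold are all illustrative assumptions.

```python
import math

# Illustrative subset of hallmark labels (assumption: not the paper's label set).
LABELS = [
    "sustaining proliferative signaling",
    "evading growth suppressors",
    "resisting cell death",
]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Turn one logit per label into an independent 0/1 decision per label."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def bce_loss(logits, targets):
    """Mean binary cross-entropy over labels, a usual multi-label loss."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)


def exact_match_accuracy(pred_rows, gold_rows):
    """Fraction of documents whose entire label set is predicted correctly."""
    hits = sum(1 for p, g in zip(pred_rows, gold_rows) if p == g)
    return hits / len(gold_rows)


# Hypothetical classifier outputs for two sentences (made-up numbers).
logits_batch = [[2.0, -1.5, 0.3], [-0.7, 1.2, -2.0]]
gold_batch = [[1, 0, 1], [0, 1, 1]]

preds = [predict_labels(row) for row in logits_batch]
for row in preds:
    print([name for name, flag in zip(LABELS, row) if flag])
print("exact-match accuracy:", exact_match_accuracy(preds, gold_batch))
```

Exact-match accuracy counts a document as correct only when every label matches, so it is stricter than per-label accuracy. Which definition underlies the reported 94.45% is not stated in this summary.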
