Hybrid deep learning-based strategy for the hepatocellular carcinoma cancer grade classification of H&E stained liver histopathology images

Ajinkya Deshpande, Deep Gupta, Ankit Bhurane, Nisha Meshram, Sneha Singh, Petia Radeva
2024-12-04
Abstract: Hepatocellular carcinoma (HCC) is a common type of liver cancer whose early-stage diagnosis is challenging, mainly because it relies on manual assessment of hematoxylin and eosin (H&E)-stained whole slide images. This is a time-consuming process that can lead to variability in decision-making. For accurate detection of HCC, we propose a hybrid deep learning-based architecture that uses transfer learning to extract features from pre-trained convolutional neural network (CNN) models, followed by a classifier made up of a sequence of fully connected layers. This study uses the publicly available Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) database (n=491) for model development and a database from Kasturba Gandhi Medical College (KMC), India, for validation. The pre-processing step involves patch extraction, colour normalization, and augmentation, resulting in 3920 patches for the TCGA dataset. The hybrid deep neural network, consisting of a CNN-based pre-trained feature extractor and a customized artificial neural network-based classifier, is trained using five-fold cross-validation. Eight state-of-the-art models are trained and tested as feature extractors for the proposed hybrid model. With a ResNet50-based feature extractor, the proposed hybrid model achieved a sensitivity, specificity, F1-score, and accuracy of 100.00% each and an AUC of 1.00 on the TCGA database. On the KMC database, EfficientNetb3 was the best choice of feature extractor, giving a sensitivity, specificity, F1-score, accuracy, and AUC of 96.97%, 98.85%, 96.71%, 96.71%, and 0.99, respectively. The proposed hybrid models improved accuracy by 2% and 4% over the pre-trained models on the TCGA-LIHC and KMC databases, respectively.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the difficulty of diagnosing hepatocellular carcinoma (HCC) early, in particular the slow and variable manual evaluation of H&E-stained liver histopathology images. The authors propose a hybrid deep learning architecture to improve the accuracy and efficiency of HCC grade classification.

### Problem Background

Hepatocellular carcinoma (HCC) is one of the most common types of primary liver cancer worldwide, causing more than 700,000 deaths every year. Early diagnosis is crucial for improving patient survival. However, manual evaluation of whole-slide images (WSIs) of H&E-stained samples is time-consuming and can lead to differences and errors in decision-making between observers. An automated and accurate method for HCC grade classification is therefore of great value.

### Solution

To address these problems, the authors propose a hybrid deep learning strategy that combines transfer learning with fine-tuning. The steps are as follows:

1. **Feature Extraction**: Pre-trained convolutional neural network (CNN) models (such as ResNet50 and EfficientNet) are used as feature extractors.
2. **Classifier Design**: A series of fully connected layers is added after the feature extractor to form a customized classifier.
3. **Data Processing**:
   - **Dataset**: The publicly available TCGA-LIHC database (491 WSIs) is used for model training, and the proprietary database of Kasturba Gandhi Medical College (KMC), India, is used for validation.
   - **Pre-processing**: Image patch extraction, colour normalization, and data augmentation, which together yield 3920 patches for the TCGA dataset.
4. **Training and Validation**: The models are trained with five-fold cross-validation and evaluated on the test set.
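The patch-extraction step can be pictured as tiling a slide into fixed-size patches and discarding tiles that are mostly background. This is a minimal sketch under assumptions the summary does not state: the tissue mask is given as a 2-D boolean grid (in practice it might come from thresholding a low-resolution thumbnail), and the helper names `tile_coordinates`, `tissue_fraction` and `select_patches`, the stride, and the `min_tissue` threshold are all hypothetical rather than the authors' settings.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def tile_coordinates(width: int, height: int, patch: int, stride: int) -> List[Coord]:
    """Top-left (x, y) corners of every full patch that fits inside a width x height image."""
    return [
        (x, y)
        for y in range(0, height - patch + 1, stride)
        for x in range(0, width - patch + 1, stride)
    ]


def tissue_fraction(mask: List[List[bool]], x: int, y: int, patch: int) -> float:
    """Fraction of pixels in the patch at (x, y) that the mask marks as tissue."""
    hits = sum(mask[r][c] for r in range(y, y + patch) for c in range(x, x + patch))
    return hits / (patch * patch)


def select_patches(
    mask: List[List[bool]], patch: int, stride: int, min_tissue: float = 0.5
) -> List[Coord]:
    """Keep only tiles whose tissue fraction reaches the (hypothetical) threshold."""
    height, width = len(mask), len(mask[0])
    return [
        (x, y)
        for x, y in tile_coordinates(width, height, patch, stride)
        if tissue_fraction(mask, x, y, patch) >= min_tissue
    ]


# Toy 4x4 mask: left half is tissue, right half is background.
mask = [[col < 2 for col in range(4)] for _ in range(4)]
print(select_patches(mask, patch=2, stride=2))  # [(0, 0), (0, 2)]
```

With a stride equal to the patch size the tiles do not overlap; a smaller stride produces overlapping patches and more samples per slide.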
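The summary does not say which colour-normalization method the authors use. Purely as an illustration, the sketch below applies a Reinhard-style statistics transfer: each channel of a patch is shifted and scaled so that its mean and standard deviation match those of a reference patch. Reinhard's original method operates in the perceptual lαβ colour space; this simplified version works directly on RGB values, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]
Stats = List[Tuple[float, float]]


def channel_stats(pixels: Sequence[Pixel]) -> Stats:
    """Per-channel (mean, population std) over all pixels of a patch."""
    n = len(pixels)
    stats = []
    for ch in range(3):
        values = [p[ch] for p in pixels]
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        stats.append((mean, std))
    return stats


def normalize_to_reference(pixels: Sequence[Pixel], target: Stats, eps: float = 1e-8) -> List[Pixel]:
    """Match each channel's mean/std to the reference statistics, then clip to [0, 255]."""
    source = channel_stats(pixels)
    out = []
    for p in pixels:
        new = []
        for ch in range(3):
            (mu_s, sd_s), (mu_t, sd_t) = source[ch], target[ch]
            value = (p[ch] - mu_s) / (sd_s + eps) * sd_t + mu_t
            new.append(min(255.0, max(0.0, value)))
        out.append((new[0], new[1], new[2]))
    return out


reference = [(200.0, 100.0, 150.0), (220.0, 120.0, 170.0)]
patch = [(100.0, 50.0, 80.0), (140.0, 90.0, 120.0)]
normalized = normalize_to_reference(patch, channel_stats(reference))
print([tuple(round(v, 3) for v in px) for px in normalized])  # [(200.0, 100.0, 150.0), (220.0, 120.0, 170.0)]
```

Normalizing every patch against one fixed reference reduces stain-intensity differences between slides and scanners, which matters most when a model trained on one source (TCGA) is validated on another (KMC).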
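Augmentation for histology patches commonly uses rotations and flips, since tissue sections have no canonical orientation. The exact transforms behind the 3920 TCGA patches are not given here, so the following is an assumed example: it generates the eight rotations and reflections of a square patch (the dihedral group D4), with the patch represented as a nested list.

```python
from typing import Any, List

Grid = List[List[Any]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]


def dihedral_variants(grid: Grid) -> List[Grid]:
    """The 8 rotations/reflections of a square patch (dihedral group D4)."""
    variants = []
    current = grid
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


patch = [[1, 2], [3, 4]]
print(rotate90(patch))  # [[3, 1], [4, 2]]
print(len(dihedral_variants(patch)))  # 8
```

These transforms only rearrange pixels, so they leave staining and morphology unchanged; colour jitter or scaling would be separate, optional steps.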
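Five-fold cross-validation on patch-level data needs care: if patches from the same slide appear in both the training and the test fold, scores can be inflated. The summary does not state how the authors split their data, so the sketch below is an assumption, not their protocol. It assigns whole slides (identified by hypothetical slide IDs) to folds, so that every patch of a slide lands in the same test fold. It does not stratify by class or balance fold sizes by patch count.

```python
import random
from typing import Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def grouped_k_fold(groups: Sequence[str], k: int = 5, seed: int = 0) -> List[Split]:
    """Return k (train_indices, test_indices) pairs; all samples sharing a group ID stay together."""
    unique = sorted(set(groups))
    if len(unique) < k:
        raise ValueError(f"need at least {k} distinct groups, got {len(unique)}")
    random.Random(seed).shuffle(unique)  # reproducible shuffle of slide IDs
    fold_of: Dict[str, int] = {g: i % k for i, g in enumerate(unique)}  # round-robin assignment
    splits = []
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        splits.append((train, test))
    return splits


# Hypothetical slide IDs, one entry per patch.
slides = ["s1", "s1", "s2", "s3", "s3", "s4", "s5", "s6", "s6", "s7"]
for train, test in grouped_k_fold(slides, k=5):
    print(sorted({slides[i] for i in test}))
```

Each patch appears in exactly one test fold, and no slide contributes patches to both sides of any split.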
### Main Contributions

- **Transfer Learning and Fine-Tuning**: The study examines how well knowledge learned in a general image domain transfers to histopathology, and improves performance by fine-tuning some of the top feature-extraction layers.
- **Classifier Design**: A fully connected classifier progressively reduces the feature dimension, mapping the extracted features onto the output label space.
- **Importance of Image Pre-processing**: The authors emphasize that appropriate image pre-processing and training procedures are crucial to the robustness and performance of the model.

### Experimental Results

On the TCGA-LIHC database, the proposed hybrid model with a ResNet50 feature extractor achieves 100% accuracy, sensitivity, specificity, and F1-score, and an AUC of 1.00. On the KMC database, EfficientNetb3 performs best as the feature extractor, with an accuracy of 96.71%, about 4.65% higher than the corresponding base model. These results indicate that the hybrid approach improves HCC grade classification accuracy over the pre-trained baselines on both datasets, which the authors present as a step toward a more reliable tool for clinical diagnosis.
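The classifier that progressively reduces the feature dimension can be illustrated as a stack of fully connected layers with shrinking widths, ReLU activations on the hidden layers, and a softmax output. This pure-Python forward pass is a sketch only: the layer widths, the He-style random initialisation, and the names `build_head` and `forward` are hypothetical, and the authors' actual layer sizes, regularization, and training procedure are not described in this summary. The input width would match the backbone's pooled feature vector (2048 for ResNet50).

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]


def dense_layer(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    """Random He-style weights (std = sqrt(2 / in_dim)) and zero biases for one layer."""
    std = math.sqrt(2.0 / in_dim)
    weights = [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def build_head(widths: Sequence[int], seed: int = 0) -> List[Layer]:
    """widths[0] is the backbone feature size; later entries must not increase."""
    if any(b > a for a, b in zip(widths, widths[1:])):
        raise ValueError("layer widths should shrink (or stay equal) toward the output")
    rng = random.Random(seed)
    return [dense_layer(a, b, rng) for a, b in zip(widths, widths[1:])]


def forward(head: List[Layer], features: Sequence[float]) -> List[float]:
    """Dense -> ReLU for hidden layers, then softmax over the final layer's outputs."""
    x = list(features)
    for idx, (weights, bias) in enumerate(head):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if idx < len(head) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    m = max(x)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Small toy head: 64 pooled features -> 16 -> 8 -> 3 hypothetical grade classes.
head = build_head([64, 16, 8, 3])
feature_rng = random.Random(1)
features = [feature_rng.random() for _ in range(64)]
probs = forward(head, features)
print(len(probs), round(sum(probs), 6))  # 3 1.0
```

For a ResNet50 backbone one might call `build_head([2048, 512, 128, n_classes])`, where `n_classes` is the number of grade labels. In a real pipeline these weights would be trained, together with any unfrozen top layers of the backbone, rather than left at their random initial values.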
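The reported sensitivity, specificity, F1-score, and accuracy follow from the confusion matrix, while AUC follows from how the predicted scores rank positives against negatives (which is why it is reported as 1.00 or 0.99 rather than as a percentage). The sketch below assumes a binary setting (label 1 = positive); for multi-grade classification these metrics would typically be computed one-vs-rest and averaged, and the summary does not say which averaging the authors use. The AUC helper uses the pairwise-ranking definition with ties counted as one half; it is O(n²) and meant only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, F1 and accuracy from hard labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
print(binary_metrics(y_true, [1, 0, 0, 0]))
print(roc_auc(y_true, [0.9, 0.2, 0.3, 0.1]))  # 0.75
```

An AUC of 1.00, as reported for TCGA-LIHC, means every positive sample received a higher score than every negative one.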