HER2 Molecular Marker Scoring Using Transfer Learning and Decision Level Fusion

Suman Tewary,Sudipta Mukhopadhyay
DOI: https://doi.org/10.1007/s10278-021-00442-5
Abstract:In prognostic evaluation of breast cancer, immunohistochemical (IHC) marker human epidermal growth factor receptor 2 (HER2) is used for prognostic evaluation. Accurate assessment of HER2-stained tissue sample is essential in therapeutic decision making for the patients. In regular clinical settings, expert pathologists assess the HER2-stained tissue slide under microscope for manual scoring based on prior experience. Manual scoring is time consuming, tedious, and often prone to inter-observer variation among group of pathologists. With the recent advancement in the area of computer vision and deep learning, medical image analysis has got significant attention. A number of deep learning architectures have been proposed for classification of different image groups. These networks are also used for transfer learning to classify other image classes. In the presented study, a number of transfer learning architectures are used for HER2 scoring. Five pre-trained architectures viz. VGG16, VGG19, ResNet50, MobileNetV2, and NASNetMobile with decimating the fully connected layers to get 3-class classification have been used for the comparative assessment of the networks as well as further scoring of stained tissue sample image based on statistical voting using mode operator. HER2 Challenge dataset from Warwick University is used in this study. A total of 2130 image patches were extracted to generate the training dataset from 300 training images corresponding to 30 training cases. The output model is then tested on 800 new test image patches from 100 test images acquired from 10 test cases (different from training cases) to report the outcome results. The transfer learning models have shown significant accuracy with VGG19 showing the best accuracy for the test images. The accuracy is found to be 93%, which increases to 98% on the image-based scoring using statistical voting mechanism. The output shows a capable quantification pipeline in automated HER2 score generation.
What problem does this paper attempt to address?