Impact of imperfect annotations on CNN training and performance for instance segmentation and classification in digital pathology

Laura Gálvez Jiménez, Christine Decaestecker
DOI: https://doi.org/10.1016/j.compbiomed.2024.108586
2024-10-18
Abstract: Segmentation and classification of large numbers of instances, such as cell nuclei, are crucial tasks in digital pathology for accurate diagnosis. However, the availability of high-quality datasets for deep learning methods is often limited due to the complexity of the annotation process. In this work, we investigate the impact of noisy annotations on the training and performance of a state-of-the-art CNN model for the combined task of detecting, segmenting and classifying nuclei in histopathology images. In this context, we investigate the conditions for determining an appropriate number of training epochs to prevent overfitting to annotation noise during training. Our results indicate that the utilisation of a small, correctly annotated validation set is instrumental in avoiding overfitting and maintaining model performance to a large extent. Additionally, our findings underscore the beneficial role of pre-training.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper explores how imperfect (noisy) annotations affect the training and performance of convolutional neural networks (CNNs) in digital pathology. Specifically, it examines how annotation noise affects a model that detects, segments, and classifies cell nuclei in tissue samples.

#### Research Background and Problem Description

1. **Lack of High-Quality Datasets**
   - In digital pathology, accurate diagnosis depends on segmenting and classifying large numbers of instances, such as cell nuclei. However, because the annotation process is complex, high-quality datasets are often difficult to obtain.
2. **Impact of Annotation Noise**
   - Annotation noise can take the form of imprecise contours, missing objects, or incorrect class labels. In medical applications in particular, annotation is complex and time-consuming, and annotators are not always experts, which further reduces the reliability of the annotations.
3. **Overfitting Problem**
   - When the training data contain noisy annotations, the model may overfit to the noise, which harms its ability to generalize. Determining an appropriate number of training epochs to avoid this overfitting is therefore a key issue.

#### Research Objectives

- **Evaluate the Impact of Noisy Annotations**: Introduce different types of annotation noise and study their specific effects on the performance of CNN models.
- **Prevent Overfitting**: Explore how a small, clean validation set can be used to keep the model from overfitting to noisy annotations while maintaining good performance.
- **The Role of Pretraining**: Verify whether pretraining improves the robustness of the model when the annotations are noisy.
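To make the first objective concrete, here is a minimal sketch of how two of the noise types mentioned above (missing objects and incorrect class labels) could be injected into a clean annotation set. It is not the authors' procedure: the `Nucleus` record, the `add_annotation_noise` function, the probabilities, and the class names `"A"`, `"B"`, `"C"` are all assumptions made for illustration, and contour noise is left out because it would need mask geometry.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Nucleus:
    """Hypothetical annotation record: an instance id and a class label."""
    instance_id: int
    label: str


def add_annotation_noise(annotations, classes, p_missing, p_wrong_label, seed=0):
    """Return a noisy copy of `annotations` (the input list is not modified).

    Each instance is dropped with probability `p_missing` (a missed object).
    Each surviving instance has its label replaced by a different class
    with probability `p_wrong_label` (an incorrect class label).
    """
    if len(set(classes)) < 2:
        raise ValueError("need at least two classes to flip labels")
    rng = random.Random(seed)  # seeded so the noise is reproducible
    noisy = []
    for ann in annotations:
        if rng.random() < p_missing:
            continue  # simulate an object the annotator did not mark
        if rng.random() < p_wrong_label:
            others = [c for c in classes if c != ann.label]
            ann = Nucleus(ann.instance_id, rng.choice(others))
        noisy.append(ann)
    return noisy


if __name__ == "__main__":
    clean = [Nucleus(i, "A") for i in range(10)]
    noisy = add_annotation_noise(clean, ["A", "B", "C"], p_missing=0.2, p_wrong_label=0.1)
    print(len(clean), "clean annotations ->", len(noisy), "noisy annotations")
```

Training the same model on several noisy copies with increasing `p_missing` or `p_wrong_label` is one way such a study could compare performance across noise levels.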
#### Main Contributions

- **Selection of Training Epochs**: Analyze how to determine an appropriate number of training epochs so that the model does not overfit to noisy annotations.
- **Performance Evaluation**: Quantify the negative impact of noisy annotations on model performance using evaluation metrics such as the F1-score for the detection, segmentation, and classification tasks.
- **Experimental Verification**: Experiments show that stopping training based on a small, clean validation set preserves model performance to a large extent, and they highlight the benefit of pretraining.

### Summary

This paper systematically introduces different types of annotation noise and studies their impact on CNN models that detect, segment, and classify cell nuclei. The results show that using a small, clean validation set to decide when to stop training is an effective way to avoid overfitting to noisy annotations, and that pretraining also plays a positive role.
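The stopping strategy described above can be sketched as follows: after each epoch, the model is scored on the small clean validation set, and the epoch with the best score is kept. This is only an illustration under stated assumptions. The patience rule, the function names `f1_score` and `select_stopping_epoch`, and the example scores are hypothetical, and the summary does not give the paper's exact stopping criterion or metric definitions.

```python
def f1_score(tp, fp, fn):
    """Standard F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def select_stopping_epoch(val_scores, patience=5):
    """Pick the epoch with the best clean-validation score.

    `val_scores` holds one score per epoch (higher is better), measured on
    the clean validation set; epochs are indexed from 0. The scan stops once
    `patience` consecutive epochs pass without a new best score (an assumed
    rule, not taken from the paper). Returns (best_epoch, best_score).
    """
    if not val_scores:
        raise ValueError("val_scores must contain at least one epoch")
    best_epoch, best_score = 0, val_scores[0]
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_score


if __name__ == "__main__":
    # Made-up validation scores: they rise, then fall as the model
    # starts to fit the noise in the training annotations.
    scores = [0.50, 0.60, 0.70, 0.65, 0.60, 0.55]
    print(select_stopping_epoch(scores, patience=2))
```

The key design point is that the scores come from correctly annotated data: a validation set carrying the same noise as the training set could keep rewarding the model for fitting that noise.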