Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels

Changrui Chen, Jungong Han, Kurt Debattista
2024-02-12
Abstract: Due to the costliness of labelled data in real-world applications, semi-supervised learning, underpinned by pseudo labelling, is an appealing solution. However, handling confusing samples is nontrivial: discarding valuable confusing samples would compromise the model generalisation, while using them for training would exacerbate the issue of confirmation bias caused by the resulting inevitable mislabelling. To solve this problem, this paper proposes to use confusing samples proactively without label correction. Specifically, a Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation even without a concrete label. This provides an upper bound for inter-class information sharing capacity, which eventually leads to a better embedding space. Extensive experiments on two mainstream dense prediction tasks, semantic segmentation and object detection, demonstrate that the proposed VC learning significantly surpasses the state of the art, especially when only very few labels are available. Our intriguing findings highlight the usage of VC learning in dense vision tasks.
Computer Vision and Pattern Recognition
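The abstract contrasts two options for a confusing sample, discarding it or forcing a possibly wrong pseudo label on it, and proposes a third. A minimal Python sketch of the first step, separating confident samples from confusing ones, is shown below. It assumes a teacher softmax per sample (a pixel or a box), a hypothetical confidence threshold `tau`, and a hypothetical rule that puts every class scoring at least `ratio` times the top score into the potential category set. The names `PseudoTarget` and `assign_target` are illustrative, and the paper's actual thresholds and set construction may differ; background and ignored samples are left out for brevity.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PseudoTarget:
    label: Optional[int]  # concrete pseudo label, or None for a confusing sample
    potential: List[int] = field(default_factory=list)  # hypothetical potential category set


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_target(teacher_logits: List[float],
                  tau: float = 0.9,
                  ratio: float = 0.5) -> PseudoTarget:
    """Assumed rule: confident -> pseudo label, otherwise -> potential category set."""
    probs = softmax(teacher_logits)
    top = max(range(len(probs)), key=lambda c: probs[c])
    if probs[top] >= tau:
        # Confident sample: ordinary pseudo labelling.
        return PseudoTarget(label=top)
    # Confusing sample: keep it, but only record the classes it might belong to.
    potential = [c for c, p in enumerate(probs) if p >= ratio * probs[top]]
    return PseudoTarget(label=None, potential=potential)


print(assign_target([5.0, 0.0, 0.0]))   # PseudoTarget(label=0, potential=[])
print(assign_target([1.0, 0.9, -2.0]))  # PseudoTarget(label=None, potential=[0, 1])
```

In this sketch the confusing sample is neither dropped nor given a single hard label; it only carries the set of classes it plausibly belongs to, which a downstream loss can then treat specially.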
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses semi-supervised learning in practical applications where labeled data is expensive to obtain. Specifically, it focuses on how to make effective use of limited labeled data for dense prediction tasks such as semantic segmentation and object detection.

### Main Issues

1. **High Cost of Labeled Data**: In real-world applications, annotation for dense prediction tasks such as object detection and semantic segmentation is especially expensive, which makes fully supervised methods difficult to apply at scale.
2. **Challenges in Handling Confusing Samples**: Existing pseudo-labeling methods struggle with confusing samples. Discarding these valuable samples harms the model's generalization ability, while training on them exacerbates confirmation bias, because their pseudo-labels are often incorrect.

### Solution

To address these issues, the paper proposes a new semi-supervised learning method called Virtual Category (VC) learning:

1. **Virtual Category Assignment**: Each confusing sample is assigned a virtual category (VC), so that it can safely participate in model optimization even without a concrete label.
2. **Optimization Direction**: By constructing a potential category (PC) set for each confusing sample, VC learning provides a reasonable upper bound for inter-class information sharing capacity and prevents the sample from driving the model in an incorrect optimization direction.
3. **Experimental Validation**: Extensive experiments on two mainstream dense prediction tasks (semantic segmentation and object detection) show that VC learning significantly outperforms existing methods, especially when labeled data is very limited.

### Key Contributions

1. **Utilization of Confusing Samples**: VC learning makes effective use of confusing samples while alleviating confirmation bias, and performs particularly well when labeled data is very limited.
2. **Theoretical Feasibility**: The paper shows theoretically that VC learning is feasible in semi-supervised learning, arguing that the use of confusing samples in semi-supervised tasks needs to be rethought.
3. **Extended Applications**: VC learning is extended to semi-supervised semantic segmentation, with several ways of constructing potential category sets and additional forms of the loss function, further validating its generality and effectiveness across tasks.

### Experimental Results

- **Object Detection**: On MS COCO, with only 586 labeled images, VC learning achieves 19.46 mAP, surpassing some recently published semi-supervised detectors that use over 1,000 labeled images.
- **Semantic Segmentation**: On Pascal VOC, with only 82 labeled images, VC learning achieves 55.37 mIoU, significantly outperforming existing methods.

In summary, the paper proposes Virtual Category learning, which lets confusing samples contribute to training without concrete labels and thereby improves dense prediction when labeled data is extremely limited.
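The summary states that the potential category set keeps a confusing sample from pushing the model in a wrong optimization direction. One way to realise this, assumed here rather than taken from the paper, is a cross-entropy toward an extra virtual class in which the potential categories are dropped from the softmax denominator. The sketch assumes a dot-product classifier and a hypothetical virtual weight equal to a copy of the sample's own embedding; `vc_loss`, `dot`, and the toy numbers are illustrative only and are not the authors' exact loss.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def vc_loss(feature: Sequence[float],
            class_weights: Sequence[Sequence[float]],
            virtual_weight: Sequence[float],
            potential: Sequence[int]) -> float:
    """Cross-entropy toward a virtual category (assumed form, not the paper's exact loss)."""
    excluded = set(potential)
    z_virtual = dot(virtual_weight, feature)
    # Only real categories outside the potential set compete with the virtual one,
    # so the loss never lowers the logit of a class the sample may truly belong to.
    z_rivals = [dot(w, feature) for c, w in enumerate(class_weights) if c not in excluded]
    logits = [z_virtual] + z_rivals
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - z_virtual


feature = [1.0, 0.0]
weights = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]  # toy classifier for 3 classes
virtual = list(feature)  # hypothetical virtual weight: a copy of the sample's own embedding
print(round(vc_loss(feature, weights, virtual, potential=[0, 1]), 4))  # 0.1269
print(round(vc_loss(feature, weights, virtual, potential=[]), 4))      # 1.1119
```

Comparing the two calls shows why the potential set matters in this sketch: when classes 0 and 1, which align with the feature almost as well as the virtual weight, are allowed to compete, the loss is large and its gradient would lower their logits; excluding them leaves only the clearly wrong class 2 as a rival.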