Contrastive pretraining for semantic segmentation is robust to noisy positive pairs

Sebastian Gerard, Josephine Sullivan
DOI: https://doi.org/10.48550/arXiv.2211.13756
2023-01-24
Abstract: Domain-specific variants of contrastive learning can construct positive pairs from two distinct in-domain images, while traditional methods just augment the same image twice. For example, we can form a positive pair from two satellite images showing the same location at different times. Ideally, this teaches the model to ignore changes caused by seasons, weather conditions or image acquisition artifacts. However, unlike in traditional contrastive methods, this can result in undesired positive pairs, since we form them without human supervision. For example, a positive pair might consist of one image before a disaster and one after. This could teach the model to ignore the differences between intact and damaged buildings, which might be what we want to detect in the downstream task. Similar to false negative pairs, this could impede model performance. Crucially, in this setting only parts of the images differ in relevant ways, while other parts remain similar. Surprisingly, we find that downstream semantic segmentation is either robust to such badly matched pairs or even benefits from them. The experiments are conducted on the remote sensing dataset xBD, and a synthetic segmentation dataset for which we have full control over the pairing conditions. As a result, practitioners can use these domain-specific contrastive methods without having to filter their positive pairs beforehand, or might even be encouraged to purposefully include such pairs in their pretraining dataset.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper examines how robust contrastive pretraining for semantic segmentation is to "noisy positive pairs". Specifically, the researchers focus on:

1. **The influence of noisy positive pairs**:
   - In traditional contrastive learning, a positive pair is usually obtained by applying two different data augmentations to the same image. In domain-specific contrastive learning, a positive pair can instead consist of two different but related in-domain images (for example, satellite images of the same location at different times). Because these pairs are formed without human supervision, some of them are undesired.
   - Such undesired positive pairs may contain exactly the changes that the downstream task needs to distinguish (such as a building before and after it was damaged), which could hurt model performance.
2. **Robustness in semantic segmentation tasks**:
   - The researchers ask whether these noisy positive pairs harm downstream semantic segmentation. Unlike classification, semantic segmentation requires the model to capture local image features at a finer granularity.
3. **Experimental verification**:
   - The researchers test the robustness of contrastive pretraining to noisy positive pairs with experiments on the remote sensing dataset xBD and the synthetic dataset VTS.

### Main findings of the paper

- **Robustness of contrastive learning to noisy positive pairs**: The experiments show that downstream semantic segmentation is robust to noisy positive pairs and in some cases even benefits from them. In particular, on the VTS dataset, increasing the proportion of noisy positive pairs usually improves model performance, unless almost all pixels in the two images are mismatched (i.e., the paired images are completely different).
- **The mechanism behind noisy positive pairs**: The researchers further investigated whether the performance gain from noisy positive pairs is due only to increased exposure to relevant features or whether other mechanisms are at work. The results suggest that, in addition to feature exposure, noisy positive pairs may act as a regularizer that helps the model generalize better.

### Summary

This paper shows experimentally that contrastive pretraining for semantic segmentation is robust to noisy positive pairs, and suggests that such pairs may improve model performance through increased feature exposure and a regularization effect. The finding is a useful reference for practitioners applying domain-specific contrastive learning methods, especially on large-scale datasets whose positive pairs are formed without supervision.
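
To make the pairing setup concrete, here is a minimal sketch of how a controllable share of noisy positive pairs could be mixed into a contrastive batch. It is an illustration under stated assumptions, not the authors' implementation: the source does not say which contrastive loss is used, so a plain InfoNCE loss stands in, and the names `info_nce`, `sample_pairs`, `noisy_fraction`, and the dictionary keys `reference`, `unchanged`, and `changed` are all hypothetical.

```python
import math
import random


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed loss, not taken from the paper).

    Row i of `anchors` is pulled towards row i of `positives`; every other
    row of `positives` in the batch serves as a negative.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def sample_pairs(locations, noisy_fraction, seed=0):
    """Form one positive pair per location.

    Each location is a dict with hypothetical keys 'reference', 'unchanged'
    and 'changed' (e.g. a post-disaster image). With probability
    `noisy_fraction` the partner is the 'changed' image, which yields a
    noisy positive pair; otherwise it is the 'unchanged' image.
    Returns a list of (reference, partner, is_noisy) tuples.
    """
    rng = random.Random(seed)
    pairs = []
    for loc in locations:
        is_noisy = rng.random() < noisy_fraction
        partner = loc["changed"] if is_noisy else loc["unchanged"]
        pairs.append((loc["reference"], partner, is_noisy))
    return pairs
```

In a sketch like this, sweeping `noisy_fraction` from 0 to 1 during pretraining and then comparing downstream segmentation quality would mirror the kind of controlled comparison the summary describes for the synthetic dataset.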