Closing the gap in domain adaptation for semantic segmentation: a time-aware method

Joan Serrat, Jose Luis Gómez, Antonio M. López
DOI: https://doi.org/10.1007/s00138-024-01626-z
IF: 2.983
2024-11-29
Machine Vision and Applications
Abstract: Semantic segmentation models need a large number of images to be trained effectively, but manual annotation of such images is costly. Active domain adaptation addresses this problem by pretraining the model on a synthetically generated dataset and then fine-tuning it with a few selected label annotations (the "budget") on real images to account for the domain shift. Previous works annotate a percentage of either individual pixels or whole target images. We argue that the first is infeasible in practice, and the second spends part of the budget on classes that the pretrained model may have already learned well. We propose a method based on the annotation of regions computed by Segment Anything, a recently introduced foundation model for class-agnostic image segmentation. The key idea is to assign a ground truth label to each of a tiny subset of regions, those for which the model is most uncertain. To increase the number of annotated regions, we propagate the ground truth labels to the most similar regions according to a hierarchical clustering algorithm that uses the features learned by the pretrained model. Our method outperforms the state of the art on the GTA5 to Cityscapes benchmark while using fewer annotations, almost closing the gap between the synthetically pretrained model and the one obtained with full supervision on the real images. Furthermore, we present competitive results for budgets below 1% of samples and for a larger and more challenging target dataset, Mapillary Vistas.
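The region selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how uncertainty is measured, so mean per-pixel entropy of the softmax output is an assumption here, and the function names `region_uncertainty` and `select_regions` are hypothetical. The region map stands in for the output of a class-agnostic segmenter such as Segment Anything, given as one integer id per pixel.

```python
import numpy as np


def region_uncertainty(probs, regions):
    """Mean per-pixel entropy of each region (assumed uncertainty measure).

    probs:   (C, H, W) array of softmax class probabilities.
    regions: (H, W) array of non-negative integer region ids.
    Returns a 1-D array indexed by region id.
    """
    eps = 1e-12  # avoids log(0) for probabilities that are exactly zero
    entropy = -(probs * np.log(probs + eps)).sum(axis=0)  # shape (H, W)
    ids = regions.ravel()
    sums = np.bincount(ids, weights=entropy.ravel())  # entropy total per region
    counts = np.bincount(ids)                         # pixel count per region
    return sums / np.maximum(counts, 1)               # unused ids score 0


def select_regions(probs, regions, budget):
    """Ids of the `budget` most uncertain regions, most uncertain first."""
    scores = region_uncertainty(probs, regions)
    return np.argsort(-scores)[:budget]
```

In an active learning loop, the ids returned by `select_regions` would be the regions sent to a human annotator. Propagating the resulting labels to similar regions through hierarchical clustering is a separate step that this sketch does not cover.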
computer science, cybernetics, artificial intelligence, engineering, electrical & electronic