Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation

Petra Bevandić, Marin Oršić, Josip Šarić, Ivan Grubišić, Siniša Šegvić
DOI: https://doi.org/10.1007/s11263-024-01986-z
IF: 13.369
2024-02-01
International Journal of Computer Vision
Abstract: Deep supervised models have an unprecedented capacity to absorb large quantities of training data. Hence, training on multiple datasets becomes a method of choice towards strong generalization in usual scenes and graceful performance degradation in edge cases. Unfortunately, popular datasets often have discrepant granularities. For instance, the Cityscapes road class subsumes all driving surfaces, while Vistas defines separate classes for road markings, manholes, etc. Furthermore, many datasets have overlapping labels. For instance, pickups are labeled as trucks in VIPER, cars in Vistas, and vans in ADE20k. We address this challenge by considering labels as unions of universal visual concepts. This allows seamless and principled learning on multi-domain dataset collections without requiring any relabeling effort. Our method improves within-dataset and cross-dataset generalization, and provides an opportunity to learn visual concepts which are not separately labeled in any of the training datasets. Experiments reveal competitive or state-of-the-art performance on two multi-domain dataset collections and on the WildDash 2 benchmark.
computer science, artificial intelligence
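The abstract's central idea, treating each dataset label as a union of universal visual concepts, can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that the model predicts a softmax distribution over universal concepts and that the probability of a dataset label is the sum of the probabilities of the concepts it covers, with the loss being the negative log of that sum. The taxonomy, the mapping, and the names UNIVERSAL, LABEL_TO_CONCEPTS, and union_nll are hypothetical and chosen only to mirror the road and pickup examples above.

```python
import math

# Hypothetical universal taxonomy (illustrative, not the paper's actual one).
UNIVERSAL = ["road", "road_marking", "manhole", "car", "pickup", "truck"]

# Hypothetical mapping: each (dataset, label) covers a set of universal concepts.
LABEL_TO_CONCEPTS = {
    ("cityscapes", "road"): {"road", "road_marking", "manhole"},
    ("vistas", "road"): {"road"},
    ("vistas", "car"): {"car", "pickup"},
    ("viper", "truck"): {"truck", "pickup"},
}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def label_log_prob(logits, dataset, label):
    """Log-probability of a dataset label, assumed to be the log of the
    summed softmax probabilities of the universal concepts it covers."""
    members = LABEL_TO_CONCEPTS[(dataset, label)]
    member_logits = [logits[UNIVERSAL.index(c)] for c in members]
    return log_sum_exp(member_logits) - log_sum_exp(list(logits))


def union_nll(logits, dataset, label):
    """Per-pixel loss: negative log-probability of the labeled union."""
    return -label_log_prob(logits, dataset, label)


if __name__ == "__main__":
    uniform = [0.0] * len(UNIVERSAL)
    # Cityscapes "road" covers 3 of 6 concepts, so its probability is 0.5.
    print(union_nll(uniform, "cityscapes", "road"))
    # Vistas "road" covers a single concept, so its probability is 1/6.
    print(union_nll(uniform, "vistas", "road"))
```

Under this assumption, a pixel that the model confidently assigns to the hypothetical pickup concept incurs a low loss whether it was labeled as a car in Vistas or as a truck in VIPER, which is how conflicting labels could be reconciled without relabeling.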