AttEntropy: On the Generalization Ability of Supervised Semantic Segmentation Transformers to New Objects in New Domains

Krzysztof Lis, Matthias Rottmann, Annika Mütze, Sina Honari, Pascal Fua, Mathieu Salzmann
2024-11-10
Abstract: In addition to impressive performance, vision transformers have demonstrated remarkable abilities to encode information they were not trained to extract. For example, this information can be used to perform segmentation or single-view depth estimation even though the networks were only trained for image recognition. We show that a similar phenomenon occurs when explicitly training transformers for semantic segmentation in a supervised manner for a set of categories: Once trained, they provide valuable information even about categories absent from the training set. This information can be used to segment objects from these never-seen-before classes in domains as varied as road obstacles, aircraft parked at a terminal, lunar rocks, and maritime hazards.
Computer Vision and Pattern Recognition