Abstract: In recent years, a body of work has emerged that studies the shape and texture biases of off-the-shelf pre-trained deep neural networks (DNNs) for image classification. These works study how much a trained DNN relies on image cues, predominantly shape and texture. In this work, we switch the perspective and pose the following questions: What can a DNN learn from each of the image cues, i.e., shape, texture and color? How much does each cue influence the learning success? And what are the synergy effects between different cues? Studying these questions sheds light on cue influences on learning and thus on the learning capabilities of DNNs. We study these questions on semantic segmentation, which allows us to address them at the pixel level. To conduct this study, we develop a generic procedure to decompose a given dataset into multiple ones, each containing either a single cue or a chosen mixture of cues. This framework is then applied to two real-world datasets, Cityscapes and PASCAL Context, and a synthetic dataset based on the CARLA simulator. We learn the given semantic segmentation task from these cue datasets, creating cue experts. Early fusion of cues is performed by constructing appropriate datasets. This is complemented by a late fusion of experts, which allows us to study location-dependent cue influence at the pixel level. Our study on three datasets reveals that neither texture nor shape clearly dominates the learning success; however, a combination of shape and color without texture achieves surprisingly strong results. Our findings hold for convolutional and transformer backbones. In particular, there is qualitatively almost no difference in how the two architecture types extract information from the different cues.
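The late fusion of cue experts mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the code below assumes a plain average of per-pixel class probabilities; the function name `late_fuse`, the expert names, and the "dominant expert" map are hypothetical and only meant to show how cue influence could be inspected per pixel.

```python
import numpy as np

def late_fuse(expert_probs):
    """Fuse per-pixel class probabilities from several cue experts.

    expert_probs: dict mapping a (hypothetical) expert name to an array of
    shape (C, H, W) holding softmax probabilities for C classes.
    Assumption: fusion is a simple mean over experts, not the paper's method.
    """
    names = list(expert_probs)
    stacked = np.stack([expert_probs[n] for n in names])  # (E, C, H, W)
    fused = stacked.mean(axis=0)                          # (C, H, W)
    labels = fused.argmax(axis=0)                         # (H, W)
    # Probability each expert assigns to the fused label at every pixel.
    support = np.take_along_axis(stacked, labels[None, None], axis=1)[:, 0]
    # Index of the expert that supports the fused label most strongly.
    dominant = support.argmax(axis=0)                     # (H, W)
    return labels, dominant, names

# Toy example: two experts, two classes, a 1x2 image.
shape_expert = np.array([[[0.9, 0.4]], [[0.1, 0.6]]])
color_expert = np.array([[[0.6, 0.1]], [[0.4, 0.9]]])
labels, dominant, names = late_fuse({"shape": shape_expert, "color": color_expert})
```

Here `dominant` gives, for each pixel, the index into `names` of the expert most confident in the fused prediction, which is one simple way to look at cue influence by location.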

Trapped in texture bias? A large scale comparison of deep instance segmentation

Multi-Level Feature Descriptor for Robust Texture Classification via Locality-Constrained Collaborative Strategy

Troubleshooting image segmentation models with human-in-the-loop

Texture Learning Domain Randomization for Domain Generalized Segmentation

Global and Local Texture Randomization for Synthetic-to-Real Semantic Segmentation

Reducing Textural Bias Improves Robustness of Deep Segmentation Models

Texture Underfitting for Domain Adaptation

Shape-Texture Debiased Neural Network Training

The Origins and Prevalence of Texture Bias in Convolutional Neural Networks

See more than once: Kernel-sharing atrous convolution for semantic segmentation

On the Texture Bias for Few-Shot CNN Segmentation

Adaptive Texture Filtering for Single-Domain Generalized Segmentation

Discriminative Features Reconstruction Network For Semantic Segmentation

Deep Structure-Revealed Network for Texture Recognition

A Two-Pipeline Instance Segmentation Network via Boundary Enhancement for Scene Understanding

A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models

On the Influence of Shape, Texture and Color for Learning Semantic Segmentation

Adaptive Query Selection for Camouflaged Instance Segmentation

Deformable-Model based textured object segmentation

Where are the Masks: Instance Segmentation with Image-level Supervision