Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape

Chaoning Zhang, Yu Qiao, Shehbaz Tariq, Sheng Zheng, Chenshuang Zhang, Chenghao Li, Hyundong Shin, Choong Seon Hong
2023-06-03
Abstract: In contrast to human vision, which mainly depends on shape for recognizing objects, deep image recognition models are widely known to be biased toward texture. Recently, the Meta research team released the first foundation model for image segmentation, termed the Segment Anything Model (SAM), which has attracted significant attention. In this work, we study SAM from the perspective of texture vs. shape. Unlike label-oriented recognition tasks, SAM is trained to predict a mask covering the object shape based on a prompt. Given this, it seems self-evident that SAM is biased towards shape. In this work, however, we reveal an interesting finding: SAM is strongly biased towards texture-like dense features rather than shape. This intriguing finding is supported by a novel setup in which we disentangle texture and shape cues and design a texture-shape cue conflict for mask prediction.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?