How well does CLIP understand texture?

Chenyun Wu, Subhransu Maji
DOI: https://doi.org/10.48550/arXiv.2203.11449
2022-11-05
Abstract: We investigate how well CLIP understands texture in natural images described by natural language. To this end, we analyze CLIP's ability to: (1) perform zero-shot learning on various texture and material classification datasets; (2) represent compositional properties of texture, such as "red dots" or "yellow stripes", on the Describable Texture in Detail (DTDD) dataset; and (3) aid fine-grained categorization of birds in photographs described by the color and texture of their body parts.
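The abstract does not spell out the evaluation protocol, but zero-shot classification with CLIP is commonly done as follows. Each class name is embedded through a text prompt, the image is embedded, and the image is assigned to the class whose text embedding has the highest cosine similarity. A softmax over the scaled similarities gives class probabilities. The sketch below illustrates that generic recipe only and is not necessarily the authors' exact setup. The function names `l2_normalize` and `zero_shot_probs`, the 3-dimensional toy embeddings, the class labels, and the default `logit_scale=100.0` (chosen to resemble CLIP's learned logit scale) are all hypothetical stand-ins. Real embeddings would come from CLIP's image and text encoders.

```python
import math
from typing import Dict, List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb: List[float],
                    class_text_embs: Dict[str, List[float]],
                    logit_scale: float = 100.0) -> Dict[str, float]:
    """Generic CLIP-style zero-shot scoring (hypothetical helper): softmax over
    scaled cosine similarities between one image embedding and one text
    embedding per class prompt."""
    img = l2_normalize(image_emb)
    logits = {}
    for label, text_emb in class_text_embs.items():
        txt = l2_normalize(text_emb)
        logits[label] = logit_scale * sum(a * b for a, b in zip(img, txt))
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


# Toy embeddings standing in for CLIP's encoders (made up for illustration).
# In practice each would encode a prompt such as "a photo of a {label} texture".
prompts = {
    "dotted": [0.9, 0.1, 0.0],
    "striped": [0.1, 0.9, 0.1],
    "woven": [0.0, 0.2, 0.9],
}
image = [0.8, 0.2, 0.1]

probs = zero_shot_probs(image, prompts)
print(max(probs, key=probs.get))  # dotted
```

Because no classifier is trained, swapping in a new label set only requires embedding new prompts. Compositional phrases such as "red dots" can be scored the same way by writing them into the prompt text.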
Computer Vision and Pattern Recognition