Structure vs. Appearance and 3D vs. 2D? A Numeric Answer
Wenze Hu, Zhangzhang Si, Song-Chun Zhu
2013-01-01
Abstract: It has been widely acknowledged that while humans are quite good at extracting structures from images, e.g., edges [4], textons [10], etc., which are concepts hidden in pixel intensities, the notion of structure does not lend itself to precise detection by computer programs. As a result, there now exist appearance-based image representations [14, 6], which directly express the image using statistics or histograms of image operator (filter) responses. Structure-based and appearance-based image representations are advocated by different researchers, whose reasons for endorsement range from the practical benefits in building simple vision applications to the faith that computer vision would ultimately stick to human vision. When different views of objects are taken into account, a similar dichotomy arises in describing the image structures. The intrinsic 3D shape of objects suggests that an object-centered representation using volumetric primitives [2, 1, 15] should be simple yet capable of representing the observed changes in image structure. But again, the difficulty of extracting these hidden 3D concepts from images makes the viewer-centered representation [11, 12, 17, 21] a competing alternative, which uses a collection of 2D representations, each covering a small portion of the modelled views. For these two representations, researchers have shown various cases where one representation prevails [20, 3, 8], but there is no clear winner. In our view, these competing representations are points lying at different positions on a representation spectrum, and they should be combined to better represent images. For example, consider the images of leaves at different scales shown in Fig. 1: one can easily identify the structures inside the first image, but quickly gives up and switches to an appearance-based description for the last image. As the camera gradually zooms, the images in between must combine some portion of both struc-